Supplemental figure legends

Figure S1. Related to Figure 1. (A) Saturation analysis for evaluating ChIA-PET data quality. The left and middle panels represent RNA Pol II ChIA-PET from this study. The right panel represents RAD21 ChIA-PET from ENCODE. By employing a down-sampling strategy, the total ChIA-PET sequencing reads were randomly portioned into 10%, 20%, …, 80%, 90% sub-samples representing varying sequencing depth, which were then individually used to do the interaction-calling separately. Then, interactions detected from sub-samples were compared with interactions detected from the total sequencing reads. (B) Length of interactions across the four cell lines. Three criteria of the PETs (>= 1, 2 or 3) are shown in pink, blue and green. The y axis is distance transformed by log10. The blue, red, and green lines connecting the violins signify the median, mean, and Huber M-estimator, respectively. The black dotted line distinguishes TAD interactions (less than or equal to 1MB) from longer interactions. (C) The number of chromatin interactions between different genomic elements across four cell lines, including interaction types between promoter, body, other region and enhancers. The color density shows the ratio of each type of interactions. (D) Distributions of PET numbers along with the distance between peak anchors in the four cell lines. The length of PET is transformed to log10bp. Anova test was used for multiple groups comparison. Boxplot represents median, 0.25 and 0.75 quantiles with lines at 1.5x interquartile range. Distribution of enhancers per gene (E) and per enhancer (F) in the four cell lines. (G) The number of enhancers for each cell line (yellow) and the number of enhancers that display RNA Pol II interactions with gene promoters (orange). (H) Correlation between transcript abundance and enhancers per gene. The transcript abundance was measured by FPKM transformed by log2. The correlation efficient was calculated by spearman correlation and p-value is labeled. (I) The interactions are well fitted as NB distribution in four cells. The PETs scores were normalized as the raw pairs among per 10k region peak per 10 million PETs. The distributions of chromatin interactions were fitted as NB distribution by using Matlab fitdist toolbox. The fitting goodness was tested by using K-S test. (J) Motif analysis of promoter and enhancer regions in the four cell lines. Upper panel: motif enrichment of enhancer regions in the four cell lines. The type of motif sequences and p-values are labelled. Lower panel: motif enrichment of promoter regions in the four cell lines.

Figure S2. Related to Figure 1. Interaction comparison. The top, left panel displays all shared and unique interactions. The middle-left and bottom-left panels divide these interactions into “short” and “long”, respectively. The analysis was repeated excluding interactions that only have one PET (middle columns) and excluding interactions that have only one or two PETs (right columns). (B-C) Enriched pathways and (GO) terms based on RNA-seq and ChIA- PET data, upon comparison with the benign RWPE-1 cells. (B) Enriched GO terms for differential gene expression. (C) Enriched GO terms for differential chromatin interactions. Blue: comparison between DU145 and RWPE-1; red: comparison between LNCaP and RWPE-1; yellow: comparison between VCaP and RWPE-1.

Figure S3. Related to Figure 2. Integrated genome view of HOXB13 gene and its adjacent regions in LNCaP cells. (A) The data tracks represent RNA-seq, ChIP-seq for CTCF, FOXA1, AR, H3K27ac and RNA Pol II, ChIA-PET for RNA Pol II and RAD21, and HiC data in the LNCaP cell line. (B) ChIA-PET contact heatmap representing RNA Pol II (left) and RAD21 (right) associated chromatin interactions for the HOXB13 gene and neighborhood regions. The HOXB13 gene is shown in light green color.

Figure S4. Related to Figure 3. (A) Immunoblot analysis to determine the effect of treatment with various doses of BETi, JQ1 for 24 hours on the steady-state levels of AR in LNCaP and VCaP cells. MYC is used as a positive control and actin is the loading control. The antibody against AR recognizes its N-terminus. (B) Immunoblot representing the steady-state levels of AR in four cell lines using antibodies recognizing its N- and C-terminus, respectively. (C) Amplification status analysis of AR, AR enhancers peaks, MYC and FOXA1 in mCRPC patient samples by ddPCR. (D) Correlation of all 6 AR enhancer peaks amplification versus AR gene copy number amplification in mCRPC patient samples. (E) Correlation of all 6 AR enhancer peaks gain versus AR gene copy number gain in mCRPC patient samples. (F) Correlation between AR copy number and AR gene expression (R = 0.43, P = 4x10–6). (G) Integrated genome view of a genomic region (chrX:66389092-66738319) using whole genome sequencing data of a CRPC sample. A 349227 bp deletion is discovered which is supported by 139 reads out of total 7775 reads in the region.

Figure S5. Related to Figure 3. Transcriptional regulation of FOXA1 gene and its neighborhood genes. (A-B) Integrated genome view of FOXA1 and its adjacent regions based on data of RNA-seq, ChIP-seq of CTCF, FOXA1, AR, H3K27ac and RNA Pol II, ChIA-PET of RNA Pol II are shown in LNCaP and VCaP cell lines, respectively. Additionally, phospho RNA Pol II and ERG ChIP- seq is described for VCaP cells. The FOXA1 gene is highlighted in light-blue color. (C) Immunoblot analysis showed Increased expression of FOXA1 in LNCaP and VCaP cell lines. Actin is the loading control.

Figure S6. Related to Figure 5. Integrated genome view of KLK3 gene and its neighborhood regions representing RNA-seq, CTCF, FOXA1, AR, H3K27ac and RNA Pol II ChIP-seq and RNA Pol II ChIA-PET data from LNCaP (A) and VCaP cells (B), respectively. In addition, phospho RNA Pol II and ERG ChIP-seq is described for VCaP cells. The KLK3 gene and up-stream regions are highlighted in light blue color.

Figure S7. Related to Figure 5. Integrated genome view of gene KLK3 and its adjacent regions based on H3K27ac ChIP-seq data. H3K27ac ChIP-seq in 96 datasets show universal peaks in the constitutively active distal enhancer of KLK3. The KLK3 gene and up-stream regions are highlighted in light-blue color.

Figure S8. Related to Figure 6. (A-B) Integrated genome view of gene MYC and its adjacent regions representing RNA-seq, CTCF, FOXA1, AR, H3K27ac and RNA Pol II ChIP-seq, and RNA Pol II ChIA-PET are shown for LNCaP (A) and VCaP cells (B), respectively. In addition, phospho RNA Pol II and ERG ChIP-seq is described for VCaP cells. The MYC gene locus is highlighted in light- blue color. SNP sites located in MYC neighborhood are shown. (C-D) Integrated genome view of EZH2 locus and its adjacent regions based on data from RNA- seq, ChIP-seq of CTCF, FOXA1, AR, H3K27ac and RNA Pol II, ChIA-PET of RNA Pol II are shown in LNCaP (C) and VCaP (D) cells, respectively. Additionally, phospho RNA Pol II and ERG ChIP-seq is described for VCaP cells. The EZH2 gene locus is highlighted in light-blue color. (E) Integrated genome view of gene MYC and its adjacent regions based on H3K27ac ChIP-seq data in a panel of cell lines. The MYC gene region is highlighted in light-blue color. (F) This scatterplot shows the correlation between RNA expression level of MYC and the maximum enhancer peak score within the region shown in panel E (chr8: 127,800kb-129,400kb). All samples which have expressions of MYC (TPM>1) are retained in this plot, and the enhancer score is measured by the maximum ChIP-seq peak height. The correlation efficient is calculated by Pearson correlation and p-value is labeled. (G) Integrated genome view of gene EZH2 and its adjacent regions based on H3K27ac ChIP-seq data in a panel of cell lines. The EZH2 gene region is highlighted in light-blue color. (H) This scatterplot shows the correlation between RNA expression level of EZH2 and the maximum enhancer peak score within the region shown in panel G (chr7: 148,400kb- 149,400kb). All samples which have expression of EZH2 (TPM>1) are included in this plot, and the enhancer score is measured by the maximum ChIP-seq peak height. The correlation efficient is calculated by Pearson correlation and p-value is labeled.

Figure S9. Related to Figure 9. (A) siRNA mediated knockdown of FAM57A, GEMIN4 and VPS53 genes resulted in a modest increase in the viability of LNCaP cells as shown by Cell titer glow analysis conducted 7 days post siRNA treatment. (B) Validation of siRNA-based gene knockdown in LNCaP cells. qRT- PCR analysis of the expression of VPS53, GEMIN4 and FAM57A genes with the treatment of siRNA as indicated. (* P<0.05, ** P<0.01, *** P<0.001, **** P<0.0001, by two-tailed Student's t-test; Error bars, standard deviation of 3 technical replicates).

Figure S1 (Page 1) A ChIA-PET data generated in this study ENCODE ChIA-PET data VCaP RNA Pol II LNCaP RNA Pol II LNCaP RAD21 100% 100% 100%

80% 80% 80%

60% 60% 60%

40% 40% PET=1 40% PET=1 PET=1 PET=2 PET=2 PET=2 20% Percentage of PETs 20% PET=3 Percentage of PETs 20% PET=3 Percentage of PETs PET=3 PET>=4 PET>=4 PET>=4 0% 0% 0% 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Ratio of sequencing depth Ratio of sequencing depth Ratio of sequencing depth

RWPE-1 RNA Pol II DU145 RNA Pol II DU145 RAD21 100% 100% 100%

80% 80% 80%

60% 60% 60%

40% PET=1 40% PET=1 40% PET=1 PET=2 PET=2 PET=2 20% 20% 20% Percentage of PETs PET=3 Percentage of PETs PET=3 Percentage of PETs PET=3 PET>=4 PET>=4 PET>=4 0% 0% 0% 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Ratio of sequencing depth Ratio of sequencing depth Ratio of sequencing depth

4286 6718 B C 7934 8865 19857 8330 7884 3562 7203 11808 10 PETs >= 1 PETs >= 2 PETs >= 3 Promoter 21870 Gene Body 7959 8 3523 3444 RWPE-1 3962 4204 Ratio 6 LNCaP 8979 4749 VCaP 3123 1858 4 0 DU145 2610 2298 2 Other Region 0.2 Log10Distance(bp) 2402 873

VCaP VCaP VCaP 0.4 LNCaP DU145 LNCaP DU145 LNCaP DU145 RWPE−1 RWPE−1 RWPE−1 5497 4286 Median length 8374 7934 Mean length 10731 19857 Huber M-estimator length 2703 7152 7203 Enhancer 11157 Promoter 24974 6842 D

RWPE1 LNCaP 15 Anova, p < 2.2e−16 Anova, p < 2.2e−16 10 5

10 bp ) 0 VCaP DU145 th (lo g 15 Anova, p < 2.2e−16 Anova, p < 2.2e−16 Len g 10 5

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 PET number Figure S1 (Page 2) E F Enhancer number per gene Gene number per enhancer 4000 14000 RWPE-1 RWPE-1 3500 LNCaP 12000 LNCaP VCaP VCaP 3000 DU145 10000 DU145 2500 8000 2000 1500 6000 Gene number

1000 Enhancer number 4000 500 2000 0 0 1 2 3 4 5 6 7 8 9 1011121314151617181920>20 0 0 1 2 3 4 5 6 7 8 9 1011121314151617181920>20 Enhancer usage Targeted gene number

G H

100K Total enhancer number 90K Enhancers interacted with 80K genes’ promoters 70K 60K 50K 40K 30K Enhancer number 20K 10K 0 RWPE-1 LNCaP VCaP DU145

I DU145 VCaP 0.3 0.3 DU145 VCaP (mean, var)=(10.54, 99.86) (mean,var)=(5.20, 23.54) y y Fitted NB(9.27, 0.72) 0.2 Fitted NB(10.75, 0.6) 0.2 P-value 1.26e-07 P-value 1.08e-08 uen c uen c re q re q F F 0.1 0.1

0 0 0 10 20 30 40 50 0 10 20 30 40 50 PETs Score PETs Score

LNCaP RWPE-1 0.3 0.3 LNCaP RWPE1 (mean,var)=(4.65, 11.36) (mean,var)=(7.93, 43.82)

y y Fitted NB(9.18, 0.58) 0.2 Fitted NB(9.93, 0.72) 0.2 P-value 6.26e-08 P-value 4.02e-07 uen c uen c re q re q

F 0.1 F 0.1

0 0 0 10 20 30 40 50 0 10 20 30 40 50 PETs Score PETs Score Figure S1 (Page 3) J

RWPE-1 LNCaP VCaP DU145 Enhancer Enhancer Enhancer Enhancer Fra1(bZIP), P-value=1e-514 FOXA1(Forkhead), P-value=1e-285 CTCF(Zf), P-value=1e-323 Fra1(bZIP), P-value=1e-414 GA C AT G C GA G C T A G T A C AG AC C A TGA TA GG CA AA C T G TA G C G G G AT AT A T TT C T G A C T A T T C A C A T T C T C A CA T C C G C A G T TGACC TG C GC C G G T G T T C A C C A T T A T C A C G T A A T GT CT G C CC T A T A G AAA A GCC CT TG T A T A p53(p53), P-value=1e-58 NF1(CTF), P-value=1e-74 FOXA1(Forkhead), P-value=1e-305 CTCF(Zf), P-value=1e-27 CA CC GA TC GA TC A CT CTG A ATAG AC G CC A T TTGA TTG T AGACT CC T AA AT A T G AT C A G G T T AT T A T C A G G T CA A T GAAA T T A T TT A G TT A G A T CCC A C A C G G G A CG C T C C G G A CA T C C GC G T GC A G C T AG T CG T C G C G A C T G C G CA C G C A C G A T A A A C G G T G G C CT C T TG CC T G C GG CC T GATAAATA C C G TEAD4(TEA), P-value=1e-44 HOXB13(Homeobox), P-value=1e-54 GRHL2(CP2), P-value=1e-167 ETV1(ETS), P-value=1e-24 CCA G A T GG GC G CG TT A CA TC A A T G A GA C C T T A T T T CA T C A A T G T A G T C A C G C A T AT TG C C G T G CT T T GT CG C A CCG G A C C C A G A AA TC T C C A G A T TG GCG C GG TT A G AT TTTAT AG CAAGAA GCTTG A AGGAA Fli1(ETS), P-value=1e-31 GRHL2(CP2), P-value=1e-50 NF1(CTF), P-value=1e-156 TEAD(TEA), P-value=1e-24 C T CG TAA CA T CTG GAC G A TT T TC CT AGACT A CC C G C G T CCA T A AT CG T T T G C G G T T C CG C A T A G C AA A T T T T G TG CG CG T A T T AAG G C G AG T CG A T G A A C T A G C G TT A T G CA C T A T C G G TA A A A GA C G T GAT C ACA TCCAAG G C GG CC GTAGGAATTA ERG(ETS), P-value=1e-30 EHF(ETS), P-value=1e-48 ETV1(ETS), P-value=1e-120 A G A G GTA C GA C C C T C G G G CG C TACA AAT T CCG GAG A T C C C A A C A G TT A GG A TA A A A A G G TGCGGAAA GGA KLF5(Zf), P-value=1e-30 Jun-AP1(bZIP), P-value=1e-48 HOXB13(Homeobox), P-value=1e-106 G CC A AA C TTA G G T T TG CAG G A T T G CCT GT GAGT A T T GT C T C AC C C A AA A A CG CA C GAGG GGG TGA TCA TTTAT AG CTCF(Zf), P-value=1e-22 CTCF(Zf), P-value=1e-32 Fli1(ETS), P-value=1e-104 ATAG AC G CC AT G C CAC T CC T G TAA C AG AC C A G G G G G C T A G C A T TT T C T A A T A C G A G G T C G C C T A CA T GC A G C G T C G A T TT T A TT A C A C A TT T C T A CA T C C G A C C A G G G C G CT T C G A G T T C C G T A A A C CT G GCCCTCT TGG AC TCC AG Fli1(ETS), P-value=1e-22 GRE(NR), P-value=1e-71 CAC T C A TTC T G GGC AA G GAG C G T C G C T A A T T G GC T T A T A C T G G CTA T A A A C G T T AGT G ACA ATCCAAG G AC TGT C

RWPE-1 LNCaP VCaP DU145 Promoter Promoter Promoter Promoter Atf3(bZIP), P-value=1e-21 Elk4(ETS), P-value=1e-38 Elk4(ETS), P-value=1e-57 NFY(CCAAT), P-value=1e-29 G C CC C C T AA TTA TAC T TAC TG C C A C C T G T G A G G CC G GGG AGT AGT T T T T A CG TGA TCA GCGAT GTAA GCGAT GTAA G G C C C T C T C C A C GA GTCCAAT CAA Elk4(ETS), P-value=1e-20 C T ELF1(ETS), P-value=1e-35 ETS(ETS), P-value=1e-47 TAC C AGT CGG A Fli1(ETS), P-value=1e-25 T T GCGAT GTAA A AA TCC GC T CG T C C G A AC C G A A GG A CTCA T A C T T CT A G A C G G CGG AG G C G G AA A G T GCC TTT T GAG CTA G T ACA ATCCAAG ELF1(ETS), P-value=1e-13 NFY(CCAAT), P-value=1e-25 NFY(CCAAT), P-value=1e-37 A YY1(Zf), P-value=1e-23 GAC T A CG T C C G GG A G A CA C A CG G C T A CA C GA G T C A G G C C C G A C G A A G G A TA G A AA C G CC CA C C GA AAT GTCCAAT CAA AA ATGGC G NFY(CCAAT), P-value=1e-11 YY1(Zf), P-value=1e-24 YY1(Zf), P-value=1e-37

A ELF1(ETS), P-value=1e-22 A CG G TA A G C C C C G G C G G C A C A A A C C A A A G A A TA C G GA T C G A C C T G A T C C G G A GG A C C A C GG G GG A CCAAT AA AT AA AT CT CGGAAG

BMYB(HTH), P-value=1e-6 BMYB(HTH), P-value=1e-12 TFE3(bHLH), P-value=1e-7 Sp1(Zf), P-value=1e-14 CT T CCA CT T CCA CC T G G T G G A G C GA T C GA T C G CT C T C T C T G G A A A T A A T A A A G G T A TG A T A T C G C T A A C CG C AC AC T AAC G AAC G ATCACTGATGA GT GC CCGCCC YY1(Zf), P-value=1e-5 NRF(NRF), P-value=1e-8 NRF(NRF), P-value=1e-6 Atf3(bZIP), P-value=1e-11 G CC G G A C TTA G T T AG C T C C A G C T C A T A A T C CC GT A GG A A GTA C G AA ATGGC GC GCGCATGCGC TCGCGCATGCGC TGA TCA TFE3(bHLH), P-value=1e-7 Sp1(Zf), P-value=1e-5 E2F4(E2F), P-value=1e-10 CC G C G T G CTA C T G ATC C C G G G G A CCG A ATT A A A A T A TG G T C C C AT A C CG C A T T TCG ATCACTGATGA GT GC CCGCCC AGCCGC A NRF(NRF), P-value=1e-7 G CT TCGCGCATGCGC Figure S2 A

B Enriched gene ontology terms for C Enriched gene ontology terms for differential gene expression differential chromatin interactions

10 12 14 16 0 1 2 3 4 5

0 2 4 6 8

developmental process metabolic process biological adhesion primary metabolic process nucleobase-containing compound cell adhesion metabolic process cellular process nitrogen compound metabolic process biosynthetic process phosphate-containing compound metabolic process organelle organization cellular metabolic process cellular component biogenesis regulation of catalytic activity RNA metabolic process DU145 cellular component organization or biogenesis regulation of phosphate metabolic process LNCaP cellular process transmembrane receptor tyrosine kinase VCaP signaling pathway translation cytoskeleton organization chromatin organization cell differentiation transcription, DNA-dependent cell-cell signaling regulation of cell cycle cellular component organization regulation of molecular function phosphate-containing compound metabolic process anion transport mRNA processing cellular component morphogenesis oxidative phosphorylation mesoderm development cell proliferation system development circadian rhythm cellular component movement transcription elongation from RNA Pol II promoter RNA splicing, via transesterification reactions ion transport protein folding metabolic process transcription from RNA Pol II promoter lipid metabolic process mitochondrial transport ectoderm development mRNA splicing, via spliceosome negative regulation ofapoptotic process mitochondrion organization chromatin assembly regulation of cell cycle Figure S3

A 217 kb 46,740 kb 46,780 kb 46,820 kb 46,860 kb 46,900 kb 46,940 kb LNCaP Chr17 RNA-Seq [0 - 1000] [0 - 0.31] CTCF [0 - 1.27] FOXA1 [0 - 5.19] AR [0 - 7.24] ChIP-Seq H3K27ac [0 - 436] RNA Pol II

RNA Pol II ChIA-PET

RAD21 ChIA-PET

RefSeq Genes PRAC1 HOXB13 TTLL6 CALCOCO2 TAD

HiC

B RNA Pol II ChIA-PET RAD21 ChIA-PET 10.0 46.95 46.95 8.0 46.9 46.9 6.0 46.85 46.85 4.0 46.8 46.8 Chr17 Mb (Resolution: 10kb) Chr17 Mb (Resolution: 10kb) 2.0 Chr17 Mb (Resolution: 10kb) 46.75 46.75 0.0 46.75 46.8 46.85 46.9 46.95 46.75 46.8 46.85 46.9 46.95 Figure S4 A B LNCaP VCaP RWPE-1 LNCaP VCaP DU145 [AR-N] JQ1(µM) 0 0.1 0.5 1 10 0 0.1 0.5 1 10 AR

AR ARv

[AR-C] AR cMyc

Actin

β- Tubulin

Color Key C and Histogram FOXA1 140 MYC 120 Peak4 100 Peak6 80 Peak5 Count 60 AR 40 Peak1 20 Peak3 0 Peak2 2 1 0 1 2 Value D E F

80 Peak1, r=0.96, p=5.7E-9 5 10 Peak2, r=0.95, p=1.5E-8 umber

umber Peak3, r=0.94, p=5E-8 Peak4, r=0.96, p=5.7E-9 4 8 y n y n 60 p

p Peak5, r=0.93, p=1.2E-7 Peak6, r=0.97, p=5.3E-10 3 6

40 2 4

xpression log2(FPKM) Peak1, r=0.63, p=1.9E-4 Peak2, r=0.49, p=5.6E-3 20 2 1

Peak3, r=0.55, p=1.8E-3 AR e Peak4, r=0.63, p=1.9E-4 AR enhancer peak co

AR enhancer peak co Peak5, r=0.52, p=0.003 0 0 0 Peak6, r=0.83, p=2E-8 20 40 60 1 2 3 0 20 40 60 80 AR copy number AR copy number AR CNA G

chrX : 66,385,332 - 66,392,252 chrX : 66,734,857 - 66,741,777 LP2000302-DNA_A01.RG.bam RefSeq Genes

Deletion: chrX:66389092-66738319 (139 supporting reads/total 7775 reads – 1.8%) Figure S5

A 1,408 kb 37,400 kb 37,600 kb 37,800 kb 38,000 kb 38,200 kb 38,400 kb 38,600 kb Chr14 [0 - 1000] RNA-Seq [0 - 0.44] CTCF [0 - 10] FOXA1 [0 - 8.95]

LNCaP AR

ChIP-Seq [0 - 6.56] H3K27ac [0 - 389] RNA Pol II

RNA Pol II ChIA-PET B [0 - 1000] RNA-Seq [0 - 0.43] CTCF [0 - 0.33] FOXA1 [0 - 0.62] AR [0 - 5.50] H3K27ac VCaP

ChIP-Seq [0 - 0.48] ERG [0 - 707] RNA Pol II [0 - 1.63] RNA Pol II phosp

RNA Pol II ChIA-PET

RefSeq Genes SLC25A21 NR_033240 MIPOL1 FOXA1 TTC6 LINC00517 SSTR1

C

RWPE-1 LNCaP VCaP DU145

FOXA1 * Actin

* Non specific bands Figure S6

774 kb A 51,000 kb 51,100 kb 51,200 kb 51,300 kb 51,400 kb 51,500 kb 51,600 kb 51,700 kb LNCaP Chr19 [0 - 3000] RNA-Seq [0 - 0.31] CTCF [0 - 1.27] FOXA1 [0 - 8.69] AR [0 - 2.76] ChIP-Seq H3K27ac [0 - 593] RNA Pol II RNA Pol II ChIA-PET

B 774 kb 51,000 kb 51,100 kb 51,200 kb 51,300 kb 51,400 kb 51,500 kb 51,600 kb 51,700 kb VCaP Chr19 [0 - 3000] RNA-Seq [0 - 0.22] CTCF [0 - 0.69] FOXA1 [0 - 1.19] AR [0 - 9.44] H3K27ac [0 - 1.64] ChIP-Seq ERG [0 - 1208] RNA Pol II [0 - 3.93] RNA Pol II phospho

RNA Pol II ChIA-PET

GPR32 LINC01869 RefSeq Genes JOSD2 SNAR-FC19orf81 C19orf48 KLK3 KLK5 KLK9 KLK13 CTU1 MIR8074 SNPs rs2735839 Figure S7

774 kb 774 kb 51,000 kb 51,200 kb 51,400 kb 51,600 kb 51,000 kb 51,200 kb 51,400 kb 51,600 kb Chr19 Chr19 [0 - 5804] Ascending aorta, ENCFF002VCM [0 - 96] 22Rv1, ENCFF620QMB [0 - 61] SK-N-SH, ENCFF020DAV [0 - 47] OCI-LY1, ENCFF643VAM [0 - 4605] Spleen, ENCFF036NIB [0 - 252] RWPE1, ENCFF648UKZ [0 - 158] MCF-7, ENCFF051GGA [0 - 282] Gastrocnemius medialis, ENCFF649TBC [0 - 34] Thyroid gland, ENCFF051GYP [0 - 33] Thoracic aorta, ENCFF652JBU DOHH2, ENCFF656KBE [0 - 217] Neutrophil, ENCFF065VTJ [0 - 605] PC-9, ENCFF670FVV [0 - 208] Upper lobe of left lung, ENCFF071QUR [0 - 52] [0 - 213] SK-N-SH, ENCFF073KYI [0 - 303] Neuroepithelial stem cell, ENCFF678YWF [0 - 125] Endothelial cell of umbilical vein, ENCFF089DEG [0 - 77] Right lobe of liver, ENCFF680QZX [0 - 125] Uterus, ENCFF090IUI [0 - 113] Fibroblast of arm, ENCFF704JGE [0 - 723] GM23338, ENCFF132AIF [0 - 50] HeLa-S3, ENCFF706HLO [0 - 143] Stomach, ENCFF134NHD [0 - 159] SU-DHL-6, ENCFF710FJI [0 - 138] Fibroblast of lung, ENCFF138ZMZ [0 - 41] Heart left ventricle, ENCFF729BAZ [0 - 74] Body of pancreas, ENCFF147LSN [0 - 24] Vagina, ENCFF736SWP Panc1, ENCFF750VPZ [0 - 25] A549, ENCFF153QRX [0 - 362] [0 - 78] Sigmoid colon, ENCFF156QWL [0 - 117] Prostate, ENCFF753JCJ [0 - 54] Adrenal gland, ENCFF162SNB [0 - 7.65] Keratinocyte, ENCFF757GRD OCI-LY7, ENCFF764QRU [0 - 167] HEK293, ENCFF175HIN [0 - 182] [0 - 353] H1-hESC, ENCFF766NYI [0 - 62] CD14-positive monocyte, ENCFF195PCP [0 - 55] RWPE2, ENCFF770MSW [0 - 322] Adrenal gland, ENCFF217RXL [0 - 128] Transverse colon, ENCFF780IFS [0 - 79] Hepatocyte, ENCFF227OHE [0 - 187] Skeletal muscle myoblast, ENCFF788PRN [0 - 58] DND-41, ENCFF233SGJ Right atrium auricular region, ENCFF790FPZ [0 - 106] Gastrocnemius medialis, ENCFF241KYN [0 - 218] [0 - 103] A673, ENCFF794NZR [0 - 261] Adrenal gland, ENCFF256CHZ [0 - 132] Epithelial cell of prostate, ENCFF823OTM [0 - 2094] Osteoblast, ENCFF261BXV [0 - 12] K562, ENCFF840LLW [0 - 115] MCF-7, ENCFF276WAM [0 - 444] Neural progenitor cell, ENCFF842EXW [0 - 68] KOPT-K1, ENCFF277BMG [0 - 115] Astrocyte, ENCFF854VET [0 - 176] Esophagus muscularis mucosa, ENCFF282WYM [0 - 237] Transverse colon, ENCFF856JEH [0 - 81] MM.1S, ENCFF284AZM [0 - 102] Breast epithelium, ENCFF865HUX [0 - 96] Peyer's patch, ENCFF284VHL [0 - 177] HepG2, ENCFF866RVV [0 - 150] Body of pancreas, ENCFF301QHM [0 - 107] Gastroesophageal sphincter, ENCFF870AFA [0 - 85] Mammary epithelial cell, ENCFF305KUC [0 - 233] [0 - 248] KMS-11, ENCFF312KWS SK-N-MC, ENCFF874TEV [0 - 92] VCaP, ENCFF884RIP [0 - 2967] Esophagus squamous epithelium, ENCFF320HXP [0 - 98] Thyroid gland, ENCFF908ZKL [0 - 65] Smooth muscle cell, ENCFF323VHZ [0 - 78] Myotube, ENCFF919CDP [0 - 88] Thoracic aorta, ENCFF344YDV [0 - 39] [0 - 87] Transverse colon, ENCFF939NAG Fibroblast of dermis, ENCFF350EFY [0 - 260] [0 - 13] PC-3, ENCFF940SZQ Thyroid gland, ENCFF367IJJ [0 - 349] [0 - 355] A549, ENCFF384KPD Gastrocnemius medialis, ENCFF957CMX [0 - 153] Fibroblast of lung, ENCFF977TER [0 - 31] HCT116, ENCFF401XSL [0 - 158] [0 - 350] ACC112, ENCFF420AVK Cardiac muscle cell, ENCFF987RQQ [0 - 45] [0 - 2040] GM12878, ENCFF440GZA C4-2B, ENCFF994HZA [0 - 173] HCT116, ENCFF445BLD [0 - 225] RWPE2, ENCFF453JGO [0 - 82] Ascending aorta, ENCFF472VCY OCI-LY3, ENCFF530OSB [0 - 273] [0 - 213] Body of pancreas, ENCFF554AUP B cellENCFF561, APY [0 - 182] [0 - 51] Tibial nerve, ENCFF563ABT Neural cell, ENCFF574HAY [0 - 244] [0 - 708] Bipolar neuron, ENCFF593QNT Coronary artery, ENCFF609WYJ [0 - 82] [0 - 3263] 22Rv1, ENCFF615PSS Karpas-422, ENCFF618INJ [0 - 96] LINC01869 LINC01869 RefSeq Genes RefSeq Genes LRRC4B C19orf48 KLK3 KLK7 MIR8074 LRRC4B C19orf48 KLK3 KLK7 MIR8074 Figure S8 (Page 1)

A 1,797 kb LNCaP Chr8 127,800 kb 128,200 kb 128,600 kb 129,000 kb 129,400 kb RNA-Seq [0 - 1000] CTCF [0 - 0.31] [0 - 1.27] FOXA1 [0 - 6.40] AR [0 - 2.76] SNPs ID

ChIP-Seq H3K27ac [0 - 326] 1: rs12543663 RNA Pol II 2: rs10086908 3: rs1016343 RNA Pol II ChIA-PET 4: rs1456315 5: rs13254738 6: rs6983561 B 1,797 kb 7: rs16901979 VCaP Chr8 127,800 kb 128,200 kb 128,600 kb 129,000 kb 129,400 kb 8: rs10505483 RNA-Seq [0 - 1000] 9: rs16902094 10: rs445114 CTCF [0 - 0.22] 11: rs620861 [0 - 0.69] FOXA1 12: rs6983267 [0 - 1.19] AR 13: rs1447295 [0 - 5.98] 14: rs4242382 H3K27ac 15: rs4242384 [0 - 0.88] 16: rs7837688 ChIP-Seq ERG [0 - 599] RNA Pol II [0 - 2.12] RNA Pol II phospho RNA Pol II ChIA-PET RefSeq Genes SNPs PCAT1CASC19 CCAT2 MYC PVT1 MIR1208 LINC00824 1 2 3 7,8 9,10 12 1316 4,5,6 11 14,15 C 1,055 kb 148,400 kb 148,600 kb 148,800 kb 149,000 kb 149,200 kb 149,400 kb LNCaP Chr7 [0 - 1000] RNA-Seq [0 - 1.64] CTCF [0 - 1.63] FOXA1 [0 - 3.00] AR [0 - 5.74]

ChIP-Seq H3K27ac [0 - 444] RNA Pol II

RNA Pol II ChIA-PET

1,055 kb 148,400 kb 148,600 kb 148,800 kb 149,000 kb 149,200 kb 149,400 kb D VCaP Chr7 [0 - 1000] RNA-Seq [0 - 0.78] CTCF [0 - 0.18] FOXA1 [0 - 0.21] AR [0 - 5.85] H3K27ac [0 - 1.45] ChIP-Seq ERG [0 - 640] RNA Pol II [0 - 1.19] RNA Pol II phospho

RNA Pol II ChIA-PET

RefSeq Genes CUL1 EZH2 GHET1 ZNF786 ZNF282 ZNF777 ZNF767P KRBA1 ZNF398 ZNF783 Figure S8 (Page 2)

E 1803 kb 1803 kb 127,800 kb 128,400 kb 129,000 kb 127,800 kb 128,400 kb 129,000 kb Chr8 Chr8 [0 - 45] [0 - 1181] Ascending aorta, ENCFF002VCM 22Rv1, ENCFF620QMB [0 - 40] [0 - 267] SK-N-SH, ENCFF020DAV OCI-LY1, ENCFF643VAM [0 - 188] [0 - 8661] Spleen, ENCFF036NIB RWPE1, ENCFF648UKZ [0 - 110] MCF-7, ENCFF051GGA [0 - 441] Gastrocnemius medialis, ENCFF649TBC [0 - 13] Thyroid gland, ENCFF051GYP [0 - 16] Thoracic aorta, ENCFF652JBU DOHH2, ENCFF656KBE [0 - 129] Neutrophil, ENCFF065VTJ [0 - 202] [0 - 636] Upper lobe of left lung, ENCFF071QUR [0 - 31] PC-9, ENCFF670FVV [0 - 65] SK-N-SH, ENCFF073KYI [0 - 810] Neuroepithelial stem cell, ENCFF678YWF [0 - 156] Endothelial cell of umbilical vein, ENCFF089DEG [0 - 182] Right lobe of liver, ENCFF680QZX [0 - 349] Uterus, ENCFF090IUI [0 - 62] Fibroblast of arm, ENCFF704JGE [0 - 451] GM23338, ENCFF132AIF [0 - 240] HeLa-S3, ENCFF706HLO [0 - 210] Stomach, ENCFF134NHD [0 - 151] SU-DHL-6, ENCFF710FJI [0 - 79] Fibroblast of lung, ENCFF138ZMZ [0 - 121] Heart left ventricle, ENCFF729BAZ [0 - 120] Body of pancreas, ENCFF147LSN [0 - 21] Vagina, ENCFF736SWP Panc1, ENCFF750VPZ [0 - 82] A549, ENCFF153QRX [0 - 690] [0 - 26] Sigmoid colon, ENCFF156QWL [0 - 114] Prostate, ENCFF753JCJ [0 - 58] Adrenal gland, ENCFF162SNB [0 - 20] Keratinocyte, ENCFF757GRD [0 - 519] HEK293, ENCFF175HIN [0 - 36] OCI-LY7, ENCFF764QRU [0 - 47] CD14-positive monocyte, ENCFF195PCP [0 - 220] H1-hESC, ENCFF766NYI RWPE2, ENCFF770MSW [0 - 3071] Adrenal gland, ENCFF217RXL [0 - 40] Transverse colon, ENCFF780IFS [0 - 78] Hepatocyte, ENCFF227OHE [0 - 67] [0 - 164] DND-41, ENCFF233SGJ [0 - 66] Skeletal muscle myoblast, ENCFF788PRN [0 - 52] Gastrocnemius medialis, ENCFF241KYN [0 - 134] Right atrium auricular region, ENCFF790FPZ [0 - 160] Adrenal gland, ENCFF256CHZ [0 - 194] A673, ENCFF794NZR [0 - 3809] Osteoblast, ENCFF261BXV [0 - 149] Epithelial cell of prostate, ENCFF823OTM K562, ENCFF840LLW [0 - 128] MCF-7, ENCFF276WAM [0 - 36] Neural progenitor cell, ENCFF842EXW [0 - 105] KOPT-K1, ENCFF277BMG [0 - 422] [0 - 174] Esophagus muscularis mucosa, ENCFF282WYM [0 - 105] Astrocyte, ENCFF854VET [0 - 64] MM.1S, ENCFF284AZM [0 - 269] Transverse colon, ENCFF856JEH Breast epithelium, ENCFF865HUX [0 - 95] Peyer's patch, ENCFF284VHL [0 - 51] HepG2, ENCFF866RVV [0 - 191] Body of pancreas, ENCFF301QHM [0 - 110] [0 - 76] Mammary epithelial cell, ENCFF305KUC [0 - 109] Gastroesophageal sphincter, ENCFF870AFA [0 - 366] KMS-11, ENCFF312KWS [0 - 390] SK-N-MC, ENCFF874TEV VCaP, ENCFF884RIP [0 - 4125] Esophagus squamous epithelium, ENCFF320HXP [0 - 117] Thyroid gland, ENCFF908ZKL [0 - 25] Smooth muscle cell, ENCFF323VHZ [0 - 230] Myotube, ENCFF919CDP [0 - 116] Thoracic aorta, ENCFF344YDV [0 - 49] [0 - 36] Fibroblast of dermis, ENCFF350EFY [0 - 114] Transverse colon, ENCFF939NAG PC-3, ENCFF940SZQ [0 - 642] Thyroid gland, ENCFF367IJJ [0 - 4.81] [0 - 199] A549, ENCFF384KPD [0 - 507] Gastrocnemius medialis, ENCFF957CMX [0 - 410] [0 - 111] HCT116, ENCFF401XSL Fibroblast of lung, ENCFF977TER ACC112, ENCFF420AVK [0 - 565] Cardiac muscle cell, ENCFF987RQQ [0 - 67] [0 - 109] [0 - 781] GM12878, ENCFF440GZA C4-2B, ENCFF994HZA HCT116, ENCFF445BLD [0 - 395] [0 - 1810] RWPE2, ENCFF453JGO Ascending aorta, ENCFF472VCY [0 - 37] RefSeq Genes PCAT1 CASC19 MYC MIR1208 OCI-LY3, ENCFF530OSB [0 - 835] PVT1 LINC00824 Body of pancreas, ENCFF554AUP [0 - 62] B cellENCFF561, APY [0 - 92] F Correlation of interacted enhancer peak Tibial nerve, ENCFF563ABT [0 - 48] and expression level of MYC Neural cell, ENCFF574HAY [0 - 78] 150 Bipolar neuron, ENCFF593QNT [0 - 232] Coronary artery, ENCFF609WYJ [0 - 116] [0 - 1001]

22Rv1, ENCFF615PSS 100 Karpas-422, ENCFF618INJ [0 - 756] 50 RefSeq Genes PCAT1 CASC19 MYC MIR1208 Cor: 0.60 PVT1 LINC00824

0 Pvalue:3.36E-07 Expression level (TPM>1) 0 200 40060080010001200 1400 Maximum enhancer peak score Figure S8 (Page 3)

G 1,090 kb 1,090 kb 148,400 kb 148,800 kb 149,200 kb 148,400 kb 148,800 kb 149,200 kb Chr7 Chr7 [0 - 214] Ascending aorta, ENCFF002VCM Karpas-422, ENCFF618INJ [0 - 208] [0 - 49] SK-N-SH, ENCFF020DAV 22Rv1, ENCFF620QMB [0 - 1925] [0 - 398] Spleen, ENCFF036NIB OCI-LY1, ENCFF643VAM [0 - 146] [0 - 324] MCF-7, ENCFF051GGA RWPE1, ENCFF648UKZ [0 - 5453] [0 - 33] Thyroid gland, ENCFF051GYP Gastrocnemius medialis, ENCFF649TBC [0 - 168] Neutrophil, ENCFF065VTJ [0 - 286] [0 - 34] Thoracic aorta, ENCFF652JBU [0 - 148] [0 - 589] Upper lobe of left lung, ENCFF071QUR DOHH2, ENCFF656KBE [0 - 349] [0 - 229] SK-N-SH, ENCFF073KYI PC-9, ENCFF670FVV [0 - 31] [0 - 181] Endothelial cell of umbilical vein, ENCFF089DEG Neuroepithelial stem cell, ENCFF678YWF [0 - 185] [0 - 281] Uterus, ENCFF090IUI Right lobe of liver, ENCFF680QZX [0 - 176] [0 - 190] GM23338, ENCFF132AIF Fibroblast of arm, ENCFF704JGE [0 - 249] [0 - 339] Stomach, ENCFF134NHD HeLa-S3, ENCFF706HLO [0 - 34] [0 - 103] Fibroblast of lung, ENCFF138ZMZ SU-DHL-6, ENCFF710FJI [0 - 38] [0 - 131] Body of pancreas, ENCFF147LSN Heart left ventricle, ENCFF729BAZ [0 - 187] [0 - 250] A549, ENCFF153QRX Vagina, ENCFF736SWP [0 - 202] Sigmoid colon, ENCFF156QWL Panc1, ENCFF750VPZ [0 - 37] [0 - 12] [0 - 94] Adrenal gland, ENCFF162SNB Prostate, ENCFF753JCJ [0 - 73] [0 - 28] HEK293, ENCFF175HIN Keratinocyte, ENCFF757GRD [0 - 123] CD14-positive monocyte, ENCFF195PCP OCI-LY7, ENCFF764QRU [0 - 229] [0 - 72] Adrenal gland, ENCFF217RXL [0 - 26] [0 - 182] H1-hESC, ENCFF766NYI Hepatocyte, ENCFF227OHE [0 - 1590] [0 - 137] RWPE2, ENCFF770MSW DND-41, ENCFF233SGJ [0 - 151] [0 - 251] Transverse colon, ENCFF780IFS Gastrocnemius medialis, ENCFF241KYN [0 - 20] [0 - 246] Skeletal muscle myoblast, ENCFF788PRN Adrenal gland, ENCFF256CHZ [0 - 183] [0 - 80] Right atrium auricular region, ENCFF790FPZ Osteoblast, ENCFF261BXV [0 - 154] [0 - 13] A673, ENCFF794NZR MCF-7, ENCFF276WAM [0 - 6731] [0 - 294] Epithelial cell of prostate, ENCFF823OTM KOPT-K1, ENCFF277BMG [0 - 69] [0 - 207] K562, ENCFF840LLW Esophagus muscularis mucosa, ENCFF282WYM [0 - 34] [0 - 268] Neural progenitor cell, ENCFF842EXW MM.1S, ENCFF284AZM [0 - 36] [0 - 173] Astrocyte, ENCFF854VET Peyer's patch, ENCFF284VHL [0 - 219] [0 - 236] Transverse colon, ENCFF856JEH Body of pancreas, ENCFF301QHM [0 - 249] [0 - 77] Breast epithelium, ENCFF865HUX Mammary epithelial cell, ENCFF305KUC [0 - 72] [0 - 275] HepG2, ENCFF866RVV KMS-11, ENCFF312KWS [0 - 179] [0 - 217] Gastroesophageal sphincter, ENCFF870AFA Esophagus squamous epithelium, ENCFF320HXP [0 - 122] [0 - 141] SK-N-MC, ENCFF874TEV Smooth muscle cell, ENCFF323VHZ [0 - 1706] [0 - 136] VCaP, ENCFF884RIP Thoracic aorta, ENCFF344YDV [0 - 84] [0 - 23] Thyroid gland, ENCFF908ZKL Fibroblast of dermis, ENCFF350EFY [0 - 79] [0 - 32] Myotube, ENCFF919CDP Thyroid gland, ENCFF367IJJ [0 - 87] [0 - 316] Transverse colon, ENCFF939NAG A549, ENCFF384KPD [0 - 148] [0 - 190] HCT116, ENCFF401XSL PC-3, ENCFF940SZQ [0 - 273] [0 - 354] ACC112, ENCFF420AVK Gastrocnemius medialis, ENCFF957CMX [0 - 46] [0 - 24] GM12878, ENCFF440GZA Fibroblast of lung, ENCFF977TER [0 - 204] [0 - 193] HCT116, ENCFF445BLD Cardiac muscle cell, ENCFF987RQQ [0 - 1211] [0 - 1162] RWPE2, ENCFF453JGO C4-2B, ENCFF994HZA [0 - 149] Ascending aorta, ENCFF472VCY [0 - 670] OCI-LY3, ENCFF530OSB RefSeq Genes [0 - 171] CUL1EZH2 ZNF282 ZNF777 KRBA1 Body of pancreas, ENCFF554AUP GHET1 ZNF767P [0 - 203] B cell, ENCFF561APY [0 - 88] Tibial nerve, ENCFF563ABT [0 - 175] H Neural cell, ENCFF574HAY [0 - 630] Correlation of interacted enhancer peak Bipolar neuron, ENCFF593QNT [0 - 229] and expression level of EZH2 Coronary artery, ENCFF609WYJ [0 - 1320] 120 22Rv1, ENCFF615PSS 100

RefSeq Genes 80 CUL1EZH2 ZNF282 ZNF777 KRBA1 GHET1 ZNF767P 60 40

20 Cor: 0.56

0 Pvalue:0.002 Expression level (TPM>1) 0 200 400 600 800100012001400 Maximum enhancer peak score Figure S9 A

LNCaP

150

100 iability (% )

50 Relative Cell V

0

si NT si VPS53 si FAM57Asi GEMIN4 si combination

B

LNCaP VPS53 1.5 FAM57A GEMIN4

1.0

** *** 0.5 ** ***

Relative Fold Change *** **

0.0 si NT si VPS53 si FAM57A si GEMIN4 si combination Table S1. Related to Figures 1 and 2. Metadata table for the datasets used in this study.

Experiment Cell line Molecule Source GEO/ENCODE ID RNA-seq DU145 / This study GSE121022 ChIP-seq DU145 H3K27ac This study GSM3424003 ChIP-seq DU145 CTCF This study GSM3424002 ChIAPET DU145 RNA Pol II This study GSM3423999 ChIAPET DU145 RAD21 ENCODE ENCSR672RHL RNA-seq LNCaP / This study GSE121022 ChIP-seq LNCaP CTCF GEO GSM947528 ChIP-seq LNCaP AR This study GSM3424005 ChIP-seq LNCaP FOXA1 GEO GSM2219864 ChIP-seq LNCaP H3K27ac GEO GSM1902615 ChIP-seq LNCaP H3K4me3 GEO GSM945240 Dnase-seq LNCaP DNase I GEO GSM816637 ChIAPET LNCaP RNA Pol II This study GSM3423998

ChIAPET LNCaP Rad21 ENCODE ENCSR011ITK HiC LNCaP / ENCODE ENCSR346DCU RNA-seq RWPE-1 / This study GSE121022 ChIP-seq RWPE-1 CTCF ENCODE ENCFF391XEK ChIP-seq RWPE-1 H3K27ac GEO GSM1541009 ChIAPET RWPE-1 RNA Pol II This study GSM3424000 RNA-seq VCaP / This study GSE121022 ChIP-seq VCaP CTCF GEO GSE84432 ChIP-seq VCaP AR GEO GSM1328945 ChIP-seq VCaP FOXA1 GEO GSM2058886 ChIP-seq VCaP H3K27ac ENCODE ENCFF190ZWT ChIP-seq VCaP ERG GEO GSM1328979 ChIP-seq VCaP RNA Pol II phospho GEO GSM1328964 ChIAPET VCaP AR GEO GSM1327093 ChIAPET VCaP RNA Pol II This study GSM3423997

Table S2-1. Related to Figures 1 and 2. Metadata table for ChIA-PET sequenced in this study.

Uniquely Total PETs Output Intra Inter Cell Line Total PETs1 Valid PETs2 Mapped Broad Self PETs5 between PETs3 PETs6 PETs7 PETs4 Peaks Peaks RWPE-1 278,678,565 256,081,201 213,698,640 90,123,783 86,711 10,235,232 184,671 47,469 137,202

LNCaP 342,749,728 244,073,222 152,299,081 135,848,887 65,335 12,972,501 159,283 51,365 107,918

VCaP 254,392,059 188,211,704 135,372,820 117,548,587 60,656 15,665,876 492,546 88,374 404,172

DU145 231,264,122 202,773,594 165,934,884 72,043,490 58,097 7,658,637 243,953 38,130 205,823

1Total PETs: total sequenced raw Pair End Tags. 2Valid PETs: total PETs after linker trimming that kept for alignment, regardless of whether they have linkers or not. (this is used for alignment). 3Output PETs: Successfully aligned on hg19 reference genome after removing unmapped reads and low MAPQ reads. 4Uniquely Mapped PETs: Uniquely mapped PETs after removing PCR duplicates. (This is used for following peak calling and interaction analysis). 5Self PETs: Self-ligation PET (two ends of the same DNA fragment). 6Intra PETs: intra-chromosomal PETs. 7Inter PETs: inter-chromosomal PETs. Table S2-2. Related to Figures 1 and 2. Metadata table for ChIP-seq sequenced in this study.

Mapped hg19 Mapped Sample ID Target Cell line Total reads reads hg19 ratio DU145_INPUT None DU145 55,199,482 37,950,546 68.75% DU145_H3K27ac H3K27ac DU145 44,248,954 35,526,292 80.29% DU145_CTCF CTCF LNCaP 52,829,386 42,818,743 81.05% LNCaP_INPUT None LNCaP 53,984,265 34,778,710 64.42% LNCaP_AR AR LNCaP 83,915,780 55,429,089 66.05% Table S2-3. Related to Figures 1 and 2. Metadata table for RNA-seq data sequenced in this study. Mapped Mapped Mapped Mapped Retained Cell line Replicate Total reads rRNA genome genome rRNA reads reads ratio reads ratio *DU145 1 134,038,110 10,743,305 8.02% 123,294,805 113,512,312 92.07% DU145 2 23,796,797 313,557 1.32% 23,483,240 22,513,624 95.87% RWPE-1 1 1,228,613 35,375 2.88% 1,193,238 1,031,576 86.45% RWPE-1 2 43,783,623 2,296,860 5.25% 41,486,763 39,111,293 94.30% *RWPE-1 3 117,596,563 6,435,787 5.47% 111,160,776 102,784,548 92.46% LNCaP 1 24,266,367 313,951 1.29% 23,952,416 22,311,856 93.15% *LNCaP 2 100,996,513 1,359,705 1.35% 99,636,808 93,931,367 94.30% LNCaP 3 36,843,667 282,480 0.77% 36,561,187 29,593,164 80.90% *VCaP 1 119,025,099 5,905,928 4.96% 113,119,171 98,423,765 87.01% VCaP 2 22,201,556 203,535 0.92% 21,998,021 19,614,776 89.17%

*Data shown in IGV track. rRNA represents for ribosome RNAs. Table S5. Related to Figure 2. Peak overlapping of RAD21, RNA Pol II, H3K27ac and CTCF for LNCaP and DU145 cells. The P-value of an overlapped number was calculated by using hypergeometric test, one-side testing of bigger than the overlapped number. Cell Peak Name RNA Pol II RAD21 Name RNA Pol II (Intra- RAD21 (Intra- RNA Pol II (All Peaks RAD21 (All Peaks interaction Peaks interaction Peaks 62563) 65485) 33336) 55294) 19952, 36122, 34109, CTCF (76292) 13147, P(X≥13147)=1 LNCaP P(X≥19952)=0.99 P(X≥36122)=8.41E-04 P(X≥34109)=1.08E-13 17640, RAD21 (65485) 27029, P(X≥27029)=1 ------P(X≥17640)=0.99 H3K27ac 33649, 23013, 23258, P(X≥23258)=1 21138, P(X≥21138)=1 (62723) P(X≥33649)=1.89E-08 P(X≥23013)=2.91E-71 RNA Pol II (Intra- RAD21 (Intra- RNA Pol II (All Peaks RAD21 (All Peaks interaction Peaks interaction Peaks 58097) 90274) 22063) 51326) 28936, CTCF (36152) 9542, P(X≥9542)=1 5541, P(X≥5541)=1 26739, P(X≥26739)=0 DU145 P(X≥28936)=9.18E-139 RAD21 (90274) 25084, P(X≥25084)=1 14612, P(X≥14612)=1 ------H3K27ac 28448, 20095, P(X≥20095)=0 23138, P(X≥23138)=1 19664, P(X≥20095)=1 (50364) P(X≥28448)=7.14E-09

Table S6. Related to Figure 3. Baseline characteristics for those patients with aCGH performed.

All patients N=27 Age at diagnosis, years 62.6 (8.2)* Mean (SD) Histology, N (%) Adenocarcinoma 23 (85) NA 4 (15) Gleeson, N (%) ≤7 7 (26) >7 14 (52) NA 6 (22) Presenting PSA, ug/l Median (IQR) 110.5 (16.7- 565.0)^ Metastatic at diagnosis, N (%) No 9 (33) Yes 14 (52) NA 4 (15) Primary therapy, N (%) Radical 8 (30) Palliative 17 (63) NA 2 (7) Biopsy site, N (%) Bone marrow 9 (33) Lymph node 11 (41) Liver 2 (7) Soft tissue 1 (4) Prostate 4 (15)

NA – not available, N - number, SD – standard deviation, IQR – interquartile range.

*Data not available for two patients

^3 PSA values missing

Table S8. Related to Figure 3. AR enhancers assays

Reporter Forward Reverse Repo 1 Primer Primer Ampli Assay ID rter 1 Concentr Concentr Concentr Forward Primer Sequence Reverse Primer Sequence Reporter 1 Sequence con Dye ation ation ation Size (µM) (µM) (µM)

Peak1_CDA CTGGATGGACCTTTGGGTTA GCCCACTCCTATACACTT CTTGGAGAGGAGGTAA FAM 5 18 18 80 AAHV TCTTT CTTCACT ATATG

Peak2_CDD GGCAGATGGACAGGCAAGTT CCCACTCTCAGCCAGTTC ACGTAAAGCAAGGCCA FAM 5 18 18 73 JXNP T TG CATAAA

peak3_CDF CTGTAGTTGACTGCAGAGGG CATTCTTCTCTGTGCACTG CTAGGAACCTCCCCAA FAM 5 18 18 82 VKUJ AATTT TCTAGT ATACCT

peak4_CDM TGTGTTTCTGGTTTTGTTAGT TCCTGACCCACAGAAATT AAGGTGCTAAGTTGTG FAM 5 18 18 107 FW39 TATCTATTGCT GTGAAAA AATAATTT

peak5_CD9 CAATGCAACATGTTACAAAT GGTTCAGGCCCATTGATT CTTCTGGACGTGAGTGA FAM 5 18 18 93 HH46 CCTTCTGA TGG TAGTC

peak6_CDW GCGCAGCCTATTTACATGAT GGTGATGATAGAGAGATA AAGGTCAACAAGATAA FAM 5 18 18 101 CWJT TTCA GTGGACCAA GGTAATCTC

Table S9. Related to Figure 9. siRNAs used in the study are: siRNAs Gene Gene Accession Catalogue Sequence Non-targeting (pool) N/A D-001810-10 UGGUUUACAUGUCGACUAA UGGUUUACAUGUUGUGUGA UGGUUUACAUGUUUUCUGA UGGUUUACAUGUUUUCCUA Androgen Receptor NM_000044 L-003400-00-0005 GAGCGUGGACUUUCCGGAA (pool) UCAAGGAACUCGAUCGUAU CGAGAGAGCUGCAUCAGUU CAGAAAUGAUUGCACUAUU FAM57A (pool) NM_024792 L-014448-02-0020 CGAUCAACUAGGAUGAAUU ACUACAGGGAAGCGUGAAA CCAAAUUGUGCCCUGGGUA GUAGGUAGUUAUUGAUCGU GEMIN4 (pool) NM_015721 L-020756-01-0020 CGUUGACACUUCUGCCGAA UACCUGUGCGGGAGCGAAA GGUCAUAGUGCACCCGGAA AUGCAGAACUAGAGCGCUU VPS53 (pool) NM_001128159 L-017048-00-0020 GCAAUUAGAUCACGCCAAA GGACGUAGUAUCAGGCAUU UAUUCGAACUGUUGUAAGA GGAUGUAAGUCUGAUUGAA

Table S10. Related to Figure 9.

Quantitative reverse transcription polymerase chain reaction (qRT-PCR) primers used in the study are:

QPCR primers Gene Forward(5’-3’) Reverse(5’-3’) FAM57 CGAACCAGAGACCAGAACCG CTGAATCAGAACCCTGCCCA GEMIN4 TCTTCTAAGTCGGCCTCAGC CATGTTCCTCCTGGCTGGATT VPS53 ACCAGGCGAAGACAATACGG GTCTGTGCAGCCTTCACTCT Androgen Receptor (AR) AGTCAATGGGCAAAACATGG TTGTGTCAAAAGCGAAATGG GAPDH GAGAGAAACCCGGGAGGCTA TCACCTTCCCCATGGTGTCT

Methods

ChIP-seq

The cells were crosslinked using 1% formaldehyde and quenched by 1.25M . The cells were lysed, sonicated, and used in a Chromatin Immunoprecipitation (ChIP) reaction (HighCell# ChIP kit protein G (Diagenode Inc., NJ) as per the manufacturer’s recommendation using the CTCF (3418S, Cell signaling Technologies) and H3K27ac (39133, Active Motif) antibodies. The ChIP processed DNA was then reverse crosslinked and purified using the IPure kit (Diagenode Inc., NJ). The ChIP DNA was quantified and used for construction of next generation sequencing libraries (KAPA Hyper Prep, KAPA Biosystems, MA). The ChIP sequencing (ChIP-Seq) libraries, after appropriate size selection using Agencourt AMPure XP (Beckman Coulter Life Sciences, IN), were sequenced with single-end reads of 75bp on Illumina NextSeq 500.

ChIP-seq data analysis

First, we did quality control for ChIP-seq datasets by FastQC (version 0.11.5) [Andrews, S. (2010) FastQC: a quality control tool for high throughput sequence data.], and then trimmed adaptor sequences from raw reads by FASTX_Toolkit (version 0.0.13.2, command fastx_clipper, http://hannonlab.cshl.edu/fastx_toolkit/). After adaptor trimming, reads shorter than 18 nt were discarded (adaptor sequence is “GATCGGAAGAGCACACGTCTGAA”). Second, ChIP-Seq reads were aligned to reference genome (hg19 version) by bowtie (version 1.0.0) (1), using the parameter as -v 1 -M 1 -m 2 --best -S. Considered reads have multiple alignments in the genome, we only retained reads that have no more than two alignments and reported the best aligned position by “--best”. Third, macs2 was used to call peaks from aligned ChIP-Seq reads, with default parameters (2). Integrated Genomics Viewer (IGV, v2.3.79) (3) was used to visualize ChIP-seq peak data.

RNA-seq

RNA was extracted from the cell lines used for ChIA-PET using the RNeasy Mini kit (QIAGEN, MD). The total RNA was analyzed using the RNA Screen Tape Station (Agilent, CA) and quantified using the ThermoFisher Qubit (Waltham, MA). One microgram of total RNA from each sample was used to prepare libraries using the KAPA mRNA Hyper prep kit (KAPA Biosystems, CA). The quality of the libraries was assessed using the Tape Station High Sensitivity DNA Screentape (Agilent, CA) and sequenced with single read of 75 bp. Library clustering was performed on a cBot with Illumina HiSeq PE Cluster kit v3 (San Diego, CA). The libraries were sequenced as single end 75 basepair reads on an Illumina NextSeq 500 (San Diego, CA) with an Illumina HiSeq SBS kit v3 (San Diego, CA). Base calling and quality scoring were performed using Real-Time Analysis (version 1.18.64) (San Diego, CA) and FASTQ file generation and de-multiplexing using CASAVA(San Diego, CA).

RNA-Seq data analysis

Quality Control We did quality control for RNA-Seq datasets of four cell lines by FastQC. Then, we removed reads that mapped to human ribosome RNA sequences using bowtie.

Alignment

Retained reads from the previous step were processed to map to human reference genome (version hg19) by Tophat2 (version 2.0.10) (4), with parameter “--read-mismatches 2 --segment-mismatches 0 -g 1 --report-secondary- alignments”. RNA-Seq libraries were constructed with strand information (first-stranded). DNA duplications generated by PCR were removed by samtools (command rmdup) (5).

Differential expression

Express (version 1.4.1) was used to calculate read counts for each gene (6). As express can only calculate read counts on transcript level, the estimated gene counts were summed across all the isoforms of the gene. Then the read count matrix files were used as input file for DESeq2 to identify differentially expressed genes (7). Significant differentially expressed genes were selected following the criteria that the absolute value of log2foldchange is no less than 1, and adjusted p-value is less than 0.05.

Definition of regulatory elements

For our analysis, several different genomic elements were defined, including promoters, enhancers, gene bodies and other regions. Promoters are defined as regions from upstream 500nt (-500nt) to downstream 250nt (+250nt) of the transcription start site (TSS) of a gene. Gene bodies are defined by subtracting promoter regions from whole gene regions based on annotation file of hg19. Other regions are genomic regions that are neither promoters not gene bodies. Combining ChIP-seq data of H3K27ac of four cell lines, enhancers are defined as H3K27ac peak regions, discarding peaks which have overlaps with promoters.

Correlation analysis

Correlation co-efficiency was calculated between RNA expression level and chromatin interactions. RNA expression level was measured by FPKM transformed by log2 calculated based on RNA-Seq data, while chromatin interactions were measured by PET numbers calculated from ChIA-PET data of four cell lines. As there are two or three biological replicates for RNA-seq datasets in each cell line, we took the mean FPKM of replicates as the expression level of each gene. Cor.test in R was used to calculate the correlation co-efficiency and p-value.

Motif enrichment analysis

Motif sequences enriched in promoter regions and enhancer regions were identified by homer (version 4.7), with default parameters (8). Significantly enriched motifs were selected following the thresholds described below:

I. For motifs enriched in enhancer regions: p-value < 1e-20, while for motifs enriched in promoter regions: p- value < 1e-5. II. Motifs are ranked by p-value.

III. Motifs with the same sequence type, like bZIP and Zf, are further curated to avoid redundancy (showing the one with the most significant p-value).

Data Visualization

The Integrative Genomics Viewer (IGV) is used to visualize the track files for different gene loci (3). HiC- Plotter was used to draw chromatin interaction heatmap (9).

Predicting potential gene target of enhancers, AR binding peaks, FOXA1 binding peaks and PCa risk SNP loci by using ChIA-PET data

The potential gene targets of cis-elements and trans-acting factors were predicted on the basis of their relationship with ChIA-PET interaction anchor peaks. In this study, potential gene targets of enhancers, AR binding peaks, FOXA1 binding peaks and PCa risk SNP loci are provided in supplementary tables S3, S7 and S8.

Potential gene targets of enhancers

For identifying the potential targets of enhancers, we filtered the RNA Pol II ChIA-PET interaction anchor peaks that overlapped with H3K27ac ChIP-seq peaks. Next, the H3K27ac peaks were assigned their paired peaks on the basis of PET interactions. The genes whose promoter regions (from -500 nt to +250nt from the TSS) overlapped with the paired peak regions were considered as potential targets of corresponding enhancers.

Potential gene targets of AR and FOXA1 binding peaks

Potential targets of AR and FOXA1 binding peaks in LNCaP and VCaP cells were also predicted by following the same strategy as above, only replacing the H3K27ac ChIP-Seq data with AR or FOXA1 ChIP-Seq data. Potential target genes identified by integrating RNA Pol II ChIA-PET with AR or FOXA1 binding peaks were classified as ChIA-PET plus ChIP-seq group (Group 1). Genes that are nearest to AR or FOXA1 binding peaks were classified as AR or FOXA1 ChIP-seq group (Group 2) (by utilizing tool closestBed in bedtools). To balance the number of different gene groups, genes in Group 2 were randomly selected with comparable number as Group 1. Also, comparable number of all annotated genes (GENCODE v19) were randomly selected as control group (Group 3). The AR-FOXA1 co-occupied regions that overlapped with the H3K27ac mark were designated as AR-FOXA1 co-occupied enhancers, and genes located in the paired RNA Pol II anchor peaks were assigned as their targets.

Potential gene targets of PCa risk SNP loci

We filtered the ChIA-PET interaction anchor peaks that overlapped with PCa risk SNP loci. The genes located in the paired anchor-peaks were extracted as targets of the risk SNP loci.

Patients and tissue samples Patients were identified from a population of men treated at the Royal Marsden NHS Foundation Trust who developed castration resistant prostate cancer. Patients with adequate DNA from formalin-fixed, paraffin-embedded biopsies to perform array CGH were selected. Biopsies were obtained from bone marrow (9), lymph node (11), liver (2), soft tissue (1) and prostate (4). All patients had given written informed consent and were enrolled in institutional protocols approved by the Royal Marsden NHS Foundation Trust Hospital (London, UK) ethics review committee (reference no. 04/Q0801/60). A total of 27 patients had sufficient DNA for testing. Demographic and clinical data for each patient were retrospectively collected from the hospital electronic patient record system.

Array Comparative genomic hybridization (aCGH)

Briefly 10 ng of tumor DNA and a pool reference of 20 male normal DNA were amplified using GenomePlex® Complete Whole Genome Amplification (WGA) Kit (Sigma-Aldrich, MO, USA), according to manufacturer procedures. 500ng of amplified tumor DNA was fluorescently labeled with Cy5, and the reference male DNA labeled with Cy3 using SureTag Complete DNA Labeling Kit (Agilent Technologies CA, USA). Labeled DNA was purified and hybridized utilizing the Agilent SurePrint G3 Human array CGH Microarray Kit, 2x400K. Slides were scanned using the SureScan microarray scanner and ratios of tumor DNA to reference male reference DNA were determined using the Feature extraction software and followed by CytoGenomics Software from Agilent

Technologies. Smoothed log2 ratio values < −0.25 were categorized as losses; those > 0.25 as gains; and those in between as unchanged. Amplifications were defined as smoothed log2 ratio values ≥1.2 and homozygous deletions as the segment log2 ratio values ≤ -1.2.

Digital PCR

Digital PCR was performed on a QX100 droplet digital PCR system (Bio-Rad). Copy Number Assays were performed with a FAM-AR Taqman Copy Number Assay, assay ID : Hs 00037560_cn and a reference copy number Vic-assay for RNAse P (cat 4403326) (Thermofisher). Copy Number Assay for AR enhancer peaks were FAM-Taqman custom designed probes (see table below). PCR reactions were prepared with 5ng DNA and Bio-Rad 2x ddPCR supermix (no dUTP) for probes (Cat number 1863010) in a total volume of 20 µl, and partitioned into ~14,000 droplets per sample in a QX100 droplet generator according to manufacturer’s instructions. Emulsified PCR reactions were run on a 96 well plate on an Eppendorf Nexus GSX1 thermal cycler incubating the plates at 95°C for 10 min followed by 40 cycles of 95°C for 15 sec and 60°C for 60 sec, followed by 10 min incubation at 98°C. The temperature ramp increment was 2.5°C/sec for all steps. Plates were read on a Bio-Rad QX100 droplet reader and digital PCR analysis was performed using QuantaSoft v1.3.2.0 software from Bio-Rad to assess the number of positive droplets. At least two negative control wells with no DNA were included in every run.

Whole genome sequencing (WGS) and Copy number alteration (CNA) analysis

For the WGS experiments, tumor DNA was extracted from seven 10-µm-thick frozen sections after confirmation of tumor cell presence on a 4-µm-thick hematoxylin and eosin (H&E)–stained slide. Germline DNA was extracted from saliva. The QIAmp DNA Mini Kit (#51304; Qiagen) was used for DNA isolation from both fresh/frozen sections and buccal swabs. Library preparation was performed using the TruSeq Nano DNA Library Prep Kit. Sequencing was performed using the Illumina HiSeq 2000 System. Tumor and matched germline DNA samples were profiled with a standard WGS protocol. Paired-end sequencing reads were aligned to the human reference genome (GRCh37/hg19) using Illumina Isaac Genome Alignment Software embedded in Illumina pipeline version v2.0.1(10). The absolute gene copy number in tumor was derived from NGS data based on comparing genome coverage between matched tumor-normal samples. CNV analysis used CNV-seq (11) which applied a robust statistical model that allows confidence assessment of observed copy number ratios and adjust confounders to achieve absolute gene copy number in tumor.

SNP data analysis

The 122 germline SNPs were obtained from previous study of prostate cancer (12). The hg19 genome was separated into bins/blocks with the size of the mean length of all peaks in each cell line. Based on overlapping the RNA Pol II peaks, all blocks were subsequently classified into peak-blocks and non-peak blocks. For each cell line, peak-blocks were defined as the number of peaks, and non-peak blocks were calculated by subtracting the number of peaks from the total number of blocks. A bootstrapping probability was calculated by random sampling of genome blocks. Each run, the blocks were randomly sampled as same number as peak-blocks and count the number of blocks that have SNPs. Then the empirical p-value was calculated for enrichment analysis based on in total of 10000 runs.

CPCG cohort

We interrogated a previously published cohort of 128 intermediate risk prostate cancer samples that underwent RNA-Seq, described previously (13). Genotype calls were extracted from WGS of matched blood samples using GATK (v.3.4.0+), described previously (14).

TCGA cohort

The TCGA PRAD was used a validation cohort (15). Only samples with concordance between SNP6 array genotypes and whole exome sequencing (WXS) blood samples calls > 80% were retained (348 samples). Genotypes were imputed using the Sanger Imputation Service – pre-phasing using Shapeit2 (16), imputation using PBWT (17) and the Haplotype Reference Consortium (release 1.1) panel (18).

Germline-RNA associations (eQTL) discovery and validation

For each SNP-target pair, we tested for eQTLs first in the CPCG discovery cohort and then validated significant associations in the TCGA cohort. We fit a linear regression model treating genotypes additively, i.e. number of alternative alleles present (AA = 0, AB = 1, BB = 2), with principal components 1 and 2 to control from population stratification. False discovery adjustment was applied over all SNP-target pairs and significant associations were defined as FDR < 0.05. eQTLs were considered validated if they had FDR < 0.05 in the TCGA validation cohort and the directionality of the coefficient was the same in both cohorts (i.e. sign(βdiscover ) = sign(βvalidation)). To assess the impact of ERG status on rs684232:FAM57A, rs684232:VPS53 and rs684232:GEMIN4 eQTLs, CPCG samples were subsetted into ERG positive and negative based on the presence of the TMPRSS2-ERG fusion, as described previously (14). We quantified each eQTL within positive and negative individuals separately using the same linear regression model described above. mRNA abundance differences between individuals positive and negative for TMPRSS2-ERG fusion with the same genotype were quantified using Mann-Whitney test.

Differential H3K27ac binding

Peak bed files and raw fastqs for H3K27ac (n = 92), H3K27me3 (n=76), AR (n=88) and H3K4me3 (n=56) were downloaded from the Gene Expression Omnibus (GSE120738) (19). Raw fastqs were aligned using bwa (v.0.7.15) and the aligned bam files from each target were merged for each patient (i.e. H3K27ac, H3K27me3, H3K4me3 and AR bams from the same patient were merged). This merging step increased the coverage as well as reduced allele bias due to allele specific binding. Using the merged bams, patients were genotyped at overlapping sites of interest using GATK (v3.4.0+) HaplotypeCaller. Using the peak bed files, consensus peak bed was generated by using DiffBind R package. Bedtools coverageBed was used to overlap the read counts with the consensus peaks. The overlapped read counts were normalized for sequencing depth using TMM method by edgeR. Differential H3K37ac binding at rs684232 was quantified using Mann-Whitney test considering a recessive model.

Immunoblot analysis

Cells were harvested and lysed using RIPA (150mM NaCl, 5mM EDTA, 50mM Tris-Cl, 1% NP-40, 0.5% Sodium deoxycholate, 0.1% SDS) buffer supplemented with proteinase inhibitor (cOmplete EDTA free protease inhibitor tablet, cat no 11873580001, Roche). After determining the protein concentration of lysates, samples were separated on 4 - 12% gradient gels (Invitrogen, NW04122). To detect full length and variants of AR protein, the immunoblots were performed using anti-AR (Sigma, A9853; this antibody binds to the N-terminus of AR), anti-AR (Sigma, SAB5500007; this antibody binds to the C-terminus of AR), anti-Actin (Sigma, AC-74), anti-beta tubulin (SantaCruz, sc9104), and anti-cMyc (Cell Signaling, 5605S) antibodies. To compare endogenous EZH2 and FOXA1 , anti-EZH2 (active motif, 39901), anti-FOXA1 (Cell Signaling, 58613) were used.

Luciferase reporter assays

The ~1100 promoter/enhancer region containing the wild-type or risk allele rs684232 was cloned into pGL2 luciferase reporter plasmid (Promega) and verified by Sanger sequencing. LNCaP or VCaP cells were seeded on 24-well plates and transfected with the indicated plasmids. The Fire fly luciferase / Renilla luciferase activities were measured using the Dual-Glo luciferase assay system as described in the manufacturer's protocol (Promega, E2940). siRNA knockdown and Gene expression studies Small interfering RNA (siRNA) were purchased from Horizondiscovery. siRNAs transfection was carried out by using Lipofectamine RNAiMAX transfection Reagent (Thermo Fisher Scientific, #13778-150) and total RNA was extracted using RNeasy Plus Mini Kit (QIAGEN, #74134) at 72 hr post transfection. Reverse transcription and quantitative PCR were performed as previously described (20).

CellTiter-Glo assay

LNCaP cells were seeded into multiple 96-well plates one day before the transfection with siRNA. To determine the number of viable cells in the culture, CellTiter-Glo reagent (Promega, G7570) was added to each well (7 days post siRNA transfection) and mixed by orbital shaker according to the manufacturer’s instructions. Plates were incubated for 10 min at room temperature, then luminescence was measured using a microplate reader (TECAN).

Supplemental References

1. Langmead B, Trapnell C, Pop M, and Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the . Genome Biol. 2009;10(3):R25. 2. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. 3. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24-6. 4. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, and Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36. 5. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9. 6. Roberts A, and Pachter L. Streaming fragment assignment for real-time analysis of sequencing experiments. Nature methods. 2013;10(1):71-3. 7. Love MI, Huber W, and Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. 8. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576-89. 9. Akdemir KC, and Chin L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol. 2015;16:198. 10. Raczy C, Petrovski R, Saunders CT, Chorny I, Kruglyak S, Margulies EH, et al. Isaac: ultra- fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics. 2013;29(16):2041-3. 11. Xie C, and Tammi MT. CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 2009;10:80. 12. Guo H, Ahmed M, Zhang F, Yao CQ, Li S, Liang Y, et al. Modulation of long noncoding RNAs by risk SNPs underlying genetic predispositions to prostate cancer. Nat Genet. 2016;48(10):1142-50. 13. Chen S, Huang V, Xu X, Livingstone J, Soares F, Jeon J, et al. Widespread and Functional RNA Circularization in Localized Prostate Cancer. Cell. 2019;176(4):831-43 e22. 14. Fraser M, Sabelnykova VY, Yamaguchi TN, Heisler LE, Livingstone J, Huang V, et al. Genomic hallmarks of localized, non-indolent prostate cancer. Nature. 2017;541(7637):359-64. 15. Cancer Genome Atlas Research N. The Molecular Taxonomy of Primary Prostate Cancer. Cell. 2015;163(4):1011-25. 16. Delaneau O, Marchini J, and Zagury JF. A linear complexity phasing method for thousands of genomes. Nature methods. 2011;9(2):179-81. 17. Durbin R. Efficient haplotype matching and storage using the positional Burrows- Wheeler transform (PBWT). Bioinformatics. 2014;30(9):1266-72. 18. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279-83. 19. Stelloo S, Nevedomskaya E, Kim Y, Schuurman K, Valle-Encinas E, Lobo J, et al. Integrative epigenetic taxonomy of primary prostate cancer. Nat Commun. 2018;9(1):4900. 20. Li X, Baek G, Ramanand SG, Sharp A, Gao Y, Yuan W, et al. BRD4 Promotes DNA Repair and Mediates the Formation of TMPRSS2-ERG Gene Rearrangements in Prostate Cancer. Cell reports. 2018;22(3):796-808.