A 200 B Normal -log(p value) T_002 ‘Open in cancer’ annotated (n=882) 0 1 2 3 4 5 6 7 150 TumourOAC Regulation of cell development Cellular response to hormone stimulus 100 Response to wounding T_003 T_006 Regulation of cell adhesion Tube development 50 Regulation of serine/threonine kinase activity RS041N -log(p value) RS041T 0 ‘Closed in cancer’ annotated genes (n=399) 0 1 2 3 4 5 6 7

N_006 PC2 (12.68%) T_004 Regulation of cellular component movement N_003 -50 T_005 Prostate gland development Cellular response to organic cyclic compound T_001 Hormone secretion -100 N_005 Response to purine-containing compound Regulation of system process -150 -300 -150 0 150 300 450 PC1 (57.79%) Open in cancer (non-promoter) motifs Normal (n=17) Cancer (n=12) C ‘Open in cancer’ promoter (n=42) E 0 10 20 30 40

HNF4A 1xe-15

Motif 1

LNC_N01 LNC_N02 LNC_N03 LNC_N04 LNC_N05 LNC_N06 LNC_N08 LNC_N09 LNC_N10 LNC_09S LNC_LG01S LNC_LG02S LNC_LG05S LNC_LG06S LNC_LG09S LNC_06S LNC_07S LNC_01 LNC_02 LNC_03 LNC_04 LNC_09 LNC_10 LNC_11 LNC_12 LNC_13 LNC_14 LNC_15 LNC_LG06T HNF4A 1xe-11 HNF4A HNF4G -10 % targets RXRG SMAD3 1xe PPARG % background PPARA RXRB ‘Closed in cancer’ promoter (n=26) RXRA 0 10 20 30 40 Motif 2 MTE 1xe-11 GATA4 GATA5 GATA6 Dux 1xe-11 GATA1 GATA2 GATA3 IRF4 1xe-8 Motif 3 FOXA2 FOXA3 FOXC2 FOXC1 D Footprints in ‘open in cancer’ ATAC regions (n=996) FOXA1 Motif 4 0 % 5% 10 % HNF1B 150 HNF1A POU4F1 HNF4G 1xe-30 POU4F3 HOXA4 DBX2 DBX1 -29 100 HNF1B 1xe Motif 5 BATF -18 FOSL1 TEAD 1xe FOS

Tn5 Tn5 integrations ATF3 JUN 50 1xe-14 FOSL2 Cancer GATA6 JUNB 0 >5 -14

ormalized ormalized Normal EWS:ERG 1xe FPKM (log2) N 0 -100 0 100 % targets % backgrounds

25kb F Chr 18 G GATA6 motif regions (n=480) 5 Normal tissue ATAC 100 'Open in 0 (P>0.0001, χ=22.97, 5 Cancer' OAC tissue ATAC 80 df=4) 0 Genome subset HNF4A % 60 HNF4A 40 Chr 20 5kb 5 Normal tissue ATAC 20 0 5 OAC tissue ATAC 0 0 GATA6 1kb Chr 20 5 Normal tissue ATAC 0 5 HNF4A motif regions (n=375) 100 OAC tissue ATAC0 'Open in FOXA2 80 (P=0.0132, χ=12.63, df=4) Cancer' 5kb Genome Chr 12 60 3 % subset Normal tissue ATAC 0 3 40 OAC tissue ATAC 0 20 HNF1A 0 Chr 14 5kb 5 Normal tissue ATAC 0 5 OAC tissue ATAC 0 FOS Fig. S1 Figure S1. Open chromatin profiling of OAC tissue samples. (A) PCA plot of normalised ATAC‐seq signal from the top 50,000 regions of 4 normal tissue (blue) and 7 tumour tissue (red). (B) Top three motifs and their matched from de novo motif discovery of ‘open in cancer’ and ‘closed in cancer’ promoter regions. The frequency of motif occurrence is shown and motifs are sorted by p‐value. (C) GO term analysis of genes associated with ‘open in cancer’ and ‘closed in cancer’ differentially accessible regions. (D) Normalised Tn5 integrations from normal (blue) and cancer (red) tissue around called footprints (using pyDNase; Piper et al., 2013) within regions that are differentially accessible and ‘open in cancer’ (left). The top five de novo motifs identified within these footprint containing open regions, is shown as frequency of occurrence and sorted by p‐value (right). (E) Heatmaps of log2(1+FPKM) values of transcription factors with enriched DNA binding motifs (shown on the left) in ‘open in cancer’ regions in RNAseq data from normal or OAC tissue (Maag et al., 2017). Hierarchical clustering of samples was performed using 1‐Pearson’s correlation and rows sorted by average fold change in cancer (depicted by triangle). Arrows indicate the highest expressed factor in OAC. (F) UCSC genome browser tracks of HNF4A, GATA6, FOXA2, HNF1A and FOS showing ATAC‐seq signal from normal and OAC tissue. (G) Bar chart of the percentage of ‘Open in Cancer’ regions that contain either a GATA6 (top) or HNF4A (bottom) motif and additionally contain a motif for either HNF4A, GATA6, FOXA1, HNF1B or AP‐1. A random selection of HNF4A and GATA6 motifs in the genome were also assessed for presence of HNF4A, GATA6, FOXA1, HNF1B or AP‐1 motifs. P‐values, χ value and df are shown for each chart.

0.6

TLX2 0.5 HNF1A YTAATTAA POU3F2 HNF1B DBX1 DBX2 USF1 HOXD8 PHOX2B LHX5 MAX LHX3 HLX HOXA4 JUNDM2_1 LMX1B POU3F1 POU3F4 HOXB4 RGTTAMWNATT 0.4 PAX6 POU2F1 GATA1 HLF NFIL3 ATF1_1 LHX1 POU1F1 CEBPA LMX1AHOXC4POU4F3 GATA6_1 LHX3 HOXC5 UNCX OTP LHX9 BHLHB2_1 RYTAAWNNNTGAY GATA5_1 TTAYRTAA TBP_1 GATA3_1 CREB1 0.3 SRF_2 GATAAGR TGAYRTCA HOXC9 YTATTTTNR MAX_1 ZFP740_1 CGTSACG TGACGTCA CDX2 GTGACGY 0.2 footprinting CREB1

Cancer) TAANNYSGCG

ZBTB7B_1  GLIS2_1

HNF4A_2 differential differential 0.1

(Normal (Normal FOXA2 FOXD1 FOXJ1_1 FOXA2

FOXF2 RTAAACA Normalized Normalized TGTTTGY FOXA1 0 -0.4 -0.2 0 0.2 0.4 0.6 FOXA2_1 0.8 1

FOXO3 FOXK1_1 FOXJ3_1 TP53 -0.1 FOXL1_1 SRY

-0.2

-0.3 Normalized differential accessibility (Normal  Cancer)

Bag Fence Outliers (p>0.05) Outliers (p<0.05)

Figure S2. BaGFoot analysis of open chromatin regions in OAC tissue. Scatter bag plot of differential chromatin accessibility (x-axis) and footprinting (y-axis) depth around human transcription factor binding motifs in normal and cancer tissue. Significant outliers (p<0.05) are represented in red. This is identical to Fig. 1G but with full labelling of the motifs included. Differentiating human ESCs

Human embryonic tissue

Normal Normal oesophagus oesophagusBarrett’s OAC

OAC

Normal Normal oesophagus Barrett’s oesophagusBarrett’s

Figure S3. Expression of OAC-specific transcription factors during embryonic human intestinal development. Heatmap showing the expression of the indicated transcription factors across the indicated embryonic tissues and differentiated cell types derived from HUES7 embryonic stem cells (left) and expressed in normal, Barrett’s and OAC tissue (right). Data are row Z normalized. The maximal number of reads found in one of the tissue/cell samples is shown on the left. C A identities analysis presence CLDN lysates cluster clustered the OE from Figure Tubulin HNF4A GATA6 33

top PC2 (10.54%) IB: -250 -200 -150 -100 , 18 the 100 150 -50 50 OE . S 0 (red) 50 -250 4 (D) of T_003 genes . 19 top of of hierarchically , The 000 the Het1A UCSC . the and a N_006 50 (C) HNF . chromatin indicated ATAC -50 , T_002 regions Common 000 N_003 FLO Immunoblot T_006 genome 4 PC1 (53.44%) PC1 A OE33 - - ATAC seq 1 and N_005 150 (pink) T_005 FLO tested and ATAC regions landscape - ATAC - GATA seq 1 browser the and - for OAC tissue Normal 350 seq are regions OE19 D - HNF4A 6 seq two from GATA a tissue binding shown peak E motif normal tracks of peaks OE19 OE33 FLO Het1a / main GATA6 550 normal of 6 OE regions - , 1 data at HNF 0 5 0 5 0 5 0 5 0 5 0 5 19 motifs showing clusters Tumour cell-line Tumour tissue Tumour Tissue Normal Normal cell-line Normal OAC cell OAC OAC tissueOAC oesophageal found the Chr cells P3 4 (N_), from A 10 bottom for

and % input bound - is line 0.02 0.04 0.06 0.08 0.12 represents in 0.1 HNF are OAC indicated normal ATAC 0 OAC Tubulin OE19 HNF4A CLRN3 highlighted 4 of cell (T_)

A Ctrl * - P2 seq * (D) tissue (left) CLRN3 P1 tissue B line and

. CLRN3 P2 OAC in expression signals * N_005 N_003 N_006 FLO Het1A T_005 T_006 T_003 T_002 CLRN3 P3 OE19 OE33 P1 HNF4A IgG represents or blue Het ChIP the and - (blue), 1 OAC GATA as 10kb 1 indicated A and of OE a OE33 (green) tissue ‘normal’ 6 two OAC in 19 Het1A red 0 5 0 5 0 5 0 5 0 5 0 5 (right) p Het Chr

cells FLO-1 < loci P3 . tissue below . cell (A) 0 1 % input bound 3 N_006 (B) . A, 0.1 0.2 0.3 0.4 0.5 0.6 05 occupancy cluster surrounding 0 are PCA

line N_003 FLO Pearson (n= (red), OE19 GATA6 P2 the

n.s. N_005

highlighted Ctrl - samples plot 3 1 * ) P1

, T_002 (blue)

peaks CLDN18 P1 . OE three CLDN18 P2 of T_003 correlation in 33 CLDN18 P3 ChIP ATAC the .

OE T_006 . and Samples (E) and OAC GATA6 IgG

19 T_005 in CLRN 10kb - ChIP a seq OE cells OE19 red cell ‘cancer’ plot 19 1 0 0.5 3 - signal . qPCR were . lines CLDN18 and The The cell

of Pearson correlation A

IB: IB: HNF4A GATA6 * * B HNF4A ChIP-seq GATA6 ChIP-seq C 3377

GATA6 37658 GATA6 rep1 rep2

13603

HNF4A rep1 HNF4A GATA6 GATA6 rep1

HNF4A rep2 GATA6 rep2 1676

D 100 GATA6 HNF4A 6870 HNF4A 80 HNF4A rep1 rep2 60

motif 40 2868 20 % % peaks with TF E 0 Motif TF % Targets P-value HNF4A 8.13% 1xe-336 Binned ranked peaks (%)

F Cell-line ATAC-seq HNF4A centered GATA6 centered

HNF4A only regions GATA6 only regions HNF4A+GATA6 regions HNF4A only regions GATA6 only regions HNF4A+GATA6 regions

200 200 200 140 140 140

100 100 100 70 70 70

Tn5 Tn5 integrations Tn5 integrations

ormalized ormalized ormalized N N 0 0 0 0 0 0 -75 0 75 -75 0 75 -75 0 75 -75 0 75 -75 0 75 -75 0 75 Het1a OE19 Het1a OE19

Tissue ATAC-seq HNF4A centered GATA6 centered

HNF4A only regions GATA6 only regions HNF4A+GATA6 regions HNF4A only regions GATA6 only regions HNF4A+GATA6 regions

50 50 50 40 40 40

Tn5 Tn5 integrations Tn5 Tn5 integrations

25 25 25 20 20 20

ormalized ormalized

ormalized ormalized

N N 0 0 0 0 0 0 -75 0 75 -75 0 75 -75 0 75 -75 0 75 -75 0 75 -75 0 75 Normal OAC Normal OAC

FIG.S5 Figure S5. HNF4 and GATA6 ChIP‐seq analysis. (A) Immunoblots of HNF4A (left) and GATA6 (right) following ChIP in different fractions; IgG FT (flow through from non‐specific IgG precipitation), HNF4A/GATA6 FT (flowthrough from HNF4A/GATA6 precipitation), IgG elute (material eluted following IgG precipitation) HNF4A/GATA6 elute (material eluted following HNF4A/GATA6 precipitation). Arrows indicating immunoprecipitated protein (HNF4A/GATA6) and asterisks are non‐specific cross‐reacting bands (left) or the IgG heavy chain (right). (B) Scatter plot of ChIPseq signal between experimental replicates for HNF4A and GATA6 with Pearson correlation indicated. (C) Overlap of peaks called by each replicate for HNF4A and GATA6 ChIP‐seq. The numbers of peaks in each sector are given. (D) Percentage of HNF4A and GATA6 ChIP‐seq peaks with a HNF4A or GATA6 motif respectively. Peaks were sorted on enrichment and separated into bins, each containing 10% of the peaks. The presence of a HNF4A and GATA6 motif was assessed in each bin, with values ranging from 87%‐32% for HNF4A and 70%‐28% for GATA6. (E) Presence of HNF4A motif in GATA6 ChIP‐seq regions. (F) Footprinting of ATAC‐seq data from OE19 and Het1A cell‐lines (top) or normal and OAC tissue (bottom) on 150 bp regions surrounding the HNF4A (left) or GATA6 (right) transcription factor motifs found in HNF4A‐only, GATA6‐only and HNF4A+GATA6 co‐occupied ChIP‐seq regions. Normalised Tn5 integrations are plotted across each of the regions relative to the centre of the binding motif. Data from cancer cells/tissue are shown in red and normal cells/tissue in blue.

HNF4A

A B 1.2 1.2 CLRN3 40 GATA6

) )

1 1 35 RPLP0 IB: RPLP0 30 0.8 0.8 25 HNF4A 0.6 0.6 20

55 FPKM 0.4 0.4 15 Tubulin 10 55 0.2 0.2

5

Relative Relative RNA levels ( Relative RNA levels ( 0 0 0 siNT siHNF4A siNT siHNF4A siNT siHNF4A C D

1.2 GATA6 1.2 CLDN18 35 HNF4A

) )

1 1 30 n.s.

IB: RPLP0 RPLP0 25 0.8 0.8 70 20 GATA6 0.6 0.6

FPKM 15 0.4 0.4 10 Tubulin 0.2 0.2

55 5

Relative Relative RNA levels ( Relative RNA levels ( 0 0 0 siNT siGATA6 siNT siGATA6 siNT siGATA6

E siNT RNAseq siHNF4A RNAseq

Rep 1 Rep 1

Rep 2 Rep 2

siGATA6 RNAseq Rep 3 Rep 3 siGATA6+siHNF4A RNAseq

Rep 1 Rep 1

Rep 2 Rep 2

F Rep 3 Rep 3

siHNF4A only siGATA6 only -log(p value) -log(p value) 0 5 10 0 10 20

Response to wounding Regulated exocytosis Tissue morphogenesis Glycoprotein metabolic process Epithelial cell differentiation Regulation of hormone levels Regulation of cell migration Lipid biosynthetic process Myeloid leukocyte differentiation Carbohydrate metabolic process

siHNF4A + siGATA6 only -log(p value) 0 2 4 6

T cell differentiation Extracellular matrix organization Embryonic pattern specification Bile acid and bile salt transport RegulationRegulation of protein of protein localization localization to plasma… to plasma membrane FIG. S6 Figure S6. HNF4A and GATA6 regulated target genes. (A and C) Immunoblots of HNF4A (A) and GATA6 (C) expression following 72 hours of transfection with either siNT (non‐targeting control), siHNF4A (A) or siGATA6 (C). Tubulin expression was detected as a loading control. (B and D) RNA level expression measured by RT‐qPCR analysis of HNF4A and CLRN3 expression after 72 hours of siHNF4A transfection (B) or GATA6 and CLDN18 after 72 hours of siGATA6 transfection (D). Expression levels from RNA‐seq data of HNF4A and GATA6 following siRNA knockdown of the reciprocal factor shown on the right. Expression in the presence of siNT (non‐targeting control) is taken as 1 in all cases. ** represents p < 0.01, *** represents p < 0.001 (n=3). (E) Correlation plots of replicate RNA‐seq data from OE19 cells treated with siNT, siHNF4A, siGATA6 or both siHNF4A+siGATA6 with Pearson’s correlation stated. (F) GO term analysis of differentially downregulated genes following siHNF4A, siGATA6 or siHNF4A+siGATA6 treatment. The genes included in the analysis direct targets of the respective transcription factors and are those uniquely downregulated by either HNF4A (“siHNF4A only”) or GATA6 (siGATA6 only”) depletion (and not also reciprocally regulated by the other factor) or genes that are upregulated either when both or either HNF4A or GATA6 are depleted or only when both factors are depleted (“siHNF4A and siGATA6”). The top 5 most significant terms are shown.

Downregulated genes with siRNA Upregulated genes with siRNA

6 **** 6 n.s. n.s. n.s. * n.s. 3 3

0 0

Gene expression expression Gene

(cancer FPKM/normal FPKM) FPKM/normal (cancer FPKM) FPKM/normal (cancer

2 2

-3 -3

Log Log

(n=136) (n=100)

(n=256) (n=716) (n=500) (n=141) (n=314) (n=500)

Random Random

siHNF4A siGATA6 siHNF4A siGATA6

siHNF4A+siGATA6 siHNF4A+siGATA6

Figure S7. The expression of direct HNF4A-GATA6 target genes in oesophageal tissues. Boxplot of changes in in OAC tissue (shown as cancer/normal FPKM log2 fold change) for genes downregulated (A) and upregulated (B) by siRNA against HNF4A (blue), GATA6 (red) or both (black)(TCGA- IC-A6RE-11A/01A data; The Cancer Genome Atlas Research Network, 2017). Whiskers represent 1.5 x IQR. A random (500 randomly selected RefSeq transcripts) dataset is represented in white. **** represents p<0.0001, * represents p<0.05. A B ‘Closed in Barrett’s’ -Log10(P-value) Biological Pathway GO analysis 0 2.5 5

B_001 Oxidative stress-induced premature senescence Regulation of smoothened signaling pathway Ephrin signaling pathway Regulation of protein secretion Sodium ion transmembrane transport Response to mechanical stimulus B_002 Epidermis development Regulation of body fluid levels Inositol lipid-mediated signaling Vascular process in circulatory system C ‘Open in Barrett’s’ -Log10(P-value) B_003 Biological Pathway GO analysis 0 5 10 Neuron projection morphogenesis Cellular metal ion homeostasis Negative regulation of sodium ion transport Actin filament-based process Gland development B_004 E Positive regulation of transferase activity HNF4A motif **** Regulation of lipid metabolic process G Regulation of secretion

1 5 Cellular response to lipid 30 Neuron recognition **** * n.s.

25 H 7 7 7 7 1 0

20 6 6 6 6 Tn5 Tn5

(1+FPKM) 5 5 5 5 2

Tn5 Tn5 integrations 15 5 4 4 4 4 Log

3 3 3 3 ormalized ormalized 10 intergrations

N 2 2 2 2

0

ormalized ormalized 5 1 1 1 1 N 0 0 0 0 0 -500 500 -500 500 -500 500 -500 500

-75 0 75 HNF4A only GATA6 only GATA6 and Random

Normal Normal Normal Normal

Barrett’s Barrett’s Barrett’s Barrett’s ChIP regions ChIP regions HNF4A ChIP regions Normal siHNF4A siHNF4A siGATA6 Random Normal ATAC regions Barrett's +siGATA6 only only set Barrett’s OACATAC ATAC

F Normal Barrett’s D ‘Open in Barrett’s over- representation analysis

0% 20% 40%

LNC_N02 LNC_N03 LNC_N04 LNC_N05 LNC_N06 LNC_N08 LNC_N09 LNC_N10 LNC_09S LNC_LG01S LNC_LG02S LNC_LG05S LNC_LG06S LNC_LG09S LNC_06S LNC_07S LNC_B01 LNC_B02 LNC_B03 LNC_B04 LNC_B05 LNC_B06 LNC_B07 LNC_B08 LNC_B09 LNC_B10 LNC_B11 LNC_B12 LNC_B13 LNC_LG03B Motif 1 LNC_N01 GATA4 1x10-54 FOXA1 -34 FOXA3 FOXB1 1x10 FOXA2 HNF4G 1x10-30 FOXC1 FOXC2 DUX 1x10-16 Motif 2 HOXA13 1x10-15 GATA6 GATA4 GATA2 ‘Closed in Barrett’s over- GATA5 representation analysis GATA3 GATA1 0% 50% 100% Motif 3 SD0002.1 -31 HNF4A 1x10 HNF4G TP53 1x10-29 PPARG PPARA AR 1x10-13 RXRG SIX1 1x10-12 RXRA FOXH1 1x10-12 Motif 4 JUNB FOSL2 JUN ATF3 BATF Motif 5 KLF5 KLF3 KLF6 KLF9 KLF1 Motif 5 HNF1B HNF1A POU4F1 POU4F3 0 >5 HOXA4 FPKM (log2) DBX2 DBX1 FIG. S8 Figure S8. Open chromatin profiling and gene expression in Barrett’s tissue. (A) Correlation plots of ATAC‐seq data from biopsies non‐dysplastic Barrett’s tissues (B_001 to B_004) with Pearson’s correlation stated. (B and C) GO term analysis showing the top enriched biological pathways for genes associated with regions “closed in Barrett’s” (B) or “open in Barrett’s” (C). The top 10 most significantly enriched categories are shown. (D) Top five DNA motifs derived from de novo motif discovery (using total accessible regions in Barrett’s as background rather than whole genome) and their associated transcription factor that are enriched in ‘open in Barrett’s’ (top) or ‘closed in Barrett’s’ (bottom) non‐ promoter regions. The frequency of motif occurrence is shown and the motifs are sorted by p‐value. (E) Footprinting of ATAC‐seq data from normal and Barrett’s tissue on 150 bp regions surrounding the HNF4A transcription factor motif found in ‘Open in Barrett’s’ regions. Normalised Tn5 integrations are plotted across each of the regions relative to the centre of the binding motif. Data from Barrett’s tissue are shown in black and normal tissue in blue. (F) Heatmaps of log2(1+FPKM) values of transcription factors with enriched DNA binding motifs in “open in Barrett’s” regions in RNAseq data from normal or Barrett’s tissue (Maag et al., 2017). Heatmaps are sorted by highest average fold increase expression in Barrett’s (indicated by a triangle) and the highest expressed transcription factor in each case is indicated by an arrow. (G) Boxplot of log2 (1+FPKM) values of the expression of genes positively regulated by either HNF4A, GATA6 or both transcription factors in either normal or Barrett’s tissue (Maag et al., 2017). **** represents p<0.0001, * represents p<0.05. Median values are indicated by the horizontal lines. (H) Tag density plots of normalised ATAC‐seq signal from merged normal (blue), Barrett’s (black) and OAC (red) tissue from regions bound by HNF4A only, GATA6 only, both factors and random regions as a control.

0.4 RGTTAMWNATT HNF1B HNF1A POU3F2 TLX2

PHOX2B HOXD8 OTP YTAATTAA HOXA4 HOXB4 DBX1 GATAAGR HOXC5 DBX2 0.3 BARX2 HLX LHX5 PAX6 UNCX LHX1 LHX3 POU3F1 POU2F1 POU1F1 GATA5_1 POU3F4 LHX3 GATA6_1 GATA3_1 0.2 SRF SRF_1

ARID5A_1 CCAWWNAAGG FOXA2 RREB1 GM397_2 FOXA2 FOXJ3_1 USF1 FOXA2_1 0.1 FOXK1_1 ZBTB7B_1 ZFP281_2 FOXL1_1 ZFP740_1 MAX EGR1_2 Barrett's) GLIS2_1 FOXA1

 ZFP281_1 IRX4

PLAGL1_2 SRY TBP_1 differential differential accessibility

0 ZFP128_2 (Normal (Normal

-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 1.2 Normalized Normalized

LM207

-0.1 TCFAP2C_2 TGCCAAR TGCGCANK TFAP2A TGGNNNNNNKCCAR TCFAP2B_2 TCFAP2A_1 RCGCANGCGY TCFAP2C_1 TCFAP2B_1 -0.2 TCFAP2E_1

-0.3 Normalized differential accessibility (Normal  Barrett's)

Bag Fence Outliers (p>0.05) Outliers (p<0.05)

Figure S9. BaGFoot analysis of open chromatin regions in Barrett’s tissue. Scatter bag plot of differential chromatin accessibility (x-axis) and footprinting (y-axis) depth around human transcription factor binding motifs in normal and non-dysplastic Barrett’s tissue. Significant outliers (p<0.05) are represented in red. This is identical to Fig. 4D but with full labelling of the motifs included. A B 20 HNF4A Rep 1 Rep 2

15

10

5

(Fold (Fold change) 2

0 Het1A +DOX rep 1 Log Het1A +dox Het1A-HNF4A Het1A-HNF4A Het1A-GATA6 Het1A-GATA6 -5 2d 4d 2d 4d Het1A +DOX rep 2 Rep 1 Rep 2 10 GATA6 8 6

4 HNF4A

-

GATA6 GATA6 -

(Fold (Fold change) 2 2

0

Het1A

Log Het1A

-2 +DOX 2 days rep 1

Het1A +dox Het1A-HNF4A Het1A-HNF4A Het1A-GATA6 Het1A-GATA6 +DOX 2 days rep 1 -4 2d 4d 2d 4d

Het1A-HNF4A Het1A-GATA6 35 +DOX 2 days rep 2 +DOX 2 days rep 2 30 25 20

15

Ct Ct values

HNF4A HNF4A GATA6 GATA6

10 - -

5 Het1A

0 Het1A +DOX 4 days +DOX 4 days rep 1

Het1A +dox Het1A-HNF4A Het1A-HNF4A Het1A-GATA6Het1A-GATA6 +DOX 4 days rep 1 2days 4days 2days 4days

HNF4A rep 1 HNF4A rep 2 GATA6 rep 1 GATA6 rep 2 Het1A-HNF4A Het1A-GATA6 +DOX 4 days rep 2 +DOX 4 days rep 2 C 2972 3000 2608 150 106 87 2500 100 D Open in Het1A-HNF4A 2 days (n=2972) 2000 50 2% 1500 0 0 0 2% 1000 7% Intergenic -50 Intron 500 0 0

No. differential No. differential 0 -100 TTS

accessible accessible regions 40% -500 -150 Exon -1000 -149 Promoter -200 -157 -976 -929 Het1A Het1A- Het1A- Het1A Het1A- Het1A- 49% HNF4A HNF4A GATA6 GATA6 2 days 4 days 2 days 4 days

0.12 E H Cluster 1 0.1 Het1A-HNF4A Cluster 2 2 days 0.08 Cluster 3 434

0.06 intergrations 0.04 2538 Tn5 0.02 70 0 Het1A-HNF4A

4 days Normalized

Open in Barretts genes & F G -log10(P value) Open in Het1A-HNF4A -log10(P value) Open in Het1A-HNF4A 2 days 2 days genes (n=2417) 0 2 4 6 8 10 12 genes overlap (n=113) 0 2 4 6 Cell projection morphogenesis Response to wounding Embryonic morphogenesis Heart development Response to wounding Response to laminar fluid shear stress Sensory organ development Negative regulation of protein modification process Skeletal system development Positive regulation of cytosolic calcium ion concentration Muscle system process Digestive tract morphogenesis Gland development Regulation of MAPK cascade Muscle structure development Positive regulation of DNA-templated transcription, initiation Regulation of cell adhesion Modulation of chemical synaptic transmission Response to growth factor Muscle cell proliferation Lipid biosynthetic process FIG. S10 Figure S10. Open chromatin profiling of Het1A with overexpression of HNF4A or GATA6. (A) qRT‐PCR of Het1A cells (treated with doxycline (dox) for four days), or Het1A‐HNF4A and Het1A‐GATA6 cells treated with doxycline (dox) for two and four days (d) of induction, using primers for HNF4A (top) and GATA6 (middle). Ct values of samples are also shown (bottom). (B) Correlation plots of ATAC‐seq replicates of Het1A cells (+dox), Het1A‐HNF4A and Het1A‐GATA6 cells (two and four days of induction) with Pearson’s correlation stated. (C) Number of differentially accessible regions (linear 5 fold difference) with induction of HNF4A or GATA6 after two or four days dox treatment. (D) Pie chart of the genomic distribution of differentially accessible regions in Het1A‐HNF4A after 2 days of dox treatment. Promoter is defined as ‐2.5 kb to +0.5 kb relative to the TSS. (E) Overlap of differentially more accessible regions between Het1A‐ HNF4A 2d and Het1A‐HNF4A 4d cells with numbers indicated in each quadrant. (F) GO term analysis showing the top enriched biological pathways for genes associated with regions showing increased accessibility following induction of HNF4A expression in Het1A cells. The top 10 most significantly enriched categories are shown. (G) GO term analysis showing the top enriched biological pathway for genes associated with regions showing increased accessibility following induction of HNF4A expression and increased accessibility in Barrett’s compared to normal tissue. (H) Average normalised ATAC‐seq signal from Het1A or Het1A‐HNF4A cells after 2 (2d) or 4 (4d) days induction for regions comprising the three k‐means clusters in Fig. 5E).

0.4

CEBPA 0.3

HNF4A_2

0.2 CEBPA

HNF4A HNF4A 2d) -

ZFP105_1 HNF4A

Het1A NR2F1

 HLF TTGCWCAAY

Het1A Het1A SRY_1 MTF1_2 ( LM112 0.1 RXRA_2 TCF7_1 ZFP128_2 TCF3_1 LEF1_1

TCF7L2_1 footprinting

LM140 GAANYNYGACNY ESRRB TGACCTY 0 YGTCCTTGR differential differential TGACCTTG -0.4 -0.2 0 0.2 0.4 ESRRA_1 0.6 0.8 ESR2 NR4A2 RARA_1 HNF4A_1 NR2F2_1 RORA_1 Normalized Normalized NR2F2_2 ESRRA_2 TCFAP2A_2 TGACGTCA RXRA_1 AP1 TGASTMAGC-0.1 TCFAP2B_1 FOS ALX3 ALX1_2 TCFAP2C_1 TGANTCA TGGNNNNNNKCCAR

JUNDM2_2 CREB1 ESX1

-0.2 Normalized differential accessibility (Het1A  Het1A-HNF4A 2d)

Bag Fence Outliers (p>0.05) Outliers (p<0.05)

Figure S11. BaGFoot analysis of open chromatin regions in Het1A-HNF4A cells. Scatter bag plot of differential chromatin accessibility (x-axis) and footprinting (y-axis) depth around human transcription factor binding motifs in Het1A (+dox) and Het1A-HNF4A (2d) cells. Significant outliers (p<0.05) are represented in red. This is identical to Fig. 5C but with full labelling of the motifs included. Supplemental Table S7. DNA motifs enriched in Barrett’s‐specific open chromatin regions. Top ten motifs found by de novo motif discovery and their associated transcription factors that are enriched in ‘open in Barrett’s’ (A) or ‘closed in Barrett’s’ (B) non‐promoter regions.

(A) Open in Barrett's; Non‐promoter regions Rank Motif P‐value log P‐pvalue % of Targets % of Background Best Match/Details 1 CCTAAGTAAACA 1.00E‐95 ‐2.20E+02 40.32% 10.00% FOXM1(Forkhead)/MCF7‐FOXM1‐ChIP‐Seq(GSE72977)/Homer(0.940) 2 TGATAAGA 1.00E‐70 ‐1.63E+02 26.54% 5.45% Gata4(Zf)/Heart‐Gata4‐ChIP‐Seq(GSE35151)/Homer(0.971) 3 TGAACTTTGGAC 1.00E‐67 ‐1.54E+02 13.64% 1.14% HNF4G/MA0484.1/Jaspar(0.961) 4 AATGACTCAT 1.00E‐65 ‐1.52E+02 19.94% 3.11% Atf3(bZIP)/GBM‐ATF3‐ChIP‐Seq(GSE33912)/Homer(0.960) 5 GACACACCTA 1.00E‐31 ‐7.34E+01 10.85% 1.91% EKLF(Zf)/Erythrocyte‐Klf1‐ChIP‐Seq(GSE20478)/Homer(0.839) 6 ATCATTAACC 1.00E‐30 ‐7.12E+01 13.20% 2.97% Nkx6.1()/Islet‐Nkx6.1‐ChIP‐Seq(GSE40975)/Homer(0.787) 7 ACTTCCTGGT 1.00E‐23 ‐5.51E+01 10.41% 2.38% ELF3(ETS)/PDAC‐ELF3‐ChIP‐Seq(GSE64557)/Homer(0.926) 8 AGGGTGGAGT 1.00E‐19 ‐4.45E+01 10.56% 2.96% KLF3(Zf)/MEF‐Klf3‐ChIP‐Seq(GSE44748)/Homer(0.844) 9 CACTCCCT 1.00E‐15 ‐3.54E+01 19.35% 9.24% KLF5(Zf)/LoVo‐KLF5‐ChIP‐Seq(GSE49402)/Homer(0.746) 10 TGATAGCTTG 1.00E‐13 ‐3.16E+01 3.96% 0.58% PH0169.1_Tgif1/Jaspar(0.659)

(B) Closed in Barrett's; Non‐promoter regions Rank Motif P‐value log P‐pvalue % of Targets % of Background Best Match/Details 1 CATGCCCAGACA 1.00E‐46 ‐1.06E+02 58.35% 25.77% (p53)/mES‐cMyc‐ChIP‐Seq(GSE11431)/Homer(0.824) 2 CAAGCTTGTT 1.00E‐36 ‐8.42E+01 58.58% 29.19% Nr2e3/MA0164.1/Jaspar(0.810) 3 TGACTCAT 1.00E‐35 ‐8.19E+01 30.89% 9.39% BATF(bZIP)/Th17‐BATF‐ChIP‐Seq(GSE39756)/Homer(0.980) 4 GTTACTGCATGT 1.00E‐24 ‐5.62E+01 11.90% 1.92% Ets1‐distal(ETS)/CD4+‐PolII‐ChIP‐Seq(Barski_et_al.)/Homer(0.616) 5 CCTGCCAA 1.00E‐20 ‐4.81E+01 38.44% 18.86% NFIA/MA0670.1/Jaspar(0.885) 6 AAGAACATGTCT 1.00E‐17 ‐3.96E+01 4.81% 0.33% AR‐halfsite(NR)/LNCaP‐AR‐ChIP‐Seq(GSE27824)/Homer(0.594) 7 TGCCTAAGGC 1.00E‐16 ‐3.83E+01 13.27% 3.59% TFAP2C(var.2)/MA0814.1/Jaspar(0.844) 8 GTTTTGCCCTGG 1.00E‐15 ‐3.60E+01 2.06% 0.02% ZNF416(Zf)/HEK293‐ZNF416.GFP‐ChIP‐Seq(GSE58341)/Homer(0.612) 9 GCACACGC 1.00E‐15 ‐3.47E+01 25.63% 11.69% PB0130.1_Gm397_2/Jaspar(0.796) 10 CTTGCTAAGA 1.00E‐13 ‐3.05E+01 3.43% 0.21% PB0054.1_Rfx3_1/Jaspar(0.730) Supplemental Table S9. DNA motifs enriched in chromatin regions that are more accessible upon HNF4A expression.

Rank Motif P‐value log P‐pvalue % of Targets % of Background Best Match/Details 1 TATGACTCAT 1e‐330 ‐7.60E+02 28.30% 4.42% Fra1(bZIP)/BT549‐Fra1‐ChIP‐Seq(GSE46166)/Homer(0.983) 2 CAAAGTCCAA 1.00E‐232 ‐5.36E+02 67.44% 34.51% HNF4G/MA0484.1/Jaspar(0.899) 3 TAAGGAATGT 1.00E‐94 ‐2.18E+02 21.55% 7.90% TEAD3/MA0808.1/Jaspar(0.961) 4 TCTGCCCTGG 1.00E‐64 ‐1.48E+02 28.60% 14.92% PB0133.1_Hic1_2/Jaspar(0.810) 5 TTTGACCCTGGG 1.00E‐53 ‐1.23E+02 27.25% 14.88% EBF1(EBF)/Near‐E2A‐ChIP‐Seq(GSE21512)/Homer(0.699) 6 TCTGTGGTTT 1.00E‐43 ‐9.95E+01 12.99% 5.45% RUNX(Runt)/HPC7‐Runx1‐ChIP‐Seq(GSE22178)/Homer(0.959) 7 GTCCAATGAA 1.00E‐27 ‐6.24E+01 7.84% 3.20% PB0005.1_Bbx_1/Jaspar(0.722) 8 ATTATGAC 1.00E‐18 ‐4.23E+01 4.05% 1.41% PH0065.1_Hoxc10/Jaspar(0.621) 9 AGCTTTGACC 1.00E‐17 ‐4.11E+01 3.08% 0.91% MF0004.1_Nuclear_Receptor_class/Jaspar(0.750) 10 GGCAGTCCAA 1.00E‐16 ‐3.91E+01 1.69% 0.30% PB0134.1_Hnf4a_2/Jaspar(0.618) Supplemental Table S10. Patient sample information.

Samples Sex Matched sample N_003 M N_005 F N_006 M B_001 F B_002 M B_003 M B_004 F T_001 M T_002 M T_003 M N_003 T_004 M T_005 F N_005 T_006 M N_006 Supplemental Table S11. PCR primers used in this study

ChIP‐qPCR Associated gene Forward (5'‐3') ADS# Reverse (5'‐3') ADS# Control AAGCCCCTCAACAGACTCAA 5857 CCTCATAGCTTGACCTTCGAG 5858 CLRN3_P1 TGGGAGAGGCAGGCTAACTA 5869 GGCCCGTCTCATTTTGAGTA 5870 CLRN3_P2 TGGGAGAGGCAGGCTAACTA 5871 GGCCCGTCTCATTTTGAGTA 5872 CLRN3_P3 CCGTGTGAAGGGACATCTTT 5873 GCACTCAGCAGGAGTGTCAG 5874 CLDN18_P1 GACCAATCTGCAGTGCCTTC 5901 GGGCTTGCCCTACTCCTTTA 5902 CLDN18_P2 CAAACAGAAACCAGCCCTCC 5903 GTGGGAAGTGTTAGGGCTGA 5904 CLDN18_P3 GAAAGCGAGTGTGAGGGAAG 6342 TGGTCACAGCATTTGGTGAT 6343

RT‐qPCR Gene Forward (5'‐3') ADS# Reverse (5'‐3') ADS# RPLP0 ATTACACCTTCCCACTTGCT 5454 CAAAGAGACCAAATCCCATATCCT 5455 HNF4A ATGTACTCCTGCAGATTTAGCC 5855 CCTCATAGCTTGACCTTCGAG 5856 GATA6 CTCTACAGCAAGATGAACGG 5430 TGGAGTTTCATGTAGAGTCCAC 5431 CLRN3 CAACCCTTACCAGACATTCC 5851 TATGAGCCAGAACGAGTATCC 5852 CLDN18 ATCATGTTCATTGTCTCAGGTC 5470 GGCTTTGTAGTTGGTTTCTTCTG 5471 Additional Supplemental Table legends

Table S1. Differentially accessible regions in OAC patient samples. (A) Coordinates of the top 50,000 genomic regions showing accessibility across normal and OAC samples. (B and C) Coordinates of regions showing >5 fold changes in accessibility when comparing the average signal in OAC tumour samples T_002, T_003, T_005, T_006 to normal tissue N_003, N_005 and N_006 among these 50,000 regions. Regions are split according to genomic locations in promoter regions (‐2.5 kb to +0.5 kb relative to the TSS) (C) or non‐promoter regions (B).

Table S2. DNA motifs enriched in OAC‐specific open chromatin regions. Top ten motifs found by de novo motif discovery and their associated transcription factors that are enriched in ‘open in cancer’ (A) or ‘closed in cancer’ (B) promoter or non‐promoter regions.

Table S3. HNF4A and GATA6 binding regions identified by ChIP‐seq . Genomic regions identified by ChIP‐ seq for HNF4A (A) or GATA6 (B) binding.

Table S4. DNA motifs enriched in HNF4A and GATA6 ChIP‐seq datsets. Top 25 motifs found by de novo motif discovery and their associated transcription factors that are enriched in HNF4A (A) or GATA6 (B) ChIP‐seq datasets.

Table S5. Differentially regulated genes following transcription factor knockdown. Genes whose expression is deregulated by >1.3 fold (p‐value<0.05) following either HNF4A (A and B), GATA6 (C and D) or both HNF4A and GATA6 (E and F) depletion. Data are split between genes up‐ and down‐regulated. Only genes representing likely direct targets (i.e. associated with a nearby HNF4A and/or GATA6 peak) are shown.

Table S6. Differentially accessible regions in Barrett’s patient samples. (A) Coordinates of the top 50,000 genomic regions showing accessibility across normal and Barrett’s samples. (B and C) Coordinates of regions showing >5 fold increase or decrease in accessibility when comparing the average signal in Barrett’s samples B_001‐4 to normal tissue N_003, N_005 and N_006 among these 50,000 regions. Regions are split according to genomic locations in promoter regions (‐2.5 kb to +0.5 kb relative to the TSS) (C) or non‐promoter regions (B).

Table S8. Differentially accessible regions in Barrett’s patient samples. (A) Coordinates of the top 50,000 genomic regions showing accessibility across Het1A and Het1A‐HNF4A cells treated with doxycycline for 2 days. (B and C) Coordinates of regions showing >5 fold increase (B) or decrease (C) in accessibility when comparing the signal in Het1A compared to Het1A‐HNF4A cells. (D) Coordinates of the top 50,000 genomic regions showing accessibility across Het1A and Het1A‐GATA6 cells treated with doxycycline for 2 days. (E and F) Coordinates of regions showing >5 fold increase (E) or decrease (F) in accessibility when comparing the signal in Het1A compared to Het1A‐GATA6 cells.