Active and poised promoter states drive folding of the extended HoxB locus in mouse embryonic stem cells

Mariano Barbieri, Sheila Q. Xie, Elena Torlai Triglia, Inês de Santiago, Miguel R. Branco, David Rueda, Mario Nicodemi and Ana Pombo

Online Methods

Cell Culture

Mouse ES-OS25 cells (kindly provided by W. Bickmore) were grown as previously described1. Briefly, cells were grown on 0.1% gelatin-coated flasks in GMEM-BHK21 supplemented with 10% fetal calf serum, 2 mM L-glutamine, 1% MEM non-essential amino acids (NEAA), 1 mM sodium pyruvate (all from Gibco, Invitrogen), 50 µM β-mercaptoethanol, 1000 U/ml of human recombinant leukaemia inhibitory factor (LIF; Chemicon, Millipore, Chandler’s Ford, UK) and 0.1 mg/ml Hygromycin (Roche) as described previously2,3.

Cells were fixed and processed for cryosectioning as described previously4-6. Briefly, cells were fixed in 4% and then 8% EM-grade freshly depolymerized paraformaldehyde in 250 mM HEPES-NaOH (pH 7.6; 10 min and 2 h, respectively), which provides optimal preservation of subcellular organisation7. Cell pellets were embedded (2h) in 2.1 M sucrose in PBS and frozen in liquid nitrogen. Tokuyasu cryosections (150 nm in thickness) were cut using an UltraCut UCT 52 ultracryomicrotome (Leica, Milton Keynes, United Kingdom), captured in sucrose drops on metal bacteriological loops (~2mm diameter), and transferred onto glass coverslips for cryoFISH.

Probes

Genomic regions within the HoxB locus were detected using fosmid probes from BACPAC Resources (California, USA; Supplementary Table 3), including Snf8 (WI1-696E14); Hoxb13 (WI1-2076M19); Hoxb1 (WI1-376M13); control (WI1-2809P17) and Snx11 (WI1-1523P20). The specificity of fosmid probes was confirmed by PCR using specific primers (Supplementary Table 4). Probes were labelled with digoxigenin-11-dUTP, fluorescein-12-dUTP or tetramethyl-rhodamine-5-dUTP by nick translation (Roche), and purified from unincorporated nucleotides using MicroBioSpin P-30 chromatography columns (BioRad, Hertfordshire, UK).

Fluorescence in situ Hybridisation on Ultrathin cryosections (CryoFISH)

CryoFISH was performed directly on fresh cryosections essentially as described before4,6 for co-hybridisation of HoxB locus pairs, as follows: Snf8/Hoxb13; Snf8/Hoxb1; Snf8/Control; Snf8/Snx11; Hoxb13/Hoxb1; Hoxb13/Control; Hoxb13/Snx11; Hoxb1/Control; Hoxb1/Snx11; and Control/Snx11. Briefly, cryosections on glass coverslips were rinsed (3x) in PBS, incubated (15 min) in 20 mM glycine in PBS, rinsed (3x) in PBS, permeabilised (10 min) with 0.2% Triton X-100 and 0.2% saponin in PBS, and then washed (3x) in PBS. Cryosections were incubated (1h, 37°C) with 250 µg/ml RNase A (Sigma; in 2xSSC), treated (10 min) with 0.1 M HCl, dehydrated in ethanol (50 to 100% series, 3 min each), denatured (10 min, 80°C) in 70% deionized formamide, 2xSSC, and dehydrated. Hybridization was carried out at 37°C in a moist chamber over 48h. Hybridization mixtures contained 50% deionized formamide (Sigma), 2xSSC, 10% dextran sulphate, 50 mM phosphate buffer (pH 7.0), 1 mg/ml Cot1 DNA, 2 mg/ml salmon sperm DNA and 2-4 µl nick-translated probe. Before hybridization, probes were denatured (10 min) at 70°C and re-annealed (30 min) at 37°C.

Post-hybridization washes were as follows: 50% formamide in 2xSSC (42°C; 3x over 25 min), 0.1xSSC (60°C, 3x over 30 min), and 0.1% Tween-20 in 4xSSC (42°C, 10 min). Sections were incubated (30 min) with casein-blocking solution (pH 7.8; Vector Laboratories) containing 2.6% NaCl, 0.5% BSA, and 0.1% fish skin gelatin. The signal of rhodamine-labelled probes was amplified with rabbit anti-rhodamine (2h; 1:500; Invitrogen) and Cyanine3-conjugated donkey antibodies raised against rabbit IgG (1h; 1:1000; Jackson ImmunoResearch Laboratories). Probes labelled with digoxigenin were detected with sheep anti-digoxigenin Fab fragments (2h; 1:200; Roche) and with AlexaFluor555-conjugated antibodies raised in donkey against sheep IgG (1h; 1:1000; Invitrogen). Probes labelled with FITC were detected with mouse antibodies against FITC IgG (1h, 1:500; Jackson ImmunoResearch Laboratories) and AlexaFluor488-conjugated antibodies raised in donkey against mouse IgG (1h, 1:1000; Invitrogen). Nuclei were stained with 2 µM TOTO-3 in 0.05% Tween-20 in PBS and washed in PBS; coverslips were then mounted with VectaShield (Vector Laboratories) immediately before imaging.

DNA-FISH combined with immunofluorescence of RNAPII or Ezh2

Cryo-immunofluorescence detection of RNAPII-S2p or Ezh2, combined with DNA CryoFISH on ultrathin cryosections (immuno-cryoFISH) was performed essentially as described previously5,8. Briefly, for Fig. 6a-d, ultrathin cryosections (~140-150 nm thick) were first immunolabelled with murine primary antibodies specifically against RNAPII phosphorylated on S2 (2h; H5; IgM, 1:2000; Covance) or with mouse antibodies against Ezh2 (2h, 1:50, BD BioSciences 612667), and detected with AlexaFluor680-conjugated donkey antibodies against mouse Ig (1h; 1:500; Invitrogen, UK). The specificity of antibodies to phosphorylated RNAPII was tested by treatment with alkaline phosphatase (2h; 37°C; 0.5 U/µl; New England Biolabs, Hitchin, UK) or by omission of primary antibodies to show lack of detection of phosphorylated RNAPII in western blots1 and immunofluorescence9 (not shown). After immunolabelling, cryosections were fixed with 8% freshly-depolymerized PFA in PBS (2h), prior to DNA-FISH, to preserve immunocomplexes during FISH. CryoFISH was then performed as described above, and nuclei were stained with 1µg/µl DAPI in 0.05% Tween-20 in PBS, before imaging.

Microscopy and Image Analysis

Images were acquired on a confocal laser-scanning microscope (Leica TCS SP5; 63x oil objective, NA 1.4) equipped with a 405 nm diode, and Argon (488 nm), HeNe (543 nm) and HeNe (633 nm) lasers, using a pinhole equivalent to 1 Airy disk. Images from different channels were collected sequentially to prevent fluorescence bleed-through. For measurements of interlocus distances, we used a supervised automatic algorithm for loci detection, center of mass estimation and distance calculation. Briefly, after segmentation of nuclear profiles based on the counterstain, loci present within each profile were identified and the coordinates of their center of mass used to measure all pairwise inter-allele (homologous and heterologous) distances. To measure colocalisation between genomic regions and nuclear domains containing active RNAPII-S2p or Polycomb subunit Ezh2, we first contrast stretched raw images in Adobe Photoshop. The association between Hoxb1 and Hoxb13 or Snf8 and Snx11 loci with RNAPII-S2p or Ezh2 (Fig. 4a-d) was scored as ‘co-localised’ (signals overlap by at least 1 pixel) or ‘separated’5 (signals do not overlap; including adjacent signals).
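The centre-of-mass and distance steps described above can be illustrated with a minimal Python sketch. It assumes that each locus has already been segmented into a boolean pixel mask within one nuclear profile; the function names, the optional intensity weighting and the pixel size parameter are illustrative assumptions and do not reproduce the algorithm used in this study.

```python
import itertools
import numpy as np

def centre_of_mass(mask, intensity=None):
    """Return the (row, col) centre of mass of one segmented locus.

    mask: 2D boolean array marking the pixels of the locus.
    intensity: optional 2D array of weights (assumption: unweighted if omitted).
    """
    rows, cols = np.nonzero(mask)
    weights = np.ones(rows.size) if intensity is None else intensity[rows, cols]
    return np.array([np.average(rows, weights=weights),
                     np.average(cols, weights=weights)])

def pairwise_distances(centres, pixel_size_nm=1.0):
    """All pairwise Euclidean distances (nm) between locus centres in one profile."""
    return {
        (i, j): float(np.linalg.norm(centres[i] - centres[j]) * pixel_size_nm)
        for i, j in itertools.combinations(range(len(centres)), 2)
    }
```

The dictionary keys are index pairs into the list of centres, so homologous and heterologous pairs can be separated afterwards by looking up which probe each index belongs to.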

Mapping of chromatin occupancy datasets

The published ChIP-seq and mRNA-seq datasets used are shown in Supplementary Table 1. UCSC Genome Browser11 (http://genome.ucsc.edu) was used to display single sequencing data (Fig. 1b).

For ChIP-seq, sequenced reads from published datasets were re-aligned against the mouse genome (mm9 assembly, July 2007) using Bowtie2 (ref. 12), v2.0.5, with default parameter settings. Replicated reads (i.e. identical reads, aligned to the same genomic location) that occur more often than a threshold, computed as the 95th percentile of the frequency distribution of each dataset, were removed. When reads originated from multiple sequencing runs of the same library, they were merged after mapping and removal of replicated reads.
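A minimal Python sketch of the replicated-read filter is given below. It assumes that each aligned read is represented by a hashable key such as (chromosome, position, strand), and that reads whose copy number exceeds the threshold are discarded entirely rather than capped at the threshold; both points are assumptions, as is the function name.

```python
from collections import Counter
import numpy as np

def filter_replicated_reads(reads, percentile=95):
    """Drop reads whose copy number exceeds the given percentile of the
    per-dataset distribution of copy numbers.

    reads: list of hashable alignment keys, e.g. (chrom, pos, strand).
    """
    counts = Counter(reads)                      # copies per identical alignment
    threshold = np.percentile(list(counts.values()), percentile)
    return [read for read in reads if counts[read] <= threshold]
```

Capping each over-represented position at the threshold instead of discarding it would be an equally plausible reading of the text and changes only the final list comprehension.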

For mRNA-seq, sequenced reads from a published dataset were aligned against the mouse genome mm9 and the UCSC mm9 Known Gene GTF annotation file using TopHat (ref. 13), v2.0.8, to allow the detection of reads on exon-exon splice junctions. TopHat was run with default parameter settings for single-ended reads14.

Coordinates of CTCF binding sites were obtained from Chen et al. (2008)15. The genomic coordinates were converted from mouse genome version mm8 (Feb. 2006) to version mm9 (July 2007) using the UCSC LiftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). Most (93%) regions were successfully converted.

Datasets for RNAPII-S5p, H2AK119ub1, mock-IP and mRNA-seq were obtained in ES-OS25 cells, grown in the same conditions as in the present study. H3K27me3 datasets were produced by Mikkelsen et al. (2007) in ESC-V6.5, while CTCF binding sites (Chen et al. 2008) were obtained in ES-E14 (refs 15,16; Supplementary Table 1).

Promoter classification

Promoters were classified as Active, Poised or Silent according to the presence or absence of RNAPII-S5p, H3K27me3 and H2Aub1 (see main text). We considered all RefSeq genes within a 1Mb genomic region centred on the HoxB locus. The binary classification was obtained by integrating levels of ChIP-seq enrichment within 2kb windows centred on TSS coordinates, as described previously8. TSS coordinates were retrieved from the UCSC database for the mm9 genome version. The complete list of promoters and their classification considered for this study is shown in Supplementary Table 2.
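The classification can be sketched as a simple rule on the three binary marks. The rule below is inferred only from the combinations listed in Supplementary Table 2 and is an assumption; the criteria given in the main text take precedence, and combinations that do not appear in the table are left unclassified.

```python
def classify_promoter(rnapii_s5p, h2aub1, h3k27me3):
    """Classify a promoter from binary ChIP-seq calls.

    Argument order follows the columns of Supplementary Table 2.
    Rule (assumption, inferred from the table):
      Poised  : RNAPII-S5p and H3K27me3 present
      Active  : RNAPII-S5p present, no H3K27me3, no H2Aub1
      Silent  : none of the three marks present
    """
    if rnapii_s5p and h3k27me3:
        return "Poised"
    if rnapii_s5p and not h3k27me3 and not h2aub1:
        return "Active"
    if not (rnapii_s5p or h3k27me3 or h2aub1):
        return "Silent"
    return "Unclassified"  # combination not listed in Supplementary Table 2
```

For example, `classify_promoter(True, False, True)` corresponds to the Skap1 row of the table.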

SBS model via Monte Carlo simulations

To model the selected DNA region around the HoxB cluster in mouse ES cells (chr11: 95685813-96650449, 960 kb; mouse genome mm9), we employed the Strings & Binders Switch (SBS) model17,18, where a chromatin filament is represented as a self-avoiding polymer19 bead chain on a cubic lattice of unitary edge size, d0. In the present case the chain is made of N=152 beads. The length of each bead (s0) is 7600 bases, i.e., a single bead in a cubic lattice cell of unitary edge size, d0, contains s0 base pairs. In order to compare cryoFISH data with our simulation results, we must set the scale of d0. This is accomplished by estimating the average chromatin density within a lattice polymer bead as equal to the average density of chromatin in the cell nucleus. Thus, we have d0 ~ (s0/G)^(1/3) R. Considering the order of magnitude of the average nuclear radius, R=5.0 µm, derived from nuclear profiles, and of the size of the mouse genome, G, we obtain d0 ~ 0.035 µm. The average nuclear radius was determined from the radii of nuclear slices used for cryoFISH using R = π⟨r⟩/2, where ⟨r⟩ is the average radius of the nuclear slices.

Beads associated with each gene within the HoxB region are positioned at the gene Transcription Start Sites (TSS). The coordinates used are from the mm9 genome assembly (Supplementary Table 2). In the few cases where TSSs are separated by less than 7600 bp, they are represented as contiguous beads along the chain. In the SBS simulation conditions chosen, poised genes (red in Fig. 2a) are attachment sites for the ‘poised’ binding factors (also visualized in red) and, analogously, active genes (green in Fig. 2a) are attachment sites for ‘active’ binding factors. All other (grey) beads are inert, as they have no interactions apart from excluded volume effects (and chain length integrity constraints, see below). The total chain length also includes 12 inert beads added at each end of the polymer in order to avoid finite size effects on the conformation of the portion of the chain corresponding to the HoxB region.

We explored the system behaviour by varying the affinities, ER and EG, of poised and active genes, and the concentrations of ‘poised’ and ‘active’ binding molecules, cR and cG. These values are varied in a broad, biologically relevant range17,18. In particular, in the main text and Figs. 2-5 and 7, we refer to the case where ER=EG=4 kBT, a value which is in the range of real transcription factor binding energies (where kB is the Boltzmann constant and T the temperature in Kelvin). The fraction of binding molecules per lattice binding site, c, is related to their molar concentration, cm: c ~ cm d0^3 NA (where NA is the Avogadro number). Our binding molecules have a binding multiplicity of six, a value of the order of magnitude of the typical number of polymerases in a transcription factory (see Pombo et al. 1999 (ref. 20) and references therein). The SBS model with multiplicity 1 (Supplementary Fig. 8a,b,e,f) is a variant of the above model where the maximal number of bonds that binding sites can form with the binders is limited to only one (versus six in the previous version), while binders keep their ability to have up to six bonds with different binding sites. We used the same binding energies as in the previous model (ER=EG=4 kBT), and a concentration of 0.01%, ensuring the compaction of the polymer.

The model was tested by Metropolis Monte Carlo (MC) computer simulations21,22. Diffusing molecules and polymer beads randomly move from one cell to a nearest-neighbour cell on the lattice, while single site occupancy and polymer non-breaking constraints are maintained. Chemical interactions are only permitted between nearest-neighbour particles17,18. For each red and green binding factor concentration value, we produced up to 500 independent equilibrium polymer configurations obtained from initial non-interacting configurations in the self-avoiding-walk state (see Fig. 2b). Each run is fully equilibrated with up to 5×10^12 single MC steps21,22.
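Two quantities in this section lend themselves to a short worked example in Python: the conversion between the lattice fraction c and the molar concentration cm, and the standard Metropolis acceptance rule for a trial move. The sketch below is illustrative only. The function names are hypothetical, the default d0 is the 0.035 µm quoted above, and energies are expressed in units of kBT with the convention (an assumption) that forming a bond lowers the energy by the binding affinity.

```python
import math
import random

AVOGADRO = 6.02214076e23  # mol^-1

def lattice_fraction_to_molar(c, d0_um=0.035):
    """Invert c ~ cm * d0^3 * NA to obtain the molar concentration cm (mol/l)."""
    d0_dm = d0_um * 1e-5              # 1 µm = 1e-5 dm, so d0^3 is in litres
    return c / (d0_dm ** 3 * AVOGADRO)

def molar_to_lattice_fraction(c_molar, d0_um=0.035):
    """c ~ cm * d0^3 * NA: binders per lattice site for a molar concentration."""
    d0_dm = d0_um * 1e-5
    return c_molar * d0_dm ** 3 * AVOGADRO

def metropolis_accept(delta_e_kbt, rng=random):
    """Standard Metropolis rule: accept with probability min(1, exp(-dE/kBT))."""
    return delta_e_kbt <= 0 or rng.random() < math.exp(-delta_e_kbt)
```

If the concentration of 0.01% is read as a fraction of 1e-4 binders per lattice site (an assumption), `lattice_fraction_to_molar(1e-4)` gives roughly 4 nM. Under the same conventions, a move that breaks a single 4 kBT bond is accepted with probability exp(-4), about 0.018.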

Contact and frequency matrices

The model Contact Matrices, such as those shown in Fig. 2c, report the probability of finding a bead pair within a distance of three lattice spacings, i.e., 3d0. The comparison between the co-segregation frequencies and relative distances of probes in cryoFISH nuclear sections with those of the model requires a specific approach whereby we consider analogous sections cut through our polymer model. As FISH probes are roughly 30 kb long, in the SBS model we consider polymer regions with a comparable length of 3 beads (3s0 ≃ 23 kb) centred on the TSS of the specific gene included in the fosmid probe (Supplementary Table 3). Then we cut a “section” through the model polymer (having the same thickness used in the experiments) at a random angle and check whether it contains two, one or no centres of mass. In the cases where both centres of mass are in the section, we compute their distance projected along the longitudinal plane of the section. Then we measure their co-segregation as the probability of finding this distance within 3d0 = 105 nm. In the above procedure, we used a section thickness of 150 nm, corresponding to the one used in real cryosections. Each independent polymer configuration is randomly sliced once for each pair of polymer regions; slicing was performed independently for each pair of polymer regions. For comparison between distances obtained from the model and from cryoFISH (Fig. 5 and Supplementary Fig. 7), we corrected for the small contribution of distances between loci on different homologues by excluding distances >1 µm, approximately the expected diameter of the territory of chromosome 11 (where HoxB is located).
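The virtual sectioning procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the centres of mass are given as 3D coordinates in nm, the slab position is drawn uniformly along the random normal within a hypothetical range `half_extent_nm`, and the co-segregation frequency is normalised by the slices that contain both loci. None of these choices is confirmed by the text, and the function names are illustrative.

```python
import numpy as np

def random_unit_vector(rng):
    """Draw an isotropically distributed unit vector (slab normal)."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def slice_pair(com_a, com_b, thickness_nm, half_extent_nm, rng):
    """Cut one randomly oriented slab; return the in-plane distance (nm)
    between two centres of mass, or None unless both lie inside the slab."""
    normal = random_unit_vector(rng)
    offset = rng.uniform(-half_extent_nm, half_extent_nm)  # lower face of the slab
    heights = np.array([com_a @ normal, com_b @ normal])
    inside = (heights >= offset) & (heights < offset + thickness_nm)
    if not inside.all():
        return None
    separation = com_b - com_a
    in_plane = separation - (separation @ normal) * normal  # remove component along normal
    return float(np.linalg.norm(in_plane))

def cosegregation_frequency(distances_nm, cutoff_nm=105.0):
    """Fraction of slices containing both loci with projected distance <= cutoff."""
    d = np.asarray([x for x in distances_nm if x is not None], dtype=float)
    return float((d <= cutoff_nm).mean()) if d.size else float("nan")
```

In use, `slice_pair` would be called once per polymer configuration and probe pair with `thickness_nm=150.0`, and the resulting list passed to `cosegregation_frequency` with the 3d0 = 105 nm cutoff.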

Code Availability

Any custom computer codes used to generate results reported in the manuscript that are central to the main claims will be made available to editors and referees upon request. All details of the algorithms are illustrated in the sections above and in previous publications cited therein.

Online Methods References

1. Stock, J.K. et al. Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat Cell Biol 9, 1428-35 (2007).
2. Billon, N., Jolicoeur, C., Ying, Q.L., Smith, A. & Raff, M. Normal timing of oligodendrocyte development from genetically engineered, lineage-selectable mouse ES cells. J Cell Sci 115, 3657-65 (2002).
3. Niwa, H., Miyazaki, J. & Smith, A.G. Quantitative expression of Oct-3/4 defines differentiation, dedifferentiation or self-renewal of ES cells. Nat Genet 24, 372-6 (2000).
4. Branco, M.R. & Pombo, A. Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol 4, e138 (2006).
5. Ferrai, C. et al. Poised transcription factories prime silent uPA gene prior to activation. PLoS Biol 8, e1000270 (2010).
6. Xie, S.Q., Lavitas, L.M. & Pombo, A. CryoFISH: fluorescence in situ hybridization on ultrathin cryosections. Methods Mol Biol 659, 219-30 (2010).
7. Guillot, P.V., Xie, S.Q., Hollinshead, M. & Pombo, A. Fixation-induced redistribution of hyperphosphorylated RNA polymerase II in the nucleus of human cells. Exp Cell Res 295, 460-8 (2004).
8. Brookes, E. et al. Polycomb associates genome-wide with a specific RNA polymerase II variant, and regulates metabolic genes in ESCs. Cell Stem Cell 10, 157-70 (2012).
9. Xie, S.Q. & Pombo, A. Distribution of different phosphorylated forms of RNA polymerase II in relation to Cajal and PML bodies in human cells: an ultrastructural study. Histochem Cell Biol 125, 21-31 (2006).
10. Fraser, J. et al. Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol Syst Biol 11, 852 (2015).
11. Kent, W.J. et al. The human genome browser at UCSC. Genome Res 12, 996-1006 (2002).
12. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-9 (2012).
13. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36 (2013).
14. Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7, 562-78 (2012).
15. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106-17 (2008).
16. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-60 (2007).
17. Nicodemi, M. & Prisco, A. Thermodynamic pathways to genome spatial organization in the cell nucleus. Biophys J 96, 2168-77 (2009).
18. Barbieri, M. et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc Natl Acad Sci U S A 109, 16173-8 (2012).
19. Doi, M. & Edwards, S.F. The Theory of Polymer Dynamics, (Oxford University Press, 1988).
20. Pombo, A. et al. Regional specialization in human nuclei: visualization of discrete sites of transcription by RNA polymerase III. EMBO J 18, 2241-53 (1999).
21. Binder, K. Applications of Monte Carlo methods to statistical physics. Reports on Progress in Physics 60, 487-559 (1997).
22. Binder, K. & Heermann, D.W. Monte Carlo Simulation in Statistical Physics, (Springer Berlin Heidelberg, 2002).

Supplementary Material

Supplementary Figure 1. Schematic representation of the extended HoxB locus and classification of gene promoter states or CTCF occupancy sites. (a) An illustration of the 1Mb genomic region centered on the HoxB gene cluster (chr11: 95685813-96650449; mouse genome mm9). (b) Classification of genes within the HoxB locus according to the transcriptional state of their promoters: promoters classified as Active are shown in green, Poised promoters in red and Silent promoters in grey. (c) Classification of genes based only on the presence of RNAPII-S5p at promoters (blue). (d) Distribution of CTCF binding sites (orange).

[Supplementary Figure 2 image: energy of binding phase diagram. Axes: ER (kBT) and EG (kBT); colour scale: Rg/Rg0. Phase 1: open. Phase 2: Red closed, driven by Poised promoters. Phase 3: Green closed, driven by Active promoters. Phase 4: Green and Red closed, driven by Active and Poised promoters.]

Supplementary Figure 2. The four stable phases of the homotypic interaction model obtained by varying the affinities of binders. The system phase diagram identifies the stable architectural classes of the model, which correspond to its thermodynamic phases, defined by binding affinity.

[Supplementary Figure 3 image. (a) Model contact probabilities and predicted probability of FISH probe co-localization for phase 1 (open), phase 2 (Red closed, driven by Poised promoters), phase 3 (Green closed, driven by Active promoters) and phase 4 (Green and Red closed, driven by Active and Poised promoters). (b) Mixed population states: phase 2-3 mix (50:50); open (H3K4me3+) promoters. (c) CTCF binding sites. Sub-matrix rows and columns: Snf8, Hoxb13, Hoxb1, Ctrl, Snx11. Colour scales: contact probability (model) and contact probability (sub-matrices), 0 to 1.]

Supplementary Figure 3. Contact matrices across the HoxB locus simulated using the SBS model. (a) Contact matrices summarize chromatin contacts obtained using the SBS model, which considers homotypic interactions between Active and/or Poised genes. The four larger panels illustrate the predicted full contact probability matrices in the four thermodynamic states of the model. The four smaller panels represent the model-predicted contact frequency matrix restricted to the probes considered in the FISH experiments. (b) Contact matrix results using the SBS model in the cases in which: left, a 50:50 mixture of polymers in phases 2 and 3 is considered; or, right, all gene promoters marked by H3K4me3/RNAPII-S5p can bind with each other. (c) The contact matrix predicted by an SBS model based only on contacts between CTCF binding sites.

[Supplementary Figure 4 image: chromatin occupancy maps across the Skap1 gene. Tracks: H3K4me3, H3K27me3, Control, CTCF. Scale bars: 100 kb. Annotated features: Snx11, Skap1, control probe.]

Supplementary Figure 4. Chromatin occupancy maps across the Skap1 gene in mouse ES cells. To identify a control region for FISH at similar distance to Active or Poised (Polycomb-repressed) windows, we identified a region within the coding region of the Skap1 gene. Although the Skap1 promoter is classified as Polycomb-repressed and occupied by RNAPII-S5p, its long coding region (>200kb) is devoid of specific enrichment for the promoter marks studied. The control region chosen (grey horizontal bar) is highly enriched for CTCF occupancy and is expected to interact extensively in the CTCF-only models but not the promoter state models. Orange vertical lines: CTCF binding sites.

[Supplementary Figure 5 image. (a) Workflow: nuclear profile identification; profile area and radius; locus detection and centre of mass determination; inter-locus distances.]

(b)
Columns: locus pair; Nr. nuclear profiles; locus frequency (%); average inter-locus distance (µm); average inter-locus distance from modeling (µm)

Locus pair          Profiles   Frequency (%)   Distance (µm)   Model distance (µm)
Hoxb13 : Hoxb1      2584       13.9 : 13.0     0.07            0.08
Snf8 : Hoxb13       1872       12.8 : 11.6     0.27            0.16
Snx11 : Control     1535       9.7 : 8.8       0.30            0.20
Snf8 : Snx11        2537       15.6 : 15.0     0.08            0.08
Hoxb1 : Control     2587       7.5 : 8.0       0.30            0.25
Hoxb1 : Snf8        1613       16.1 : 14.9     0.38            0.21
Snf8 : Control      2058       12.2 : 14.5     0.32            0.23
Hoxb13 : Snx11      1870       14.8 : 14.4     0.33            0.16
Hoxb13 : Control    1614       17.3 : 19.4     0.31            0.24
Hoxb1 : Snx11       1544       16.6 : 17.2     0.34            0.18

Supplementary Figure 5. Strategy for semi-automated distance measurement in cryoFISH images of genomic loci within the HoxB locus. (a) Summary of the semi-automated approach developed here to locate FISH signals in nuclear profiles from cryosections and to measure their inter-locus distances. (b) Table listing the number of nuclear profiles analysed for the ten pairs of probes tested, the frequency of probe detection and average locus inter-distances from FISH and from the SBS model for the equivalent genomic regions, considering the ensemble of homotypic polymers modelled in phase 4.

[Supplementary Figure 6 image: cryoFISH inter-locus distance distributions. Panels (genomic separation): Hoxb1-Control (~240kb), Snx11-Control (~170kb), Snf8-Control (~550kb), Hoxb13-Snx11 (~580kb), Snf8-Hoxb1 (~320kb), Hoxb1-Snx11 (~410kb). x-axis: distance (nm), 0 to >250; y-axis: frequency (%).]

Supplementary Figure 6. Distance distributions of genomic regions within the HoxB locus. This figure presents the histograms of inter-locus physical distances between different pairs of probes within the HoxB locus that were not shown in Fig. 4. The distance distributions of heterotypic pairs of loci are more spread than the distributions of distances of pairs belonging to the same class (homotypic pairs; Fig. 4). Around 70% of distances are >250 nm for heterotypic pairs.

[Supplementary Figure 7 image. Heterotypic interactions: Hoxb13-Snf8, Hoxb1-Snx11, Hoxb13-Snx11. Control cases: Snf8-Control, Hoxb1-Control, Hoxb13-Control. x-axis: distance (µm); y-axis: probability. Series: CryoFISH and Model.]

Supplementary Figure 7. Comparison of inter-locus distances from cryoFISH imaging of the HoxB locus and the SBS homotypic interaction model. Similar distance distributions were obtained experimentally by cryoFISH from a population of mouse ES cells and by polymer physics using an ensemble of SBS polymers corresponding to the homotypic interaction case, although free fitting parameters were not used to optimise the match. The distributions of heterotypic distances and of distances to the control probe are much broader, with most (typically ~70%) distances >150 nm. A 50:50 mixture of polymers in phases 2 and 3 is also represented for comparison.

[Supplementary Figure 8 image. (a) Schematic of binding multiplicity. Multiplicity 6: polymer binding sites can form up to 6 bonds with binders. Multiplicity 1: polymer binding sites can form only 1 bond with binders. Labels: binding site, binder, bond not allowed. (c) SBS multiplicity 6, phase 4. (d) SBS multiplicity 6, mixed population states: phase 4, phase 1, phase 2-3 mix (50:50). (e, f) SBS multiplicity 1. (g) cryoFISH. Matrix rows and columns: Snf8, Hoxb13, Hoxb1, Ctrl, Snx11. Colour scale: probability, 0 to 1.]

(b)
Frequency of interaction from cryo-FISH versus:   Correlation
SBS multiplicity 6, phase 4                       0.97
SBS multiplicity 6, mix phase 2-3                 0.94
SBS multiplicity 1                                0.78
SBS multiplicity 6, phase 1                       -0.17

Supplementary Figure 8. Testing different modelling parameters of the SBS model: an analysis of correlations with cryoFISH results. (a) Scheme of two different conditions of polymer modelling considered in our study, which differ by the binding multiplicity of the polymer beads. In the case of multiplicity equal to 1 (right), the polymer architectures produced can differ from those of the high multiplicity case (left). (b) Table listing the Pearson correlation coefficients between the contact matrix obtained experimentally from single cell imaging of genomic positioning across the extended HoxB locus and the matrices calculated from ensembles of folded polymers obtained by different variants of the SBS model. The model with multiplicity 6 correlates better with cryoFISH experimental results than the one with multiplicity 1. The case corresponding to ‘phase 4’ is preferred to the ‘mix phase 2-3’ case (i.e., a 50:50 mixture of phases 2 and 3; panel d) as seen by comparing the specific values of their contact matrices with those obtained by FISH (panel g). (c) The full contact probability matrix for the case with multiplicity 6 shows the formation of both local clustering and long-range contacts. (d) Comparison of the mini-contact matrices obtained with the SBS model using multiplicity 6 in phase 4, mix phase 2-3 and phase 1, relative to the five polymer positions corresponding to the five FISH probes. (e) The contact probability matrix for the case with multiplicity 1 lacks longer-range contacts, namely between the two active (green) loci containing Snx11 and Snf8. Note that these mini-matrices were produced with more signal resolution than in previous figures, to allow more sensitivity in the calculations of correlations between matrices. (f) The mini-contact matrix in the model with multiplicity 1. (g) The contact matrix in our cryoFISH experiments for the five FISH probes, also calculated with more signal resolution, to allow sensitive calculations of correlations with SBS matrices.
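The correlation analysis in panel b can be illustrated with a short Python sketch. It assumes that the model and cryoFISH mini-matrices are symmetric 5×5 arrays ordered by the same five probes and that only the off-diagonal upper-triangle entries (the ten probe pairs) enter the Pearson coefficient; whether the original analysis used exactly these entries is an assumption, and the function name is illustrative.

```python
import numpy as np

def matrix_pearson(model, experiment):
    """Pearson correlation between the probe-pair entries of two symmetric
    contact matrices (upper triangle, diagonal excluded)."""
    model = np.asarray(model, dtype=float)
    experiment = np.asarray(experiment, dtype=float)
    upper = np.triu_indices_from(model, k=1)   # indices of the distinct probe pairs
    return float(np.corrcoef(model[upper], experiment[upper])[0, 1])
```

Restricting the calculation to the upper triangle avoids counting each probe pair twice and excludes the trivial self-contacts on the diagonal.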

Supplementary Tables

Supplementary Table 1. List of published genome-wide ChIP-seq and mRNA-seq datasets.

Dataset                       Reference                          Cell line   GEO identifier
ChIP-seq RNAPII-S5p           Brookes et al. 2012 (ref. 8)       ES-OS25     GSM850467
ChIP-seq H3K4me3              Mikkelsen et al. 2007 (ref. 16)    ESC-V6.5    GSM307618
ChIP-seq H3K27me3             Mikkelsen et al. 2007 (ref. 16)    ESC-V6.5    GSM307619
ChIP-seq H2AK119ub1           Brookes et al. 2012 (ref. 8)       ES-OS25     GSM850471
ChIP-seq Mock IP (Control)    Brookes et al. 2012 (ref. 8)       ES-OS25     GSM850473
ChIP-seq CTCF                 Chen et al. 2008 (ref. 15)         ES-E14      GSM288351
mRNA-seq                      Brookes et al. 2012 (ref. 8)       ES-OS25     GSM850476

Supplementary Table 2. List of 28 promoters classified as poised, active or silent according to ChIP-seq and mRNA-seq datasets in Supplementary Table 1.

Gene Symbol   TSS        RNAPII-S5p   H2Aub1   H3K27me3   Classification
Phospho1      95685813   true         true     true       Poised
Abi3          95703474   true         false    false      Active
Gngt2         95703608   true         false    false      Active
AK133925      95720178   true         false    false      Active
B4galnt2      95776185   true         true     true       Poised
Igf2bp1       95867226   true         false    false      Active
Gip           95885858   false        false    false      Silent
Snf8          95896230   true         false    false      Active
Ube2z         95926678   true         false    false      Active
Atp5g1        95936975   true         false    false      Active
Calcoco2      95973276   true         false    false      Active
Ttll6         95995532   true         false    false      Active
Hoxb13        96055674   true         true     true       Poised
AK078606      96064767   true         true     true       Poised
AK078566      96113082   true         true     true       Poised
Hoxb9         96132643   true         true     true       Poised
Hoxb8         96143218   true         true     true       Poised
Hoxb7         96147959   true         true     true       Poised
Hoxb6         96160484   true         true     true       Poised
Hoxb5         96164825   true         true     true       Poised
AK002860      96168224   true         true     true       Poised
Hoxb4         96180052   true         true     true       Poised
Hoxb3         96205082   true         true     true       Poised
Hoxb2         96212945   true         true     true       Poised
Hoxb1         96227071   true         true     true       Poised
Skap1         96325937   true         false    true       Poised
Snx11         96638845   true         false    false      Active
Cbx1          96650449   true         false    false      Active

Supplementary Table 3. List of used probe regions for cryoFISH imaging.

Coordinates are chr11 start and end positions (bp). The last four columns give interprobe distances (bp) to the named probe.

Probe name (fosmid code)    Start (bp)   End (bp)    Hoxb13     Hoxb1      Control    Snx11
Snf8 (WI1-696E14)           95894328     95937161    150605.5   318688     553984.5   727376.5
Hoxb13 (WI1-2076M19)        96046408     96086292               168082.5   403379     576771
Hoxb1 (WI1-376M13)          96214206     96254659                          235296.5   408688.5
Control (WI1-2809P17)       96450866     96488592                                     173392
Snx11 (WI1-1523P20)         96620769     96665473
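The interprobe distances listed in Supplementary Table 3 are consistent with distances between probe midpoints, which can be checked with the short Python sketch below. The coordinates are taken from the table; the function and variable names are illustrative.

```python
# Probe coordinates on chr11 (start, end) in bp, from Supplementary Table 3.
PROBES = {
    "Snf8": (95894328, 95937161),
    "Hoxb13": (96046408, 96086292),
    "Hoxb1": (96214206, 96254659),
    "Control": (96450866, 96488592),
    "Snx11": (96620769, 96665473),
}

def midpoint(probe):
    """Genomic midpoint (bp) of a probe."""
    start, end = PROBES[probe]
    return (start + end) / 2

def interprobe_distance(probe_a, probe_b):
    """Distance (bp) between the midpoints of two probes."""
    return abs(midpoint(probe_b) - midpoint(probe_a))
```

For instance, `interprobe_distance("Snf8", "Hoxb13")` returns 150605.5, the value listed in the table.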

Supplementary Table 4. List of primers used to confirm the specificity of fosmid FISH probes.

Fosmid identified by the PCR product   Primer    Sequence (5’ – 3’)
Snf8                                   Forward   CACCACTGCCTGGGTATTCT
Snf8                                   Reverse   GGGTTCTCAGCTTTGTGCTC
Hoxb13                                 Forward   TCTGCAGGTGTCTAGCATGG
Hoxb13                                 Reverse   TGACTCTGCTCTGCTCTGGA
HoxB1-2                                Forward   AAGGCTAGCTTTCCCAGAGG
HoxB1-2                                Reverse   AGGGGCTAACAGTTGGAGGT
Skap1                                  Forward   TGCTTTCTACAGTCTCCTTTTGTCT
Skap1                                  Reverse   CAAATGCTAAAGCTACTGACACAGA
Snx11/Cbx1                             Forward   CTCCGTTTCCAACAACCACT
Snx11/Cbx1                             Reverse   GCTTCAGATGGGATGGAAAA
