US 2017O 130247A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2017/0130247 A1 DOWen et al. (43) Pub. Date: May 11, 2017

(54) COMPOSITIONS AND METHODS FOR Related U.S. Application Data ALTERING EXPRESSION (60) Provisional application No. 62/234.770, filed on Sep. (71) Applicant: Whitehead Institute for Biomedical 30, 2015. Research, Cambridge, MA (US) Publication Classification (72) Inventors: Jill M. Dowen, Chapel Hill, NC (US); (51) Int. C. Zi Peng Fan, Waltham, MA (US); CI2N 15/90 (2006.01) Denes Hnisz. Cambridge, MA (US); (52) U.S. C. Tong Ihn Lee, Somerville, MA (US); CPC ...... CI2N 15/907 (2013.01); C12N 2800/80 Richard A. Young, Boston, MA (US) (2013.01) (21) Appl. No.: 15/282,905 (57) ABSTRACT Provided herein are improved compositions and methods for (22) Filed: Sep. 30, 2016 the directed control of gene expression. Patent Application Publication May 11, 2017. Sheet 1 of 43 US 2017/O130247 A1

Figare i

i00,000

Chronosoines:

{},{}{}{}

{{E} opologicaily s Associating otains {{

Gene loops

Naceosonies Patent Application Publication May 11, 2017. Sheet 2 of 43 US 2017/O130247 A1

Figure :

S. Kic

Enhanceti 4% Promoter

3% Polyconi

S9, the Patent Application Publication May 11, 2017. Sheet 3 of 43 US 2017/O130247 A1

Figure

Reads {-aight 4{{ } {{}{.{{}{}

Unique PETs 9,442.89)

PET peaks & & ... & & & ...& 8...&... & f68.88: }

interactions annig PET peaks High-coficience: iteraction agg FE peaks (empirical FER<-0.0i, 3. --- "C" ·&&.8.xxx...... oxoi.8°8'''88's'ss's48&S SMC “tookook os---to do OSN &: : or Super-enhance :::: 88:

Patent Application Publication May 11, 2017. Sheet 5 of 43 US 2017/O130247 A1

Figar: 2

Hi-C interaction frequencies

opoiogicaiy Associating 8) nat:

High-confidence iteraction

* Patent Application Publication May 11, 2017. Sheet 6 of 43 US 2017/O130247 A1

Figure 2 3.

8

o 3. & 8. 40 s 3 & O is: w88:3

(C.

2% 88% beyond within

TAS A38 Patent Application Publication May 11, 2017. Sheet 7 of 43 US 2017/O130247 A1

Figare 2

3. S

E.

High-confidence iteraction

&: '8; 8.3388& 8 £3 xxiiciaty 8:33rtiary Patent Application Publication May 11, 2017. Sheet 8 of 43 US 2017/O130247 A1

Figare 3

gx38xicaisy &883::::::g. 88ai

& s

; : . . . . . i ii. ------& xxx s'. 8.8 3 s : &

{{xxixg:kaiy &ssocia: 3883i:

&::::::: 8: ki::ity (388 Patent Application Publication May 11, 2017 Sheet 9 of 43 US 2017/O130247 A1

38 v. 2: 43 Patent Application Publication May 11, 2017. Sheet 10 of 43 US 2017/0130247 A1

: 8

8.

8. :

33:888sy d 38.3888; Patent Application Publication May 11, 2017 Sheet 11 of 43 US 2017/O130247 A1

Patent Application Publication May 11, 2017. Sheet 12 of 43 US 2017/0130247 A1

Figure 4

A.

3888-8::::3::ce. Eg:

.

Patent Application Publication May 11, 2017. Sheet 13 of 43 US 2017/0130247 A1

Figure 4

83 & &::::::::::::::::::::::::::::::::::

3:33

s 8-8--

88:88.

Patent Application Publication May 11, 2017 Sheet 14 of 43 US 2017/O130247 A1

Figare 4 8.

f 3:38:8 8 :::::::::

;----~--~~~~~~*~*~~~~~~~~~~~~****~~~~~~~~~~*** Patent Application Publication May 11, 2017. Sheet 15 of 43 US 2017/0130247 A1

Figure 4

M 8:38-:388:88): 38:::::::::

d

Patent Application Publication May 11, 2017. Sheet 16 of 43 US 2017/0130247 A1

Figure 4

&-88::::::: 3:3:

Patent Application Publication May 11, 2017. Sheet 17 of 43 US 2017/0130247 A1

Figure 5 A.

*::::::::::::::::::::::

» x&g::::::::x: 8x8:3388:8: i: 888 {3888 Patent Application Publication May 11, 2017. Sheet 18 of 43 US 2017/0130247 A1

Figare 5

------

88:3:

8:3 Patent Application Publication May 11, 2017. Sheet 19 of 43 US 2017/0130247 A1

Figure 5

3.

r: stews p boundary bo::ciary Patent Application Publication May 11, 2017. Sheet 20 of 43 US 2017/0130247 A1

Figares

$388:383

3:38

Y s 3.

88: s - Patent Application Publication May 11, 2017. Sheet 21 of 43 US 2017/0130247 A1

Figare 5

:: S8.s:

8 s: Patent Application Publication May 11, 2017. Sheet 22 of 43 US 2017/0130247 A1

i Patent Application Publication May 11, 2017 Sheet 23 of 43 US 2017/O130247 A1

Patent Application Publication May 11, 2017. Sheet 25 of 43 US 2017/O130247 A1

Figtre 7 (Figare Si

No.ººººººººººººººº!

7% Enhancer Patent Application Publication May 11, 2017. Sheet 26 of 43 US 2017/0130247 A1

Figure 7 Figare Si

Patent Application Publication May 11, 2017. Sheet 27 of 43 US 2017/0130247 A1

Figarei (Figure Si)

g 38:38::::::::::: P a. 838-3. a m . s4 &::*-8 & oc or - 2 2 88-8 - K r 88. K3 & 38.x: Rep; on Seif-tigation E. count 3'er kb bins (r.p.k.a. }

x-, s 8 8.

8: Patent Application Publication May 11, 2017. Sheet 28 of 43 US 2017/0130247 A1

Figre 7 (Figure Si

Non-chrei PE {Chineric PET

a: - 3338.

: & 38. 38 33. :8 Geronic 8ian Genomic span

it of PETs per interaction Patent Application Publication May 11, 2017. Sheet 29 of 43 US 2017/0130247 A1

Figure 7 (Figure S. G.

interactions High-confidence interactions {erapiricai FOR <30.8, 3:Es)

82% 8% - 39%

ter-chronixsotia 33-chroniososia tier-chronosoina intra-chioiosonkai

Saiitiation of the Sw: , Ch;A-PET Library

Fraction of PETs Reiative to Current ataset Size Patent Application Publication May 11, 2017. Sheet 30 of 43 US 2017/0130247 A1

Figare 8 (iigstre S2)

3::::::::::::::: &x:3:33:33:3:38

ix::c:gic:y 8888ciatig ::::::::::::

:::::::::: Patent Application Publication May 11, 2017. Sheet 31 of 43 US 2017/0130247 A1

Figure 8 Figtre S2)

33::::::

*:::::::38: $23888; 8: 83. 2333}

::::::

3:3:

& 8:8::::::::::8 88:38-3:8:::::::::: 3i. 23:3:

Patent Application Publication May 11, 2017. Sheet 32 of 43 US 2017/0130247 A1

Figs re 9 (Figtse S3}

seases 8:x: 3:33:8:33::::::::

Six:8:

& ...wik...... wikowow. & k sax Patent Application Publication May 11, 2017. Sheet 33 of 43 US 2017/0130247 A1

Figare 9 (Figure S3)

& 838:33:3:3:

M 8:88-8383:38:338:3

&: kx sixko www.s.

S3888-8:33:3: Ex:3:

i:88s &:s:x:::::::::::::::::::::

to so &:83-8:33:30:8:

3888... : Patent Application Publication May 11, 2017. Sheet 34 of 43 US 2017/0130247 A1

Figure 9 (Figure S3

mem &:g::::::::::::::88: 3:3:38

Patent Application Publication May 11, 2017. Sheet 35 of 43 US 2017/0130247 A1

Figare 9 (Figure S3

::::::::::::::::::y;&&8:::::::::::33:8:

&::::::::::::::::ce: 3:38: Patent Application Publication May 11, 2017. Sheet 36 of 43 US 2017/0130247 A1

Figure 8 (Figure S4

8:88-8:38& $3838 3C-888&gatx 3:23:8388

: {38:

ra sig3:8:8; 88-388-388

33 Patent Application Publication May 11, 2017. Sheet 37 of 43 US 2017/0130247 A1

Figure 8 (Figure S4

win $:383-8:38:3: 3883: -j-i-m-i-. - s 38-8errogated 8:3:33:38

3.33. :::::::S:8: 3x38:38 Patent Application Publication May 11, 2017. Sheet 38 of 43 US 2017/0130247 A1

Figarei (Figure S4

wix type ESC

3C 3 trix-388-38.

sSS

s

*::::::::::::::y :::::::::::: k:gixxist: 88:88: ::::::::: 3:x: Patent Application Publication May 11, 2017. Sheet 39 of 43 US 2017/0130247 A1

Figare ii (Figre S5)

3::::::::::::::::::y 8888c:::::::::33:::::::

t 38

---......

Sssssssssssssss

ii; Patent Application Publication May 11, 2017. Sheet 40 of 43 US 2017/0130247 A1

Figare (Fig are S5)

3:388x338; 8.388x333i: 38883:

hoto-8: $8ycory: s38::::::: -

X:::::::::::

3:8338c:38;33 8:::::::::::::::::::38:8w

Patent Application Publication May 11, 2017. Sheet 41 of 43 US 2017/0130247 A1

Figure ii (Figure S5)

x:3:38:8:

Patent Application Publication May 11, 2017. Sheet 42 of 43 US 2017/0130247 A1

Figtre i (Figare S5)

::::::::::::::::::y 8$88:a::g .3388:

$3:::::::::

33;x:::::::::::::: *$88:::::::::::::38:8

::::::::::::8& Patent Application Publication

& $$$84

US 2017/O 130247 A1 May 11, 2017

COMPOSITIONS AND METHODS FOR 2012: Filippova et al., 2014; Gibcus and Dekker, 2013; ALTERING GENE EXPRESSION Naumova et al., 2013: Nora et al., 2012). 0009 TADs, also known as topological domains, are CROSS-REFERENCE TO RELATED defined by DNA-DNA interaction frequencies, and their APPLICATION(S) boundaries are regions across which relatively few DNA DNA interactions occur (Dixon et al., 2012: Nora et al., 0001. This application claims the benefit of U.S. Provi 2012). TADs average 0.8 Mb, contain approximately seven sional Application No. 62/234,770, filed Sep. 30, 2015. The -coding genes, and have boundaries that are shared entire teachings of the above application is incorporated by the different cell types of an organism (Dixon et al., 2012: herein by reference. Smallwood and Ren, 2013). The expression of genes within aTAD is somewhat correlated, and thus some TADs tend to GOVERNMENT SUPPORT have active genes and others tend to have repressed genes 0002 This invention was made with government support (Cavalli and Misteli, 2013; Gibcus and Dekker, 2013: Nora under Grant Number HG002668 awarded by the National et al., 2012). Institutes of Health. The government has certain rights in the 0010 Gene loops and other structures within TADs are invention. thought to reflect the activities of factors (TFs), cohesin, and CTCF (Baranello et al., 2014: Gorkin et al., FIELD OF THE INVENTION 2014; Phillips-Cremins et al., 2013: Seitan et al., 2013; Zuin et al., 2014). The structures within TADs include cohesin 0003. The present invention generally relates to methods associated enhancer-promoter loops that are produced when of selectively altering gene expression within, for example, enhancer-bound TFs bind cofactors such as Mediator that, in insulated neighborhoods formed by the looping of two turn, bind RNA polymerase II at promoter sites (Lee and CTCF interaction sites occupied by cohesion. Young, 2013: Lelli et al., 2012: Roeder, 2005: Spitz and Furlong, 2012). The cohesin-loading factor NIPBL binds REFERENCE TO LENGTHY TABLE Mediator and loads cohesin at these enhancer-promoter 0004. The specification includes lengthy Tables: Table loops (Kagey et al., 2010). Cohesin also becomes associated S1E and Table S2A. Lengthy Table S1E has been submitted with CTCF-bound regions of the genome, and some of these via EFS-Web in electronic format as follows: File name: cohesin-associated CTCF sites facilitate gene activation S1ETBL.txt, Date created: Sep. 30, 2016; File size: 2,482, while others may function as insulators (Dixon et al., 2012: 827 Bytes and is incorporated herein by reference in its Parelho et al., 2008: Phillips-Cremins and Corces, 2013: Seitan et al., 2013: Wendt et al., 2008). entirety. Lengthy Table S2 has been submitted via EFS-Web 0011. The structures anchored by Mediator in electronic format as follows: File name: S2ATBL.txt, and cohesin are thought to be mostly cell-type-specific, Date created: Sep. 30, 2016; File size: 360,209 Bytes and is whereas those anchored by CTCF and cohesin tend to be incorporated herein by reference in its entirety. larger and shared by most cell types (Phillips-Cremins et al., 0005. Please refer to the end of the specification for 2013; Seitan et al., 2013). Despite this picture of cohesin access instructions. associated enhancer-promoter loops and cohesin-associated CTCF loops, we do not yet understand the relationship BACKGROUND OF THE INVENTION between the transcriptional control of cell identity and the 0006 Embryonic stem cells depend on active transcrip sub-TAD structures of that may contribute to tion of genes that play prominent roles in pluripotency (ES this control. Furthermore, there is limited evidence that the cell identity genes) and on repression of genes encoding integrity of Sub-TAD structures is important for normal lineage-specifying developmental regulators (Ng and expression of genes located in the vicinity of these struc Surani, 2011: Orkin and Hochedlinger, 2011: Young, 2011). tures. The master transcription factors (TFs) OCT4, SOX2, and 0012 To gain insights into the cohesin-associated chro NANOG (OSN) form super-enhancers at most cell identity mosome structures that may contribute to the control of genes, including those encoding the master TFs themselves; pluripotency in ESCs, we generated a large cohesin ChIA these Super-enhancers contain exceptional levels of tran PET data set and integrated this with other genome-wide Scription apparatus and drive high-level expression of asso data to identify local structures across the genome. ciated genes (Hnisz et al., 2013: Whyte et al., 2013). 0013 The results show that super enhancer-driven cell 0007 Maintenance of the pluripotent ESC state also identity genes and repressed genes encoding lineage-speci requires that genes encoding lineage-specifying develop fying developmental regulators occur within insulated mental regulators remain repressed, as expression of these neighborhoods formed by the looping of two CTCF inter genes can stimulate differentiation and thus loss of ESC action sites occupied by cohesin. identity. These repressed lineage specifying genes are occu 0014 Perturbation of these structures demonstrates that pied by polycomb group in ESCs (Boyer et al., their integrity is important for normal expression of genes 2006; Lee et al., 2006; Margueron and Reinberg, 2011; located in the vicinity of the neighborhoods. Squazzo et al., 2006). The ability to express or repress these key genes in a precise and Sustainable fashion is thus SUMMARY OF THE INVENTION essential to maintaining ESC identity. 0015 The present disclosure provides compositions and 0008 Recent pioneering studies of mammalian chromo methods for regulating gene expression in a directed fash Some structure have Suggested that they are organized into ion. a hierarchy of units, which include topologically associating 0016. In one embodiment is provided a method of alter domains (TADs) and gene loops (FIG. 1A) (Dixon et al., ing the expression of a gene in an insulated neighborhood US 2017/O 130247 A1 May 11, 2017

(IN) of the genome of a cell comprising contacting an ChIP-seq data for SMC1, a merged data set for the tran organism comprising said cell with a gene modulatory scription factors OCT4, SOX2, and NANOG (OSN), molecule. Such molecules include, but are not limited to, MED12, RNA polymerase II (Po12), H3K27me3, and CTCF Small molecules, lipid, proteins, peptides, nucleic acids, at SMC1-occupied regions. Read density is displayed within such as RNA, DNA or any modified version thereof, and a 10 kb window, and color scale intensities are shown in combinations thereof. rpm/bp. Cohesin occupies three classes of sites: enhancer 0017. In one embodiment, expression of the gene is promoter sites, polycomb-occupied sites, and CTCF-occu increased. pied sites. (C) ESC cohesin (SMC1) ChIA-PET data analy 0.018. In one embodiment, the cell is selected from the sis at the Mycn locus. The algorithm used to identify group consisting of stem cells, bone marrow cells, testis paired-end tags (PETS) is described in detail in the Extended cells, olfactory cells, lung cells, thymus cells, cells of the Experimental Procedures. PETs and interactions involving central nervous system, cells of the brain, spleen cells, MEF enhancers and promoters within the window are displayed at cells, MEL cells, heart cells, somatic cells of the limbs, liver each step in the analysis pipeline: unique PETs, PET peaks, cells, and kidney cells. interactions between PET peaks, and high-confidence inter 0019. In one embodiment, the cells are stem cells and actions supported by at least three independent PETs and said stem cells are embryonic stem cells. with a false positive likelihood of <1% (see Extended 0020. In one embodiment, the insulated neighborhood Experimental Procedures). (D) Summary of the major comprises a topologically active domain (TAD). classes of interactions and high-confidence interactions 0021. In one embodiment, the topologically active identified in the cohesin ChIA-PET data. Enhancers, pro domain is a Super-enhancer domain (SD) and Such SDS may moters, and CTCF sites where interactions occur are dis be selected from any known SD or any disclosed herein such played as blue circles, and the size of the circle is propor as those in Table S4A and S4B. tional to the number of regions. The interactions between 0022. In one embodiment, the gene is selected from the two sites are displayed as gray lines, and the thickness of the group consisting of those in Table S4C. gray line is proportional to the number of interactions. The 0023. In one embodiment is provided a method of alter diagram on the left was generated using the interactions, and ing the expression of a gene located in an insulated neigh the diagram on the right was generated using the high borhood (IN) of the genome of a cell comprising altering the confidence interactions. See also Figures S1 and S2 and sequence of one or more of the CTCF boundaries of said Tables S1 and S2. insulated neighborhood. 0029 FIG. 2 DNA Interactions Frequently Occur within 0024. In one embodiment, the CTCF boundary is altered Topologically Associating Domains. (A) An example TAD via CRISPR technology. shown with normalized Hi-C interaction frequencies is 0025. Such alteration may involve either or both of the displayed as a two-dimensional heat map (Dixon et al., boundaries of the insulated neighborhood. 2012), and the TAD is indicated as a gray bar. High 0026. Additional embodiments of the present composi confidence SMC1 ChIA-PET interactions are depicted as tions and methods, and the like, will be apparent from the blue lines. (B) Enrichment of CTCF, cohesin (SMC1), and following description, drawings, examples, and claims. As PET peaks at TAD boundary regions. The metagene repre can be appreciated from the foregoing and following sentation shows the number of regions per 10 kb window description, each and every feature described herein, and centered on the TAD boundary, and +500 kb is displayed. each and every combination of two or more of Such features, (C) Pie chart of high-confidence interactions that either fall is included within the scope of the present disclosure pro within TADs (88%) or across TAD boundaries (12%). (D) vided that the features included in such a combination are High-confidence interactions are displayed as a two-dimen not mutually inconsistent. In addition, any feature or com sional heatmap across a normalized TAD length for the bination of features may be specifically excluded from any ~2,200 TADs (Dixon et al., 2012). The display is centered on embodiment of the present invention. Additional aspects and the normalized TAD and extends beyond each boundary to advantages of the present invention are set forth in the 10% of the size of the domain. See also Table S3A. following description and claims, particularly when consid ered in conjunction with the accompanying examples and 0030 FIG. 3 Super-Enhancer Domain Structure. (A) An drawings. example super-enhancer domain (SD) within a TAD. High confidence SMC1 ChIA-PET interactions are depicted as BRIEF DESCRIPTION OF THE DRAWINGS blue lines. ChIP-seq binding profiles (reads per million per ) for CTCF, cohesin (SMC1), and the master 0027. The patent or application file contains at least one transcription factors OCT4, SOX2, and NANOG (OSN) are drawing executed in color. Copies of this patent or patent shown at the Lefty 1 locus in ESCs. The super-enhancer is application publication with color drawing(s) will be pro indicated by a red bar. (B) Model of SD structure. The 197 vided by the Office upon request and payment of the SDs have interactions (blue) between cohesin-occupied necessary fee. CTCF sites that may serve as outer boundaries of the domain 0028 FIG. 1 DNA Interactions Involving Cohesin (A) structure. SDS also contain interactions between Super Units of chromosome organization. Chromosomes consist of enhancers and the promoters of their associated genes. (C) multiple topologically associating domains (TADS). TADs Metagene analysis showing the occupancy of various factors (image adapted from Dixon et al., 2012) contain multiple at the key elements of TADs and SDs, including CTCF sites, genes with DNA loops involving interactions between Super-enhancers, and Super enhancer-associated genes. enhancers, promoters, and other regulatory elements, which ChIP-seq profiles are shown in reads per million per base are mediated by cohesin (blue ring) and CTCF (purple pair. Boundary site metagenes are centered on the CTCF balls). Nucleosomes represent the smallest unit of chromo peak, and t2 kb is displayed. Super-enhancer metagenes are some organization. (B) Heatmap representation of ESC centered on the 197 super-enhancers in SDs, and +3 kb is US 2017/O 130247 A1 May 11, 2017

displayed. The data for associated genes are centered on the the 120 PDs with this transition pattern. (F) CRISPR 219 Super-enhancer-associated genes in SDS, and t3 kb is mediated genome editing of a CTCF site at the Tcfap2e displayed. (D) Heatmap showing that cohesin ChIA-PET locus. (Top) A high-confidence interaction is depicted by a high-confidence interactions occur predominantly within the blue line, and ChIP-seq binding profiles (reads per million SDs. The density of high-confidence interactions is shown per base pair) for CTCF, cohesin (SMC1), and H3K27me3 across a normalized SD length for the 197SDs. (E) Heatmap are shown in ESCs. (Bottom) Expression level of the indi showing that transcriptional proteins are contained within cated genes in wildtype and CTCF site-deleted cells mea boundary sites of SDs. The occupancy of Mediator sured by qRT-PCR. Transcript levels were normalized to (MED12), H3K27ac, and RNA polymerase II (Pol2) at GAPDH. Gene expression was assayed in triplicate in at Super-enhancers and associated genes is shown across a least two biological replicate samples and is displayed as normalized SD length for the 197 SDs. See also Figure S3 mean+SD (p<0.05, Tcfap2e in C1 deletion cells; p<0.001, and Table S4. Tcfap2e in C2 deletion cells in wild-type versus CTCF 0031 FIG. 4 Super-Enhancer Domains Are Functionally site-deleted). p values were determined using the Students Linked to Gene Expression. CRISPR-mediated genome t test. See also Figure S5 and Table S5. editing of CTCF sites at five loci. The top of each panel 0033 FIG. 6 Insulated Neighborhoods Are Preserved in shows high-confidence interactions depicted as blue lines Multiple Cell Types. (A) Model depicting constitutive and ChIP-seq binding profiles (reads per million per base domain organization, mediated by interaction of two CTCF pair) for CTCF, cohesin (SMC1), and OCT4, SOX2, and sites co-occupied by cohesin, in two cell types. (B) An NANOG (OSN) in ESCs at the respective loci. The super example SD in ESCs and a domain in NPCs. High-confi enhancer is indicated as a red bar. The bottom of each panel dence interactions from the SMC1 ChIA-PET data set are shows gene expression level of the indicated genes in depicted by blue lines, and 5C interactions from Phillips wild-type and CTCF site-deleted cells measured by qRT Cremins et al. (2013) are depicted by black lines. Super PCR. Transcript levels were normalized to GAPDH. Gene enhancers are indicated by red bars. ChIP-seq binding expression was assayed in triplicate in at least two biological profiles (reads per million per base pair) for CTCF, cohesin replicate samples and is displayed as mean+SD. All p values (SMC1), OCT4, SOX2, NANOG (OSN), SOX2, and BRN2 were determined using the Student's t test. (A) CRISPR are shown at the Nanog locus and the Olig1/Olig2 locus in mediated genome editing of a CTCF site at the miR-290-295 ESCs and NPCs. (C) Occupancy of CTCF peaks across 18 locus (p<0.001, Pri-miR-290-295, and Nlrp12 in wild-type cell types. The CTCF peaks used for the analysis are the versus CTCF site-deleted). (B) CRISPR-mediated genome CTCF peaks found in ESCs. The percentage of these peaks editing of a CTCF site at the Nanog locus (p<0.05, Nanog that are observed in the indicated number of cell types is in wild-type versus CTCF site-deleted). (C) CRISPR-medi shown for four groups of CTCF sites: all CTCF peaks ated genome editing of a CTCF site at the Tagfl locus identified in ESCs, CTCF peaks at SD boundaries in ESCs, (p<0.001, GmS90; p-0.01, Lrrc2) in wild-type versus CTCF CTCF peaks at PD boundaries in ESCs, and CTCF peaks at site-deleted). (D) CRISPR-mediated genome editing of a PET peaks (identified by SMC1 ChIA-PET in ESCs). See CTCF site at the Pou5fl locus (p<0.012, H2O-10 in wild also Figure S6 and Table S3B. type versus CTCF site-deleted). (E) CRISPR-mediated 0034 FIG. 7 (Figure S1). PET Quality Assessment and genome editing of CTCF sites at the Prdm14 locus (p<0.001, Interactions, Related to FIG. 1. (A) Heatmap representation Slco5a1 in wild-type versus CTCF site-deleted). The CTCF of ESC ChIP-seq data for the combination of the master deletion lines at the Pou5fl and Pram14 (C1-2) loci are transcription factors OCT4, SOX2 and NANOG (OSN), heterozygous, whereas the CTCF-deletion lines at the MED12, RNA polymerase II (Pol2), CTCF, and SMC1 at Nanog, Tagfl, and miR-290-295 loci are homozygous for promoters and enhancers in ESCs. Read density is displayed the . See also Figure S4. within a 10 kb window and color scale intensities are shown 0032 FIG. 5 Polycomb Domain Structure. (A) An in rpm/bp. (B) Heatmap representation of ESC ChIP-seq example polycomb domain (PD) within a TAD. A high data for the combination of the master transcription factors confidence interaction is depicted as the blue line. ChIP-seq OCT4, SOX2 and NANOG (OSN), MED12, RNA poly binding profiles (reads per million per base pair) for CTCF, merase II (Po12), CTCF, and SMC1 at CTCF-bound sites in cohesin (SMC1), and H3K27me3 at the Gata2 locus in ESCs. Read density is displayed within a 10 kb window and ESCs. (B) Model of PD structure. The 349 PDs have color scale intensities are shown in rpm/bp. (C) Scatter plot interactions (blue) between CTCF sites that serve as putative showing the number of non self-ligation PETs in 10 kb boundaries of the domain structure. (C) Metagene analysis windows in replicates in reads per million mapped reads per reveals the occupancy of various factors at the key elements kilobase. (D) Bar graph showing the percentage of high of TADs and PDs: CTCF sites and target genes. ChIP-seq confidence interactions from one replicate of the SMC1 profiles are shown in reads per million per base pair. ChIA-PET supported by interactions in the other replicate. Boundary site metagenes are centered on the CTCF peak, (E) Left, scatter plot showing the frequency of non-chimeric and t2 kb is displayed. The metagenes depicting genes are PETs with homodimeric linkers against PET genomic span centered on the 380 polycomb target genes in PDs, and +3 in increments of 100 bp. The curve Suggests a distance kb is displayed. (D) Heatmap showing that high-confidence cut-off at 4 kb, below which the PETs may originate from interactions are largely constrained within PDs. The density self-ligation of DNA ends from a single chromatin fragment of high-confidence interactions is shown across a normal in the ChIA-PET protocol. Right, scatter plot showing ized PD length for the 349 PDs. (E) Heatmap showing that chimeric PET frequencies with heterodimeric linkers against polycomb proteins are contained within boundary sites of PET genomic span in increments of 100 bp, Suggesting PDs. The occupancy of CTCF, H3K27me3, SUZ12, and chimeric PETs were more uniformly distributed across dif EZH2 is indicated within a 20 kb window centered on the ferent genomic spans. (F) Bar graph showing the percentage left, and right CTCF-occupied boundary region is shown for of interactions called by requiring different numbers of US 2017/O 130247 A1 May 11, 2017

chimeric and non-chimeric PETs. All PET interactions 000-24,360,000. Genomic coordinates for the Klf) and called using chimeric PETs that are supported by at least 3 binding profiles are chr19:23,068.300-23,273.400. (E) PETs have a false positive likelihood <1% (see Extended Genomic coordinates for the Id1 TAD are chr2:152,440,000 Experimental Procedures). (G) Diagram showing the fre 152,680,000. Genomic coordinates for the Id1 binding pro quency of intrachromosomal and interchromosomal interac files are chr2:152,511,000-152,581,000. (F) Genomic coor tions in the interaction (left) and high confidence interaction dinates for the Pou.5fl TAD are chr17: 35,600,000-36,080, data set (right). (H) Saturation analysis of the SMC1 ChIA 000. Genomic coordinates for the Pou5f1 binding profiles PET data set. Subsampling of various fractions of PETs are chr17:35,617.300-35,649,800. (G) Genomic coordinates within the merged ChIA-PET data set was performed, and for the Trim28 TAD are chr7:13,000,000-13,640,000. the number of unique genomic positions of intrachromo Genomic coordinates for the Trim28 binding profiles are somal PETs beyond the self-ligation distance cutoff of 4 kb chr7:13,590,396-13,620,304. (H) Genomic coordinates for was plotted. The Solid line depicting the non-linear least the Elov16 TAD are chr3:128,920,000-129,480,000. squares regression fitting of the data to the Michaelis Genomic coordinates for the Elov16 binding profiles are Menten model Suggests that we have sampled approximately chr3:129,217,096-129,348,924. (I) Genomic coordinates for 70% of the available intrachromosomal PETs beyond 4 kb in the Txnip TAD are chr3:96,320,000-96.520,000. Genomic the current library. The dashed line indicates the estimated coordinates for the Txinip binding profiles are chr3:96.347, 100% saturation. 300-96,391,100. (J) Genomic coordinates for Hs6st1 TAD 0035 FIG. 8 (Figure S2) High-Confidence SMC1 ChIA are chr1:34,520,000-36,360,000. Genomic coordinates for PET Interactions Are Consistent with Previously Identified Hs6st1 binding profiles are chr1:35,883,900-36,200,400. Interactions, Related to FIG. 1. High-confidence SMC1 0037 FIG. 10 (Figure S4) Super-Enhancer Domain ChIA-PET interactions are depicted as blue lines. Interac Functions, Related to FIG. 4. (A) Quantitative 3C analysis tions from other published data sets are depicted as black at the miR-290-295 locus. The super-enhancer domain is lines. ChIP-seq binding profiles (reads per million per base indicated as a black bar. The deleted CTCF site is high pair) for CTCF, cohesin (SMC1), and OCT4, SOX2, and lighted with a box. Arrows indicate the chromosomal posi NANOG (OSN) are shown at the indicated loci in ESCs. (A) tions between which the interaction frequency was assayed. A high-confidence SMC1 ChIA-PET interaction is sup Asterisk indicates the 3C anchor site. ChIP-seq binding ported by 3C from (Kagey et al., 2010). Genomic coordi profiles (reads per million per base pair) for CTCF, cohesin nates for the Lefty1 TAD are chr1:182,760,000-183,160, (SMC1), and the master transcription factors OCT4, SOX2, 000. Genomic coordinates for the Leftyl ChIP-seq binding and NANOG (OSN) are also shown. The Super-enhancer is profiles are chr1:182,851,700-182,871,500. (B) A high-con indicated as a red bar. The interaction frequencies between fidence SMC1 ChIA-PET interaction is supported by a the indicated chromosomal positions and the 3C anchor sites CTCF ChIA-PET PET from (Handoko et al., 2011). are displayed as a bar chart (mean+SD) on the bottom panel. Genomic coordinates for the Prodm14 TAD are chr1:13,040, qPCR reactions were run in duplicates, and values are 000-13,680,000. Genomic coordinates for the Prodm14 ChIP normalized against the mean interaction frequency in wild seq binding profiles are chr1:13,034,300-13,131,900. (C) A type cells. (p<0.05 for all three regions: Student's t test.) (B) high-confidence SMC1 ChIA-PET interaction is supported Quantitative 3C analysis at the Pou5fl locus. The super by a Pol2 ChIA-PET PET in (Zhang et al., 2013). Genomic enhancer domain is indicated as a black bar. The deleted coordinates for the Phc1 TAD are chr6:121,160,000-122, CTCF site is highlighted with a box. Arrow indicates the 600,000. Genomic coordinates for the Phc1 ChIP-seq bind chromosomal positions between which the interaction fre ing profiles are chr6:122.241,500-122,350,700. (D) A high quency was assayed. Asterisk indicates the 3C anchor site. confidence SMC1 ChIA-PET interaction is supported by 5C ChIP-seq binding profiles (reads per million per base pair) in (Phillips-Cremins et al., 2013). Genomic coordinates for for CTCF, cohesin (SMC1), and the master transcription the Sox2 TAD are chr3:33,680,000-35,520,000. Genomic factors OCT4, SOX2, and NANOG (OSN) are also shown. coordinates for the Sox2 ChIP-seq binding profiles are The super enhancer is indicated as a red bar. The interaction chr3:34,522,100-34,691,600. frequencies between the indicated chromosomal positions 0036 FIG. 9 (Figure S3) Super-Enhancer Domains, and the 3C anchor sites are displayed as a bar chart (mean Related to FIG. 3. Active cell identity genes reside in SD) on the bottom panel. qPCR reactions were run in Super-enhancer Domains (SD). Shown are example SDS duplicates, and values are normalized against the mean within Topologically Associating Domains (TADs) in ESCs. interaction frequency in wild-type cells. (p<0.05; Students High confidence SMC1 ChIA-PET interactions are depicted t test.) (C) Expression level of the indicated germ layer as blue lines. ChIP-seq binding profiles (reads per million markers in wild-type cells and a cell line where the SD per base pair) for CTCF, cohesin (SMC1), and the master boundary CTCF site was deleted at the miR-290-295 locus. transcription factors OCT4, SOX2, and NANOG (OSN) are Gene expression was measured by qRT-PCR. Gene expres shown at the example SDs in ESCs. Super-enhancer regions sion was assayed in triplicate reactions in at least two are indicated by a red bar. (A) Genomic coordinates for the biological replicate samples and is displayed as mean+SD (p Sox2 TAD are chr3:35,520,000-33,680,000. Genomic coor value <0.003, PAX6, GATA6 and Sox17 in wild-type versus dinates for the Sox2 binding profiles are chr3:34,724,900 CTCF site-deleted). p value was calculated using the Stu 34,502,100. (B) Genomic coordinates for the Prdm14 TAD dent's t test. are chr1:13,040,000-13,680,000. Genomic coordinates for 0038 FIG. 11 (Figure S5) Polycomb Domain Interac the Prdm14 binding profiles are chr1:13,034.300-13,131, tions, Related to FIG. 5. Repressed developmental lineage 900. (C) Genomic coordinates for the Sall1 TAD are chr8: genes reside in chromosome structures termed Polycomb 90,920,000-92,360,000. Genomic coordinates for the Sall1 Domains (PD). Example PDs within Topologically Associ binding profiles are chr8:91,455.200-91,581,300. (D) ating Domains (TADs) are shown with high-confidence PET Genomic coordinates for the Klf) TAD are chr19:22,920, interactions depicted by blue lines. ChIP-seq binding pro US 2017/O 130247 A1 May 11, 2017

files (reads per million per base pair) for CTCF, cohesin 5-40, 5-50, 6-8, 6-10, 6-20, 6-30, 6-40, 6-50, 7-10, 7-20, (SMC1), and H3K27me3 are shown at the example PDs in 7-30, 7-40, 7-50, 8-10, 8-20, 8-30, 8-40, 8–50, 9-10, 9-20, ESCs. (A) Genomic coordinates for the Tlx1 TAD are 9-30, 9-40, 9-50, 10-20, 10-30, 10-40, 10-50, 20-30, 20-40, chr19:45,120,000-45,840,000. Genomic coordinates for the 20-50, 30-40, 30-50 or 40-50 times the wild type level or Tlx1 binding profiles are chr19:45,178,400-45,246,700. (B) Such level as is presented by a Subject having a disease or Genomic coordinates for the LhX4 TAD are chr1:157,400, disorder associated with the aberrant expression of that gene. 000-158,640,000. Genomic coordinates for the Lhx4 bind ing profiles are chr1:157,392,000-157,657,700. (C) Discussion of Results Genomic coordinates for the Ihh TAD are chr1:74,240,000 0045. Understanding how the ESC pluripotency gene 75,600,000. Genomic coordinates for the Ihh binding pro expression program is regulated is of considerable interest files are chr1:74,978,200-75,060,400. (D) Genomic coordi because it provides the foundation for understanding gene nates for the Onecut3 TAD are chr10:79,200,001-81,040, control in all cells. There is much evidence that cohesin and 000. Genomic coordinates for the Onecut3 binding profiles CTCF have roles in connecting gene regulation and chro are chr10:79,892,959-79,985,160. (E) Genomic coordinates mosome structure in ESCs (Cavalli and Misteli, 2013; for the Rax TAD are chr18:66,080,001-66,680,000. Dixon et al., 2012; Gibcus and Dekker, 2013; Gorkin et al., Genomic coordinates for the Rax binding profiles are chr18: 2014; Merkenschlager and Odom, 2013; Phillips-Cremins 66,089,130-66,130,404. (F) Genomic coordinates for the and Corces, 2013; Phillips-Cremins et al., 2013; Sanyal et Shh TAD are chr5:28,760,001-29,680,000. Genomic coor al., 2012: Sofueva et al., 2013) but limited knowledge of dinates for the Shh binding profiles are chr5:28.766,181-28, these structures across the genome and scant functional 808,422. (G) Genomic coordinates for the Dhh TAD are evidence that specific structures actually contribute to the chr15:98.360,001-100,560,000. Genomic coordinates for control of important ESC genes. the Dhh binding profiles are chr15: 98,718,426-98,738,916. 0046 We describe here organizing principles that explain (H) Genomic coordinates for the Fendrr/Foxf1 TAD are how a key set of cohesin-associated chromosome structures chr8:123,160,001-124,360,000. Genomic coordinates for contributes to the ESC gene expression program. To gain the Fendrr/Foxf1 binding profiles are chr8:123,482,102-123, insights into the relationship between transcriptional control 627,553. (I) Genomic coordinates for the Bhlhe22 TAD are of cell identity and control of chromosome structure, we chr3:17,800,001-19,120,000. Genomic coordinates for the carried out cohesin ChIA-PET and focused the analysis on Bhlhe22 binding profiles are chr3:17,927,749-18,082,958. loci containing Super-enhancers, which drive expression of (J) Genomic coordinates for the Adamts15 TAD are chr10: key cell identity genes. 79.200,001-81,040,000. Genomic coordinates for the 0047 We found that the majority of super enhancers and Adamts15 binding profiles are chr10:79,797.646-79,818, their associated genes occur within large loops that are 6O2. connected through interacting CTCF sites co-occupied by 0039 FIG. 12 (Figure S6) SD and PD Boundary Sites Are cohesin. These Super-enhancer domains, or SDS, typically Constitutively Occupied by CTCF across Multiple Cell contain one Super-enhancer that loops to one gene within the Types, Related to FIG. 6. The proportions of SDs and PDs SD. The SDs appear to restrict super-enhancer activity to identified in ESCs for which CTCF ChIP-seq peaks at both genes within the SD because the cohesin ChIA-PET inter boundaries are observed in other mouse cell types. Occu actions occur primarily within the SD and loss of a CTCF pancy of CTCF peaks across the cell types was determined boundary tends to cause inappropriate activation of nearby from publicly available CTCF ChIP-seq data (Shen et al., genes located outside that boundary. 2012). MEF cells are murine embryonic fibroblasts and 0048. The proper association of super-enhancers and MEL cells are murine erythroleukemia cells. their target genes in Such “insulated neighborhoods” is of DETAILED DESCRIPTION considerable importance, as the mistargeting of a single Super enhancer is sufficient to cause leukemia (Groschel et 0040 Provided herein are compositions and methods for al., 2014). The cohesin ChIA-PET data and perturbation of the controlled or selected regulation of gene expression Such CTCF sites suggest that genes that encode repressed, lin as those genes found in insulated neighborhoods within the eage-specifying, developmental regulators also occur within genome. insulated neighborhoods in ESCs. Maintenance of the pluri 0041 As used herein, an “insulated neighborhood is a potent ESC state requires that genes encoding lineage region of a chromosome bounded by one or more markers. specifying developmental regulators are repressed, and these 0042 Modulation of gene expression in an insulated repressed lineage-specifying genes are occupied by nucle neighborhood can be effected by administration of a gene osomal histones that carry the polycomb mark H3K27me3 modulatory compound. (Boyer et al., 2006; Bracken et al., 2006; Lee et al., 2006: 0043. In one embodiment, administration of a gene Negre et al., 2006: Schwartz et al., 2006: Squazzo et al., modulatory compound increases the level of gene expres 2006: Tolhuis et al., 2006). sion by 5%, 10%, 15%, 20%, 25%, 30%, 33%, 35%, 40%, 0049. The majority of these genes were found to be 45%, 50%, 52%. 55%, 60%, 65%, 67%, 69%, 70%, 74%, located within a cohesion-associated CTCF-CTCF loop, 75%, 76%, 77%, 80%, 85%, 90%, 95% or more than 95%. which we call a polycomb domain (PD). The perturbation of 0044. In one embodiment, gene expression may be CTCF PD boundary sites caused derepression of the poly increased by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, comb-bound gene within the PD, suggesting that these 16, 17, 18, 19, 20 21, 22, 23, 24, 15, 26, 27, 28, 29, 30, 31, boundaries are important for maintenance of gene repression 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, within the PD. CTCF has previously been shown to be 48, 49, 50, 51, 52, 53, 54, 1-5, 1-10, 1-20, 1-30, 1-40, 1-50, associated with boundary formation, activity, and 2-5, 2-10, 2-20, 2-30, 2-40, 2-50, 3-5, 3-10, 3-20, 3-30, 3-40, transcriptional regulation (Bell et al., 1999: Denholtz et al., 3-50, 4-6, 4-10, 4-20, 4-30, 4-40, 4-50, 5–7, 5-10, 5-20, 5-30, 2013; Felsenfeld et al., 2004; Handoko et al., 2011; Kim et US 2017/O 130247 A1 May 11, 2017

al., 2007; Phillips and Corces, 2009; Schwartz et al., 2012: invention is not intended to be limited to the above Descrip Sexton et al., 2012; Soshnikova et al., 2010; Valenzuela and tion, but rather is as set forth in the appended claims. Kamakaka, 2006). 0060. In the claims, articles such as “a,” “an,” and “the 0050 Previous report shave also demonstrated that may mean one or more than one unless indicated to the cohesin and CTCF are associated with large loop substruc contrary or otherwise evident from the context. Claims or tures within TADs, whereas cohesin and Mediator are asso descriptions that include “or between one or more members ciated with smaller loop structures that sometimes form of a group are considered satisfied if one, more than one, or within the CTCF-bound loops (de Wit et al., 2013: Phillips all of the group members are present in, employed in, or Cremins et al., 2013; Sofueva et al., 2013). CTCF-bound otherwise relevant to a given product or process unless domains have been proposed to confine the activity of indicated to the contrary or otherwise evident from the enhancers to specific target genes, thus yielding proper context. The invention includes embodiments in which tissue-specific expression of genes (DeMare et al., 2013; exactly one member of the group is present in, employed in, Handoko et al., 2011; Hawkins et al., 2011). or otherwise relevant to a given product or process. The 0051 Our genome-wide study extends these observations invention includes embodiments in which more than one, or by connecting Such structures with the transcriptional con the entire group members are present in, employed in, or trol of specific Super-enhancer-driven and polycomb-re otherwise relevant to a given product or process. pressed cell identity genes and by showing that these struc 0061. It is also noted that the term “comprising is tures can contribute to the control of genes both inside and intended to be open and permits but does not require the outside of the insulated neighborhoods that contain key inclusion of additional elements or steps. When the term pluripotency genes. “comprising is used herein, the term “consisting of is thus 0052. The organization of key cell identity genes into also encompassed and disclosed. insulated neighborhoods may be a property common to all 0062. Where ranges are given, endpoints are included. mammalian cell types. Indeed, several recent studies have Furthermore, it is to be understood that unless otherwise identified CTCF-bound regions whose function is consistent indicated or otherwise evident from the context and under with ESC SDs (Guo et al., 2011; Wang et al., 2014). standing of one of ordinary skill in the art, values that are 0053 For example, in T cell acute lymphocytic leukemia, expressed as ranges can assume any specific value or Notch1 activation leads to increased expression of a Super Subrange within the stated ranges in different embodiments enhancer-driven gene found between two CTCF sites that of the invention, to the tenth of the unit of the lower limit of are structurally connected but does not affect genes located the range, unless the context clearly dictates otherwise. outside of the two CTCF sites (Wang et al., 2014). 0063. In addition, it is to be understood that any particular 0054 Future studies addressing the mechanisms that embodiment of the present invention that falls within the regulate loop formation should provide additional insights prior art may be explicitly excluded from any one or more into the relationships between transcriptional control of cell of the claims. Since such embodiments are deemed to be identity genes and control of local chromosome structure. known to one of ordinary skill in the art, they may be 0055. The following examples of the related art and excluded even if the exclusion is not set forth explicitly limitations related therewith are intended to be illustrative herein. Any particular embodiment of the compositions of and not exclusive. Other limitations of the related art will the invention (e.g., any nucleic acid or protein encoded become apparent to those of skill in the art upon a reading thereby; any method of production; any method of use; etc.) of the specification and a study of the drawings. can be excluded from any one or more claims, for any 0056 Compounds useful in the invention include those reason, whether or not related to the existence of prior art. described herein in any of their pharmaceutically acceptable 0064 All cited sources, for example, references, publi forms, including isomers such as diastereomers and cations, databases, database entries, and art cited herein, are enantiomers, salts, Solvates, and polymorphs, as well as incorporated into this application by reference, even if not racemic mixtures and pure isomers of the compounds expressly stated in the citation. In case of conflicting State described herein, where applicable. ments of a cited source and the instant application, the 0057 While a number of exemplary aspects and embodi statement in the instant application shall control. ments have been discussed herein, those of skill in the art 0065. Section and table headings are not intended to be will recognize certain modifications, permutations, additions limiting. and sub-combinations thereof. It is therefore intended that the following appended claims and claims hereafter intro REFERENCES duced are interpreted to include all Such modifications, 0.066 Baranello, L., Kouzine, F., and Levens, D. (2014). permutations, additions and Sub-combinations as are within CTCF and cohesin cooperate to organize the 3D structure their true spirit and scope. of the mammalian genome. Proc. Natl. Acad. Sci. USA 0058 All patents, patent applications, patent publica 111, 889-890. tions, scientific articles and the like, cited or identified in this 0067 Bell, A. C., West, A. G., and Felsenfeld, G. (1999). application are hereby incorporated by reference in their The protein CTCF is required for the enhancer blocking entirety in order to describe more fully the state of the art to activity of vertebrate insulators. Cell 98, 387-396. which the present application pertains. 0068 Boyer, L. A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L. A., Lee, T.I., Levine, S. S. Wernig, M., EQUIVALENTS AND SCOPE Tajonar, A., Ray, M. K., et al. (2006). Polycomb com 0059. Those skilled in the art will recognize, or be able to plexes repress developmental regulators in murine embry ascertain using no more than routine experimentation, many onic stem cells. Nature 441, 349-353. equivalents to the specific embodiments in accordance with 0069 Bracken, A. P. Dietrich, N., Pasini, D., Hansen, K. the invention described herein. The scope of the present H., and Helin, K. (2006). Genome-wide mapping of US 2017/O 130247 A1 May 11, 2017

Polycomb target genes unravels their roles in cell fate (ChIA-PET) for mapping chromatin interactions and transitions. Genes Dev. 20, 1123-1136. understanding transcription regulation. J. Vis. Exp. http:// 0070 Cavalli, G., and Misteli, T. (2013). Functional dx.doi.org/10.3791/3770. implications of genome topology. Nat. Struct. Mol. Biol. I0083) Gorkin, D.U., Leung, D., and Ren, B. (2014). The 20, 290–299. 3D genome in transcriptional regulation and pluripotency. 0071 Chepclev, I., Wei, G., Wangsa, D., Tang, Q., and Cell Stem Cell 14, 762-775. Zhao, K. (2012). Characterization of genome-wide I0084 Groschel, S., Sanders, M. A., Hoogenboezem, R., enhancer-promoter interactions reveals co-expression of de Wit, E., Bouwman, B. A., Erpelinck, C. van der interacting genes and modes of higher order chromatin Velden, V. H., Havermans, M., Avellino, R., Van Lom, K., organization. Cell Res. 22, 490-503. et al. (2014). A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in 0072 Cuddapah, S., Jothi, R., Schones, D. E., Roh, T.Y., leukemia. Cell 157, 369-381. Cui, K., and Zhao, K. (2009). Global analysis of the 0085. Guo, C., Yoon, H. S., Franklin, A., Jain, S., Ebert, insulator binding protein CTCF in chromatin barrier A., Cheng, H. L., Hansen, E. Despo, O., Bossen, C., regions reveals demarcation of active and repressive Vettermann, C., et al. (2011). CTCF-binding elements domains. Genome Res. 19, 24-32. mediate control of V(D)J recombination. Nature 477, 0073 de Wit, E., Bouwman, B. A., Zhu, Y., Klous, P., 424-430. Splinter, E., Verstegen, M.J., Krijger, P. H., Festuccia, N., I0086 Handoko, L., Xu, H., Li, G., Ngan, C. Y., Chew, E., Nora, E. P. Welling, M., et al. (2013). The pluripotent Schnapp, M., Lee, C. W., Ye, C., Ping, J. L., Mulawadi, F., genome in three dimensions is shaped around pluripo et al. (2011). CTCF-mediated functional chromatin inter tency factors. Nature 501, 227-231. actome in pluripotent cells. Nat. Genet. 43, 630-638. 0074 DeMare, L. E., Leng, J., Cotney, J., Reilly, S. K., I0087 Hawkins, R. D., Hon, G. C., Yang, C., Antosie Yin, J., Sarro, R., and Noonan, J. P. (2013). The genomic wicz-Bourget, J. E. Lee, L. K., Ngo, Q. M., Klugman, S., landscape of cohesin-associated chromatin interactions. Ching, K. A., Edsall, L. E., Ye, Z. et al. (2011). Dynamic Genome Res. 23, 1224-1234. chromatin states in human ES cells reveal potential regu 0075 Denholtz, M., Bonora, G., Chronis, C., Splinter, E., latory sequences and genes involved in pluripotency. Cell de Laat, W., Ernst, J., Pellegrini, M., and Plath, K. (2013). Res. 21, 1393-1409. Long-range chromatin contacts in embryonic stem cells 0088 HnisZ, D., Abraham, B. J., Lee, T. I., Lau, A., reveal a role for pluripotency factors and polycomb Saint-Andre, V. Sigova, A. A., Hoke, H. A., and Young, proteins in genome organization. Cell Stem Cell 13, R. A. (2013). Super-enhancers in the control of cell 6O2-616. identity and disease. Cell 155,934-947. 0076. Dixon, J. R., Selvaraj, S., Yue, F., Kim, A., Li, Y., I0089 Kagey, M. H., Newman, J. J., Bilodeau, S., Zhan, Shen, Y., Hu, M., Liu, J. S., and Ren, B. (2012). Topo Y., Orlando, D. A., van Berkum, N. L., Ebmeier, C. C., logical domains in mammalian genomes identified by Goossens, J., Rahl, P. B., Levine, S. S., et al. (2010). analysis of chromatin interactions. Nature 485, 376-380. Mediator and cohesin connect gene expression and chro 0077 Dowen, J. M., Bilodeau, S., Orlando, D.A., Hu"b- matin architecture. Nature 467, 430-435. ner, M. R., Abraham, B.J., Spector, D. L., and Young, R. 0090 Kaspi, H., Chapnik, E., Levy, M., Beck, G., Horn A. (2013). Multiple structural maintenance of chromo stein, E., and Soen, Y. (2013). Brief report: miR-290-295 Some complexes at transcriptional regulatory elements. regulate embryonic stem cell differentiation propensities Stem Cell Rev. 1, 371-378. by repressing Pax6. Stem Cells 31, 2266-2272. 0078 Essafi, A., Webb, A., Berry, R. L., Slight, J., Burn, 0091 Kieffer-Kwon, K. R. Tang, Z. Mathe, E., Qian, J., S. F. Spraggon, L., Velecela, V., Martinez-Estrada, O. M., Sung, M. H., Li, G., Resch, W., Baek, S. Pruett, N., Wiltshire, J. H., Roberts, S. G., et al. (2011). A witl Grøntved, L., et al. (2013). Interactome maps of mouse controlled chromatin Switching mechanism underpins tis gene regulatory domains reveal basic principles of tran Sue-specific wint4 activation and repression. Dev. Cell 21, scriptional regulation. Cell 155, 1507-1520. 559-574. 0092. Kim, T. H., Abdullaev, Z. K. Smith, A. D., Ching, K. A., Loukinov, D. I. Green, R. D., Zhang, M. Q., 0079. Felsenfeld, G., Burgess-Beusse, B., Farrell, C., Lobanenkov, V. V., and Ren, B. (2007). Analysis of the Gaszner, M., Ghirlando, R., Huang, S., Jin, C. Litt, M., vertebrate insulator protein CTCF-binding sites in the Magdinier, F. Mutskov, V., et al. (2004). Chromatin . Cell 128, 1231-1245. boundaries and chromatin domains. Cold Spring Harb. 0093 Lee, T.I., and Young, R. A. (2013). Transcriptional Symp. Quant. Biol. 69,245-250. regulation and its misregulation in disease. Cell 152, 0080 Filippova, D., Patro, R., Duggal, G., and Kings 1237-1251. ford, C. (2014). Identification of alternative topological 0094 Lee, T.I., Jenner, R. G., Boyer, L.A., Guenther, M. domains in chromatin. Algorithms Mol. Biol. 9, 14. G., Levine, S. S., Kumar, R. M., Chevalier, B., Johnstone, 0081 Fullwood, M.J., Liu, M. H., Pan, Y. F., Liu, J., Xu, S. E., Cole, M. F., Isono, K., et al. (2006). Control of H., Mohamed, Y.B., Orlov, Y. L., Velkov, S., Ho, A., Mei, developmental regulators by Polycomb in human embry P. H., et al. (2009). An oestrogen-receptor-alpha-bound onic stem cells. Cell 125, 301-313. human chromatin interactome. Nature 462, 58-64. (0095 Lelli, K. M., Slattery, M., and Mann, R. S. (2012). I0082 Gibcus, J. H., and Dekker, J. (2013). The hierarchy Disentangling the many layers of eukaryotic transcrip of the 3D genome. Mol. Cell 49, 773-782. Goh, Y., tional regulation. Annu. Rev. Genet. 46, 43-68. Fullwood, M. J., Poh, H. M., Peh, S. Q., Ong, C. T., 0096 Li, G., Fullwood, M. J., Xu, H., Mulawadi, F. H., Zhang, J., Ruan, X., and Ruan, Y. (2012). Chromatin Velkov, S., Vega, V., Ariyaratne, P. N., Mohamed, Y.B., interaction analysis with paired-end tag sequencing Ooi, H. S., Tennakoon, C., et al. (2010). ChIA-PET tool US 2017/O 130247 A1 May 11, 2017

for comprehensive chromatin interaction analysis with 0112 Sanyal,y A., Lajoie, B. R., Jain, G., and Dekker, J. paired-end tag sequencing. Genome Biol. 11, R22. (2012). The long-range interaction landscape of gene 0097. Li, G., Ruan, X. Auerbach, R. K., Sandhu, K. S., promoters. Nature 489, 109-113. Zheng, M., Wang, P., Poh, H. M., Goh, Y., Lim, J., Zhang, 0113 Schaaf, C. A., Misulovin, Z. Gause, M., Koenig, J., et al. (2012). Extensive promoter-centered chromatin A., Gohara, D. W., Watson, A., and Dorsett, D. (2013). interactions provide a topological basis for transcription Cohesin and polycomb proteins functionally interact to regulation. Cell 148, 84-98. control transcription at silenced and active genes. PLoS 0098 Margueron, R., and Reinberg, D. (2011). The Poly Genet. 9, e1003560. comb complex PRC2 and its mark in life. Nature 469, 0114 Schwartz, Y.B., Kahn, T. G., Nix, D.A., Li, X. Y., 343-349. Bourgon, R., Biggin, M., and Pirrotta, V. (2006). 0099 Merkenschlager, M., and Odom, D. T. (2013). Genome-wide analysis of Polycomb targets in Drosophila CTCF and cohesin: linking gene regulatory elements with melanogaster. Nat. Genet. 38, 700-705. their targets. Cell 152, 1285-1297. 0115 Schwartz, Y. B., Linder-Basso, D., Kharchenko, P. 0100 Meuleman, W., Peric-Hupkes, D., Kind, J., V., Tolstorukov, M.Y., Kim, M., Li, H. B., Gorchakov, A. Beaudry, J. B., Pagie, L., Kellis, M., Reinders, M., Wes A., Minoda, A., Shanower, G., Alekseyenko, A. A., et al. sels, L., and van Steensel, B. (2013). Constitutive nuclear (2012). Nature and function of insulator protein binding lamina-genome interactions are highly conserved and sites in the Drosophila genome. Genome Res. 22, 2188 associated with A/Trich sequence. Genome Res. 23, 270 2198. 28O. 0116 Seitan, V.C., Faure, A.J., Zhan, Y. McCord, R. P. 0101 Naumova, N., Imakaev, M., Fudenberg, G., Zhan, Lajoie, B. R., Ing-Simmons, E., Lenhard, B., Giorgetti, L., Y., Lajoie, B. R., Mirny, L. A., and Dekker, J. (2013). Heard, E., Fisher, A. G. et al. (2013). Cohesin based Organization of the mitotic chromosome. Science 342, chromatin interactions enable regulated gene expression 948-953. within preexisting architectural compartments. Genome 0102 Negre, N., Hennetin, J., Sun, L. V., Lavrov, S., Res. 23, 2066-2077. Bellis, M., White, K. P., and Cavalli, G. (2006). Chro 0117 Sexton, T., Yaffe, E., Kenigsberg, E., Bantignies, F., mosomal distribution of PCG proteins during Drosophila Leblanc, B., Hoichman, M., Parrinello, H., Tanay, A., and development. PLoS Biol. 4, e170. Cavalli, G. (2012). Three-dimensional folding and func 0103) Ng, H. H., and Surani, M.A. (2011). The transcrip tional organization principles of the Drosophila genome. tional and signalling networks of pluripotency. Nat. Cell Cell 148, 458-472. Biol. 13,490-496. Nora, E. P. Lajoie, B. R., Schulz, E. G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van 0118. Shen, Y., Yue, F., McCleary, D. F. Ye, Z., Edsall, Berkum, N. L., Meisig, J., Sedat, J., et al. (2012). Spatial L., Kuan, S. Wagner, U. Dixon, J., Lee, L., Lobanenkov, partitioning of the regulatory landscape of the X-inacti V. V., and Ren, B. (2012). A map of the cis-regulatory vation centre. Nature 485, 381-385. sequences in the mouse genome. Nature 488, 116-120. 0104. Ong, C. T., and Corces, V. G. (2014). CTCF: an 0119 Smallwood, A., and Ren, B. (2013). Genome orga architectural protein bridging genome topology and func nization and long-range regulation of gene expression by tion. Nat. Rev. Genet. 15, 234-246. enhancers. Curr. Opin. Cell Biol. 25, 387-394. 0105 Orkin, S. H., and Hochedlinger, K. (2011). Chro I0120 Sofueva, S., Yaffe, E., Chan, W. C., Georgopoulou, matin connections to pluripotency and cellular repro D., Vietri Rudan, M., Mira-Bontenbal, H., Pollard, S. M., gramming. Cell 145,835-850. Schroth, G. P. Tanay, A., and Hadjur. S. (2013). Cohesin 0106 Parelho, V., Hadjur, S., Spivakov, M., Leleu, M., mediated interactions organize chromosomal domain Sauer, S. Gregson, H. C., Jarmuz, A., Canzonetta, C., architecture. EMBO J. 32, 3119-3129. Webster, Z., Nesterova, T., et al. (2008). Cohesins func I0121 Soshnikova, N., Montavon, T., Leleu, M., Galjart, tionally associate with CTCF on mammalian chromo N., and Duboule, D. (2010). Functional analysis of CTCF some arms. Cell 132, 422-433. during mammalian limb development. Dev. Cell 19, 819 01.07 Phillips, J. E., and Corces, V. G. (2009). CTCF: 830. Spitz, F., and Furlong, E. E. (2012). Transcription master weaver of the genome. Cell 137, 1194-1211. factors: from enhancer binding to developmental control. 0108 Phillips-Cremins, J. E., and Corces, V. G. (2013). Nat. Rev. Genet. 13, 613-626. Chromatin insulators: linking genome organization to 0.122 Squazzo, S. L., O'Geen, H., Komashko, V. M., cellular function. Mol. Cell 50, 461–474. Krig, S. R., Jin, V. X., Jang, S. W., Margueron, R., 0109 Phillips-Cremins, J. E., Sauria, M. E., Sanyal, A., Reinberg, D., Green, R., and Farnham, P. J. (2006). Suz12 Gerasimova, T.I., Lajoie, B. R., Bell, J. S., Ong, C. T., binds to silenced regions of the genome in a cell-type Hookway, T. A., Guo, C., Sun, Y., et al. (2013). Archi specific manner. Genome Res. 16, 890-900. tectural protein Subclasses shape 3D organization of I0123 Tolhuis, B., de Wit, E., Muirers, I., Teunissen, H., genomes during lineage commitment. Cell 153, 1281 Talhout, W., van Steensel, B., and van Lohuizen, M. 1295. (2006). Genome-wide profiling of PRC1 and PRC2 Poly 0110 Roeder, R. G. (2005). Transcriptional regulation comb chromatin binding in Drosophila melanogaster. and the role of diverse coactivators in animal cells. FEBS Nat. Genet. 38, 694-699. Lett. 579,909-915. 0.124 Valenzuela, L., and Kamakaka, R.T. (2006). Chro 0111 Rubio, E. D., Reiss, D. J., Welcsh, P. L., Disteche, matin insulators. Annu. Rev. Genet. 40, 107-138. Van C. M., Filippova, G. N., Baliga, N. S., Aebersold, R., Bortle, K. Ramos, E., Takenaka, N., Yang, J., Wahi, J. E., Ranish, J. A., and Krumm, A. (2008). CTCF physically and Corces, V. G. (2012). Drosophila CTCF tandemly links cohesin to chromatin. Proc. Natl. Acad. Sci. USA aligns with other insulator proteins at the borders of 105,8309-83.14. H3K27me3 domains. Genome Res. 22, 2176-2187. US 2017/O 130247 A1 May 11, 2017

0.125 Wang, H., Zang, C., Taing, L., Arnett, K. L., Wong, Sequencing (ChIA-PET) for mapping chromatin interac Y. J., Pear, W. S., Blacklow, S. C., Liu, X. S., and Aster, tions and understanding transcription regulation. Journal J. C. (2014). NOTCH1-RBPJ complexes drive target gene of visualized experiments: JoVE. expression through dynamic interactions with Superen 0140 Guenther, M. G. Levine, S. S., Boyer, L. A., hancers. Proc. Natl. Acad. Sci. USA 111, 705-710. Jaenisch, R., and Young, R. A. (2007). A chromatin 0126 Wen, B., Wu, H., Shinkai, Y., Irizarry, R. A., and landmark and transcription initiation at most promoters in Feinberg, A. P. (2009). Large histone H3 lysine 9 dim human cells. Cell 130, 77-88. ethylated chromatin blocks distinguish differentiated from 0141 Handoko, L., Xu, H., Li, G., Ngan, C. Y., Chew, E., embryonic stem cells. Nat. Genet. 41, 246-250. Schnapp, M., Lee, C. W., Ye, C., Ping, J. L., Mulawadi, F., 0127 Wendt, K. S., Yoshida, K., Itoh, T., Bando, M., et al. (2011). CTCF-mediated functional chromatin inter Koch, B., Schirghuber, E., Tsutsumi, S., Nagae, G., Ishi actome in pluripotent cells. Nature genetics 43, 630-638. hara, K., Mishiro, T., et al. (2008). Cohesin mediates transcriptional insulation by CCCTC-binding factor. 0.142 HnisZ, D., Abraham, B. J., Lee, T. I., Lau, A., Nature 451, 796-801. Saint-Andre, V. Sigova, A. A., Hoke, H. A., and Young, 0128. Whyte, W. A., Bilodeau, S., Orlando, D. A., Hoke, R. A. (2013). Super-enhancers in the control of cell H. A., Frampton, G. M., Foster, C. T., Cowley, S. M., and identity and disease. Cell 155,934-947. Young, R. A. (2012). Enhancer decommissioning by 0.143 Kagey, M. H., Newman, J. J., Bilodeau, S., Zhan, LSD1 during embryonic stem cell differentiation. Nature Y., Orlando, D. A., van Berkum, N. L., Ebmeier, C. C., 482,221-225. Goossens, J., Rahl, P. B., Levine, S. S., et al. (2010). 0129. Whyte, W.A., Orlando, D. A. HnisZ, D., Abraham, Mediator and cohesin connect gene expression and chro B. J., Lin, C. Y., Kagey, M. H., Rahl, P. B., Lee, T.I., and matin architecture. Nature 467, 430-435. Young, R. A. (2013). Master transcription factors and 014.4 Kent, W.J., Sugnet, C. W. Furey, T. S. Roskin, K. mediator establish Super-enhancers at key cell identity M. Pringle, T. H. Zahler, A. M., and Haussler, D. (2002). genes. Cell 153, 307-319. The human genome browser at UCSC. Genome Res 12, 0130 Young, R. A. (2011). Control of the embryonic 996-1006. stem cell state. Cell 144, 940-954. 0145 Kieffer-Kwon, K. R. Tang, Z. Mathe, E., Qian, J., 0131 Zuin, J., Dixon, J. R. van der Reijden, M.I., Ye.Z., Sung, M. H., Li, G., Resch, W., Baek, S. Pruett, N., Kolovos, P., Brouwer, R. W., van de Corput, M. P. van de Grontved, L., et al. (2013). Interactome maps of mouse Werken, H. J., Knoch, T. A., van IJcken, W. F., et al. gene regulatory domains reveal basic principles of tran (2014). Cohesin and CTCF differentially affect chromatin scriptional regulation. Cell 155, 1507-1520. architecture and gene expression in human cells. Proc. 0146 Langmead, B. Trapnell, C., Pop, M., and Salzberg, Natl. Acad. Sci. USA 111, 996-1001. S. L. (2009). Ultrafast and memory-efficient alignment of (0132 Benjamini, Y., Hochberg, Y. (1995). Controlling short DNA sequences to the human genome. Genome the false discovery rate: a practical and powerful approach 10, R25. to multiple testing. J R Statist Soc B 57, 280-300. 0147 Lee, T.I., Jenner, R. G., Boyer, L.A., Guenther, M. 0.133 Boyer, L. A., Plath, K., Zeitlinger, J., Brambrink, G., Levine, S. S., Kumar, R. M., Chevalier, B., Johnstone, T., Medeiros, L. A., Lee, T.I., Levine, S. S. Wernig, M., S. E., Cole, M. F., Isono, K., et al. (2006). Control of Tajonar, A., Ray, M. K., et al. (2006). Polycomb com developmental regulators by Polycomb in human embry plexes repress developmental regulators in murine embry onic stem cells. Cell 125, 301-313. onic stem cells. Nature 441, 349-353. 0134 Chen, X. Xu, H., Yuan, P., Fang. F., Huss, M., 0148 Lehoczky, J. A., Williams, M. E., and Innis, J. W. Vega, V. B., Wong, E., Orlov, Y. L., Zhang, W., Jiang, J., (2004). Conserved expression domains for genes et al. (2008). Integration of external signaling pathways upstream and within the HoxA and HoxD clusters sug with the core transcriptional network in embryonic stem gests a long-range enhancer existed before cluster dupli cells. Cell 133, 1106-1117. cation. Evolution & development 6, 423-430. 0135 Chepelev, I., Wei, G., Wangsa, D., Tang, Q., and 0149 Lettice, L. A., Heaney, S.J., Purdie, L. A., Li, L., Zhao, K. (2012). Characterization of genome-wide de Beer, P., Oostra, B. A., Goode, D., Elgar, G., Hill, R. enhancer-promoter interactions reveals coexpression of E., and de Graaff, E. (2003). A long-range Shh enhancer interacting genes and modes of higher order chromatin regulates expression in the developing limb and fin and is organization. Cell research 22, 490-503. associated with preaxial polydactyly. Human molecular 0.136 Dixon, J. R. Selvaraj, S., Yue, F., Kim, A., Li, Y., genetics 12, 1725-1735. Shen, Y., Hu, M., Liu, J. S., and Ren, B. (2012). Topo 0150. Li, G., Fullwood, M. J., Xu, H., Mulawadi, F. H., logical domains in mammalian genomes identified by Velkov, S., Vega, V., Ariyaratne, P. N., Mohamed, Y.B., analysis of chromatin interactions. Nature 485, 376-380. Ooi, H. S., Tennakoon, C., et al. (2010). ChIA-PET tool 0.137 Fuglede, B., and Topsoe, F. (2004). Jensen-Shan for comprehensive chromatin interaction analysis with non Divergence and Hilbert space embedding. Informa paired-end tag sequencing. Genome biology 11, R22. tion theory, 31. 0151. Li, G., Ruan, X. Auerbach, R. K., Sandhu, K. S., 0138 Fullwood, M.J., Liu, M. H., Pan, Y. F., Liu, J., Xu, Zheng, M., Wang, P., Poh, H. M., Goh, Y. Lim, J., Zhang, H., Mohamed, Y.B., Orlov, Y. L., Veikov, S., Ho, A., Mei, J., et al. (2012). Extensive promoter-centered chromatin P. H., et al. (2009). An oestrogen-receptor-alpha-bound interactions provide a topological basis for transcription human chromatin interactome. Nature 462, 58-64. regulation. Cell 148, 84-98. 0139 Goh, Y., Fullwood, M. J., Poh, H. M., Peh, S. Q., 0152 Lieberman-Aiden, E. van Berkum, N. L., Wil Ong, C. T., Zhang, J., Ruan, X., and Ruan, Y. (2012). liams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Chromatin Interaction Analysis with Paired-End Tag Lajoie, B. R., Sabo, P. J., Dorschner, M. O., et al. (2009). US 2017/O 130247 A1 May 11, 2017

Comprehensive mapping of long-range interactions conditions, as described previously (Whyte et al., 2012). reveals folding principles of the human genome. Science V6.5 murine ESCs were grown on irradiated murine embry 326, 289-293. onic fibroblasts (MEFs). Cells were grown under standard 0153. Loven, J., Hoke, H. A., Lin, C. Y., Lau, A., ESC conditions as described previously (Whyte et al., Orlando, D.A., Vakoc, C. R., Bradner, J. E., Lee, T.I., and 2012). Cells were grown on 0.2% gelatinized (Sigma, Young, R. A. (2013). Selective inhibition of tumor onco G1890) tissue culture plates in ESC media; DMEM-KO genes by disruption of super-enhancers. Cell 153, 320 (Invitrogen, 10829-018) supplemented with 15% fetal 334. bovine serum (Hyclone, characterized SH3007103), 1,000 0154 Meuleman, W., Peric-Hupkes, D., Kind, J., U/ml LIF (ESGRO, ESG1106), 100 uM nonessential amino Beaudry, J. B., Pagie, L., Kellis, M., Reinders, M., Wes acids (Invitrogen, 11140-050), 2 mM L-glutamine (Invitro sels, L., and van Steensel, B. (2013). Constitutive nuclear gen, 25030-081), 100 U/ml penicillin, 100 ug/ml strepto lamina-genome interactions are highly conserved and mycin (Invitrogen, 15140-122), and 8 nl/ml of 2-mercap associated with A/T-rich sequence. Genome Res 23, 270 toethanol (Sigma, M7522). 28O. (O155 Noble, W. S. (2009). How does multiple testing Genome Editing correction work? Nature biotechnology 27, 1135-1137. 0156 Phillips-Cremins, J. E., Sauria, M. E., Sanyal, A., (0166 The CRISPR/Cas9 system was used to create ESC Gerasimova, T.I., Lajoie, B. R., Bell, J. S., Ong, C. T., lines with CTCF site deletions. Target-specific oligonucle Hookway, T. A., Guo, C., Sun, Y., et al. (2013). Archi otides were cloned into a plasmid carrying a codon-opti tectural protein Subclasses shape 3D organization of mized version of Cas9 (pX330. Addgene: 42230). The genomes during lineage commitment. Cell 153, 1281 genomic sequences complementary to guide RNAS in the 1295. genome editing experiments were: (O157. Shen, Y., Yue, F. McCleary, D. F. Ye, Z., Edsall, L., Kuan, S. Wagner, U. Dixon, J., Lee, L., Lobanenkov, V. V., et al. (2012). A map of the cis-regulatory sequences Name Sequence SEO ID NO in the mouse genome. Nature 488, 116-120. PRDM14 C1 up ATGACATAATGAGATTCACG 0158 Spitz, F. Gonzalez, F., and Duboule, D. (2003). A global control region defines a chromosomal regulatory PRDM14 C1 down ACTGAAGTGGAAGGTGAGTG landscape containing the HoxD cluster. Cell 113, 405 417. PRDM14 C2 down CGACCCACCTCCTAACCTTA 0159 Wen, B., Wu, H., Shinkai, Y., Irizarry, R. A., and MIR290 C1 up CATTGGCTGTCAACTATACC Feinberg, A. P. (2009). Large histone H3 lysine 9 dim ethylated chromatin blocks distinguish differentiated from MIR290 C1 down CCCGTCCTAAATTATCTGCG embryonic stem cells. Nature genetics 41, 246-250. POU5F1 C1 up CAGAAGCTGACAACACCAAG (0160. Whyte, W. A., Bilodeau, S., Orlando, D. A., Hoke, H. A., Frampton, G. M., Foster, C. T., Cowley, S. M., and POU5F1 C1 down ACACTCAAACTCGAGGACTC Young, R. A. (2012). Enhancer decommissioning by LSD1 during embryonic stem cell differentiation. Nature NANOG C1 up TTAAACACATCATAAGATGA 482,221-225. NANOG C1 down TGAACTACGTAGCAAGTTCC 0161 Whyte, W.A., Orlando, D. A. HnisZ, D., Abraham, B. J., Lin, C. Y., Kagey, M. H., Rahl, P. B., Lee, T.I., and TDGF1 C1 up CAGTCTGAACTGCACATAGC Young, R. A. (2013). Master transcription factors and TDGF1 C1 down AAAGCTAAACTCTCCCAAGT mediator establish Super-enhancers at key cell identity genes. Cell 153, 307-319. TCFAP2E C1 up CCACGTGGGAAATCTAACTC 0162 Xu, Z. Wei, G., Chepelev, I., Zhao, K., and Felsen feld, G. (2011). Mapping of INS promoter interactions TCFAP2E C1 down GAAGTGAAGCCTTCTCGTTA reveals its role in long-range regulation of SYT8 tran TCFAP2E C2 up GAAGAGTGTGACTGAAAAGA scription. Nature structural & molecular biology 18, 372 378. TCFAP2E C2 down TCTCACGGAGCCTCAGGAGA 0163 Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., Nusbaum, C., Myers, R. 0.167 Cells were transfected with two plasmids express M., Brown, M., Li, W., et al. (2008). Model-based analy ing Cas9 and sgRNA targeting regions around 200 base pairs sis of ChIP-Seq (MACS). Genome biology 9, R137. up- and downstream of the CTCF binding site, respectively. A plasmid expressing PGK-puroR was also cotransfected, EXAMPLES using X-fect reagent (Clontech) according to the manufac 0164. The following examples are illustrative in nature turer's instructions. One day after transfection, cells were and are in no way intended to be limiting. replated on DR4MEF feeder layers. One day after replating, puromycin (2 ug/ml) was added for 3 days. Subsequently, Example 1. Experimental Procedures puromycin was withdrawn for 3-4 days. Individual colonies A. Experimental Procedures were picked and genotyped by PCR. (0168 For the Prdm14 (C1-2), mir-290-295, Pou5f1 and Cell Culture Nanog SDs and Tcfap2e (C1) PD boundary CTCF site (0165 V6.5 murine ESCs were grown on irradiated deletions, at least two independent clones were expanded murine embryonic fibroblasts (MEFs) under standard ESC and analyzed. Data on FIGS. 4, 5 and S4 were obtained from US 2017/O 130247 A1 May 11, 2017

the analysis of a single representative clone for each geno - Continued type. The sequences of the deletion alleles in the used cell TCTGAAGACAGCTACAATGTACTTACATATAATAAATAAATAAATCTTTA lines are listed below. (0169 PRDM14 Locus Reference Sequence: AAAAAAAAAAAAAAAAAAAAAAAAGAAATAGTACGGGGCTGGTGAGATGG CTTAGTGGGTAAAAGCACCCGACTGCTCTTCCGAAGGTTCAAAGTTCAAA

CCGTTTCCCTAGCACAGAGCCACCCAGGACCAGAAATAACAGCTTCACAA TCCCAGCAACCACATGGTGGCTCACAATCACCCGTAATGAGATCTGACTC

AAGAGCTCGCTCTGTACACATGGGTCCCAAGGAAGAAGTGAGATTCTTTC CCTCTTCTGAAGTGTCTAAAGACAGCTACAGTGTACATTATATGTAATAA CCCCTCACTGAGGAAAGGGGGCCGTCTTTCCTGCTAAAGAGAAAGAAAGT ATAAATGTTTTTTTMAAAAAGAAAGAAATAGTACATTTCTCAATGGCCTC GAAGTTTCCTGGAATCTTCTTTTTCTCCTCGTGAATCTCATTATGTCATC GAGAATTAACCTGCAGGAAAAGGAAAATGCTGTGTTTCTTCTCCAAAAAT GAAATTCTAGGCTTAATCGATGCTTCTGCCCCAGCTTCTCAATTATCTGA CCTATAGGTGGCGCACAGACACCGGTTTCAAGTGATGGGTCCCAGCTTTG GATTTCAGATGCCCACCGCGTCCAGCTCAGAAAATCAAATTGTGGTTACT ACCTTTCTGCCCAAGTCCGGTTTGTCGGGAACTCTTCTTCCTTCTGCCTC ATTCTAGACATTTCCAGCAGAGGGCGCTTCGGTGCAGGTAGCCAGAACAC TACCCCCTGCCAGAATTACAGGGCTGCTCTTGGCTCTGAGTTGTTCGGTG CGAAGTCATCCAGTTTCTGGCCGCAAACTCAGATTACTAGATTGCCAACA TAAGTGAGAAAGCAAGCAGCACCTGCAGTCCTGAGGTGTCACCTAGCAGC GGGTTTCCAGAACGTGGGTAAAAGAGACTGAAGTGGCAATCCCCACGAAA TCCCTTCTAACAAGGCTGCGCTCCTCTTGGGAGGACATAGCCAAGAGTCA ACAAAAAAAACAAAACAAAACGGTCAAGGGTGCTTCGTACTGAAGTGGAA CTGAAGGGCAAGCTCCCTCAAAGCTCCTCTCTAAGGTTAAATAGCAGCAT GGTGAGTGAGGCTGTGTGGGCAGATCGCAACCGTCATTTAGAACAAACCT

GAAGCAGAGCGGTGTAAATGACTGTATTCCCAGCACTCAAGAGAATAGCT GACCTCGACCCACCTCCTAACCTTAAGGTTCATATTCTCTCTGCAAAACA

GGAGCTTTGGCCAGCCTACAGAGGAGACCCTGTGCTGTTCTCAGTATTCA TCAAGGGGGTCTGGAGGAACACTGAAGTCCCTCCTGCCGCATG GTTATGCTACCCTCTAATGAAGTACATTGTACTTCCTGGTAATTTCATTT (0170 PRDM14 C1-2 Deletion Allele Sequence: TTATGAAAGGCAATACTGGATTCCTGCCTTTCTTCCTTTCTGCCTGTAGT

CCGTTTTTAGGTTGATCAACAGGTTGACATTACACTTGTGACAATTCTCT TCTCCTGCTAAAGAGAAAGAAAGTGAAGTTTCCTGGAATCTTCTTTTTCT

TGCCTCACGGAACGATAACGTTTCAAAGGGGAAGACTAATTAGGATTGGT CCTCCACCTCCTAACCTTAAGGTTCATATTCTCTCTGCAAAACATCAAGG

ACCGTTAGTTTTTTGTCAACACAGCCAGAGTCATCTGGGAAGAGGGAACC GGGTCTGGAGGAACACTG TGAGCTGGGGGTTTACCTCCATCAGATCGTTTGTGAGTATGTCTGTAGGA (0171 MIR290 Locus Reference Sequence: AATGTTCTTAATCATTAATATCGGAGAGCCAGACCATCCCCGGTGGTGCC

ACTGCTGGGCCGGTAGTCCTGGGTGATACAAGGAGGCAGGTTTACTGGCT CCAGGCCTCACTAGTGTCAGGTCGTGAGTCATGCACAGCCATGATTTCTG

AGTAAGCAGCACTCCTTTGCAGGCTCTGCTCCACTCTCTCCTTCCTCCCT CATCCTTGGCCACCTTGCTCAGTTTTCTGGAGTGTTGAGATTCAATGCGA

TCTGCCTTGAGTTCCTGTCTTGACTTCCCTCGGTGATGAGCTGTACCCTG ACACCACCACAGCTGGGCAAAATCTAATAACCCAGGA TAGGATGGGAGCA

AAAACCAGATAACTTGTCCTTAATTTACTTTTGGTCATGGTAGACTTTTT TTGGCTGTCAACTATACCAGGTGTGCAAATCTTGGGT TTGAGGCCTCAT

ATTATTGTTGTTTTGTTGTTGTTGTTGTTGTTGTTGTTTTTATGTGTATG TTGTAAGGTGCCTTATACCTTTAGCCCCAGCCCACTT TTTTTCCCCCTG

GGTGTTTTGCTTACAAGTATGTCTGGGCACCATATTCATGCACAGTGATG CTGTATAAAATTCAGGTGTGAGTACAATTTTTCTTTTT AAAGATTTATAA

CCCAATGATTCCAGAAAAGGGCCGAGGATTCCCTGGGACTGGAGTTACAG GTGTTCTGTAGCTATCTTCAGATTCACCCAAAGAAGGCATTGGATCCCAT

AAAGTTAGGAGCTGCCATGTGTGTGCAGCGAATCAAACTCTGGCCTTCTG TACAGATGATTGCAAGCCACCATGTAGTTGCTGGGAA TGAACTCAGGAC

GAAGAGCAGCCAGTGCTCTTAACTGCTGATCCATCTTTCTAGCCCACTTC CTCTGACTTAACCACTCCAGCCCTTGAGTACAATTTTT GAAAAATTACCT

GTCACGTTGTTTATCACAGCAGTCGAAAGCAGACTAGGACATGATGGAAA TGTGGGTCTTTATGCTGTGACTTGGCCAGTAGATGGCAGTCTTGGTCCAT

GGAGTCAAAAGCTTGGTCAAGGGATCTTTAGAGATGGGAAGGGGAACTTT GGAAATGTCTAGGACTCTGGATATTTTTCCTTTTCTG GGTCTTTACTGA

TTAAACGTTGGTCCTGCCATGCTCTCCCAGAGGCATGGTGCCTTCTCTGT TCTTCAAACCTGCTAACCAGCCAATCCCCGTCCTAAA TATCTGCGTGGA

CTTTCCTAGTGCTTTCCTTTGCAAAGCAAGCAAATATCATCTACTTTGGT ATCTACATCAAACCCAGTGAGCTCCATCAAAGGTTGAGTGTTTAGGTCTC

GTTTTAAGAAATAGTACGGGGGGGCTGGTGAGATGGCTCAGTGGGTTAGA AAGCAGAACAATTTTGTCAACCTGCACTTACTGGGCC CCTGACCTAAGA

GCACCCGACTGCTCTTCCGAAGGTCCAGAGTTCAAATCCCAGCAACCACA CGGTCCCATGTAACAGGATGACCTTGAGTGATCTTGTCTCCACCT CACCA

TGGTGGCTCACAACCATCCGTAACGAGATCTGACTCCCTCTTCTGGTGTG TATACCAGAGGCTGGCCTTTAAAAGTTCCACCTCAAG GCTTCAAAGCA US 2017/O 130247 A1 May 11, 2017

(0172 MIR290 C1 Deletion Allele Sequence: - Continued

TGCATCTTGGTCGGTCAAATTTTTCTTATTGATGAAAAAGATGATTAAAG AAATCTAATAACCCAGGATAGGATGGGAGCATTGGCTGTCAATCTATATC GACACTGTGAATTTGAGACTATTCTGAACTACGTAGCAAGTTCCAGGACA AAACCCAGTGAGCTCCATCAAAGGTTGAGTGTTTAGGTCTCAAGTAGAAC

AATTTTGTCAACCTGCACTTACTGGGCCTCCTGACCTAAGACGGTCCCAT GCCAGTGTTACAAATCAAGACCCGATTTTGGAAGAAGATGGGGGCTGGGG

GTAACAGGAT AAGGCGCCATGTTCCCCCCCCCACCCCCCACCCCCAGCTCCCATCTCATC

(0173 POU5F1 Locus Reference Sequence: TGTTGCTACATTTACTAGACTGCCAAGAATTTTCTCTCTCAGAAATTAAA

AAAATAGTAAAGAAATAAAGAAAAAAGCCAGGCATGATGGTTCACTCTCA CTCAGCCATCTCCCTGTTGAGCTGTCAAGCAGAGAGGCCCCAACAACAAC

ACAGGAAATTGCCAGTGTTCTTGATTGCCCAAAAGAACCAGATGACCAGA TAATCCCATAATTCCATTACGTAAGAGAATCTCTCTACC AGCTGACAACACCAAGAGGCTAGGGGTCTTCCAGTTGGCCTTGTACTGTT (0176 NANOG C1 Deletion Allele Sequence: GCAACTGTCAGGGAAAGGATGTAACCAGAGGGCCTCTGGGACTCCTCTCA

CCCTTGATAGTTTGAGGGATATGAGCAAATTACACGGTTATCAGAAGGTG GTCTGTAACGCTGTGTGTGTGTTGTTAGTTAAACACATCATAAGATCCAG

GCCATAGTGACACTGAAAATTGGCCCATTGGCTTCAAAGATTTACCAAAG GACAGCCAGTGTTACAAATCAAGACCCGATTTTGGAAGAAGATGGGGGCT

TACCGTCCGTATTTTCTACCTACGGTGTGCTGGAGCCTAGAGGACACTAG G GGGGCGCGCTGAGCTCGCGGAAGCCACCCAGAGTCCTTCCAGGAGACTCC (0177 TDGF1 Locus Reference Sequence: CTTAAAGGTTGATCAAATTGTTCTTTGCCAAACTGAATTTATCATAAAAA

TTATACTTTATTTTGTATTACTTTGTGTACATGGGTGTTTTGCCTTCACA TGAGCAACAAGATGTCTGCACCATCTTTTCATCCCTCGGATTTTTCTTTA

GATGCGTCTGGTGCACCTGAGAAGCCAGAAAAGAGAACAGGAGTGAACAG TAGTCGTGGCTTGTACTGAGTAGACAGGTTCCTGTCTCTGACAAAAACGG

GTTTGTGGGGGCTGACACTCAAACTCGAGGACTCTGGGAAAGCATCGAGT AGGACAGAAACCCCACTCTTCTCCAAAGAAGCATGCTGATAAAAAGTGGA

GCTCTTAACCATTGAGCCATCTCTCCAGCCCATCTGTTTTCTTTTGCCGG CACAAACCATGAACTGCCGTGTGACAGTCTGAACTGCACATAGCCGGATG

AGGAAGGGGCTCTGAGAAAGGATCTTACCTGTAGCCCAGGCTGGCCCAGT AGGCTTTCGGGTAAAGACTAGAATTGCAAGATTACTAACACTGTTAAACT

TCACCATGTGATGAAGGCTCCTTTCAAATCCAAGCTA CTTTGTTTTCTGCAGTCCTTGGTTTAGACT CAAGACTCTGAAACCTAGAA

(0174 POU5F1 C1 Deletion Allele Sequence: ACTGAGCTCAAGGCTTCCGAGGCTTTGACATCGAAACACCTGATCTCCAG

TAGGGGGCGCTGCAGCCTAGCAGGGCGGAGCTGACTCTTCTGGCCAGTTT AATTGCCAGTGTTCTTGATTGCCCAAAAGAACCAGATGACCAGAAGCTGA TTTCTCCTGATGGTCCCAGTAAAACTCATGTCAGGCTCAGATTTTAGACT CAACAAATTATACTTTATTTTGTATTACTTTGTGTACATGGGTGTTTTGC AAGGGACTGGAAAGGGGGAATTCTGAGAATTAGAGCTAAAGAATTAGAGG CTTCACAGATGCGTCTGGTGCACCTGAGAAGCCAGAAAAGAGAACAGGAG GGTTAAAGAGTGAAGCCCAGGAAAATATATTTGAACAAATAAAAGCTAAA TGAACAGGTTTGT CTCTCCCAAGTTGGACAAACAAAAACAAAACAAAAACCCTCCATAAATCC

(0175 NANOG Locus Reference Sequence: TCAATCTTTAGCTTCAAGAAATTGAATCCAAAGGAACCCATATCCAGACC

CGGTGCTCAGCGTGGAAAAGGCCAGAGTGCTGAAAGCAGCTGACTCTTCA TCAGTTGTGTGTGGGGTAGGCCATGTGTGCCTTGGTGCACATGTGCAGGT GTACACATGGACAAACCCTGGCAGAGCTGGGCAAGAAAACAGACACACCT CAGAACAACTTGGGTGAGAGTCTGCTTCCTTTTTCTGCTCTGTATGGGTT GGGTGTGATAATGACTT GTAGGGATAGAATTCAAGTTTCGCATTTGAATTCTATGGTTTAGCACTTA

AACATCAAAACCTTTCCCAGAACGTCACACTTCAATGCCATTGAGGAAGG (0178 TDGF1 C1 Deletion Allele Sequence:

TGTCTGTAACGCTGTGTGTGTGTTGTTAGTTAAACACATCATAAGATGAA AGACAGGTTCCTGTCTCTGACAAAAACGGAGGACAGAAACCCCACTCTTC GGAAAGCTGGGAGTGTCCTTAACACAGCAGCGAGCAGCAAAAGCTACTTT TCCAAAGAAGCATGCTGATAAAAAGTGGACACAAACCATGAACTGCCGTG CTCCTCAAGCCTGGAGGAGTCTGGTCCGACAGTCCACCAACAGGGGGCGT TGACAGTCTGAACTGGACAAACAAAAACAAAACAAAAACCCTCCATAAAT TATTTCCCAGCCCTCGTGAAGCGTTGAACTGTCCTGGTGAGAAGGGTGAT CCT CAATCTTTAGCTTCAAGAAATTGAATCCAAAGGAACCCATATCCAGA GTGCAGTTCCTTGTCTCAGCAGCAGATGGAGCCATAGGGACGAGAACAAG CCCGGTGCTCAGCGTGGAAAAGG TTCCTAGGTGAAGGAAGGAGTGGGGGGAGACGAAGCGGAAGAAGCTGAAG US 2017/O 130247 A1 May 11, 2017

(0179 TCFAP2E Locus Reference Sequence (C1): - Continued GTCACAAGTCGGGCTGGTGAGATGGCTCAGTGGGTAAGAGCACCCGACTG

GCTATCTTGCATCTAGAAGTGGAAAGACAGTTCTTTAAATGCCTAGGCAG CTCTTCCAAAGGTCCGGAGTTCAAATCCCAGCAACCACATGGTGGCTCAC

TGTACATTAACCTATGACAAGCAAACACTCCCCTGTGGATTTTGCTCTTC AACCATCCATAATGAGATCTGACGCCCTCTTAAGTGTCTGAAGACAGCTA

AGTTTGCCTGTATGTATGTGTGCTTGTATTTGCACTGCATGTGTTTCTGG CAGTGTACTTACATATAATAAATAAATAAATCTAAAAAAAAAAAAGTCAA

TGCCCTTGAAGATCAGAAGAAACATCAAACCCCCTAGGACTAGAGTTACA AAGTCTATTTAGTACTTTGCTTGGAGTGGGTCAAGCAGCCAACAAATAGC

GATGGCTGTGAATCACCACGTGGGAAATCTAACTCTGGACTTGGGGGGGT TACTAAATAAATAAGTAACCAAAAAGATAATTACAGTTTTCCAAATCTGT

GTGTATTGAGAGAGTAATTTTTAGAAGAAAGAAGACAAGCAGGGTAGGAC TAGGGGACTCTTTGGAAGGGCTCTTATGTGACCTTGACCTAGCATAGCTA

AGAAAATTTTTAAAAAGCTGGAAAGAATGTAATCTCATATGATTTTATAG CACATAAGGCCCAGTTATAAGTGAGCACAAACGAGCAACTGTGCTTATTT

GATAAAATTTTAAGGTACAAATGGGACCACAGAATTAGTTCCCCACATGA CTTCTAGGAGGGACATGTGCTTCATGAGCTACTTCTCTGGAGACCAGCAG

GCAAGATGGTCTTC: TOTATTATTATTTTTTTTCCTTTTATGGTGTTTTGT AGCTGTGGAATACCAGGGTTTCAGACTGGGCCCTTCTGTTTCAGGGGCAA

CTGCATGTGTGTCTGTGCACCATGTGCATGAAGTGCCTGAGAAGGCCAGC GGGTTCTTCACATTGTAAGCATGCAGGTGATGATTTCTTATGGTTTTATT

AGAGGGCATCAGATCCCTTGAGATGAGTTACAGGTGGTTGAGAGACACCT TTATTTTTTTTTTTTAGATACAGACACCTGGATGAAGGCATGAGGAAGGC

TATGAGTCCTGGAAATTATAC CTGGGTACTCTGGAAGAGCAGCCAGGATT AGAGAGATACCCCTGGGAAAACGGAGACCACAACAGGCACAGATACACTG

CTTAACCTCTGAGCCATCTCCCTGGCCCCAATCTTTTGCATTCTTCTGTC ATAAGACATATATACACATCGGTATGCATGCTCAAATACACATGGACTCT

CGTCAGCTATTCAATCCATTTCAAAGTGGAAGTGAAGCCTTCTCGTTAAG CAGTTGACATTCTTGGCTTATTC, TCTTCCAAGGCTCACGTTTTCTCCTCT

GATGACAGTTATCCGGAAGGGAGCATGAAAATGTTCCAGGGCCTTTTCTT TTAAAACAAAACAAAACGAAACGAAACGAAACAAAACAAACAAACAAACC

GCTTTATGCACACTCAAAGCTGAAAATCTTTCCCATGTCAATGGATGAGA CCAAACTTTTTTGTCCTCCTTTGTTGACAGAGCCGGATTCCTCTTCCTAA

CCATCACTCAATACCTAAACAGAAAACTATGAAGCCTAATTGCTTGTGTG CAGGTCTGTCTGCAAATGTTTGAGATGAAATCTCGCAAAAGATACTGACG

CTTGCTTAAGTATATACTGGTTGGTGACAGCTTCCACCAAGAAAGGCATA CCCCATCTAGTGGCCGGAGCTTACCACTGCAGCTCAACACTCCCCTGCCT

CCACTGATTGATAACATTGGTAACAACATGAACTTCAGGTATTGATAGGT GTCTCAGTGGAGGCACCCAGACAGAGCCAGCCCTGCCTTGCAAGCTTCCG

T CTTAGCCCCTTGCTCACTCTGGAGTCTGGAACCCTCTCACGGAGCCTCAG 0180 TCFAP2E C1 Deletion Allele Sequence: GAGAAGGCAGGTTTCACGTCTGCCTTCTGTTCTCAAGCTTCGCTGGCCTTG GCATGCAGGAGAGCAACTCAACCGCAAGGACCGTGGACAGTAATCATTTG

TGCACTGCATGTGTTTCTGGTGCCCTTGAAGATCAGAAGAAACATCAAAC CTTTGTGGCCTTTCCCT CACTTCCCCAGACTACCCGTTACTCCCATGATT

CCCCTAGGACTAGAGTTACAGATGGCTGTGAATCACCACGTGGGATGACA CCAAAGAACACTTGACAGCTCCCAAATCTGCCTC

GTTATCCGGAAGGGAGCATGAAAATGTTCCAGGGCCTTTTCTTGCTTTAT 0182 TCFAP2E C2 Deletion Allele Sequence:

GCACACTCAAAGCTGAAAATCTTTCCCATGTCAATGGATGAGACCATCAC GGGGGGGAAGAGTGTGACTGCCTTCTGTTCTCAAGCTTCGCTGGCCTTGG TCAATACCTAAACAGAAAACTAT CATGCAGGAGAGCAACT CAACCGCAAGGACCGTGGACAGTAATCATT 0181 TCFAP2E Locus Reference Sequence (C2): 0183. The CTCF-deletion lines at the Pou.5f1 and Prdm14 (C1-2) loci are heterozygous, while the CTCF TGATCCACTTCCAGCTCCTTGGTTCATTCAGCATCTTTCACCTGGCCCAT deletion lines at the Nanog, Tagfl. Prodm14 (C1) and miR 290-295 loci are homozygous for the mutation. Gene GCCCCTCCCCTTATGCAAAATGCCTCCGGGATCATGGCCTTGGCCCACTC Expression Analysis ESC lines were split off MEF's for two CAGGAATGAAGCTGACTAGCACTCCAGGACTCCTGGTTACTTTTCCTTTC passages. RNA was isolated using Trizol reagent (Invitro gen) or RNeasy purification kit (Promega), and reverse TTGCTCTGTGAGTCGCTTCACCAGAAGTCTAAAGTTGGGGGGGAGGGGGG transcribed using oligo-dT primers and SuperScript III reverse transcriptase (Invitrogen) according to the manufac GGGGGAAGAGTGTGACTGAAAAGATGGCTCAGAGGCTACGAGAACCGGCT turers’ instructions. Quantitative real-time PCR was per GCTCTTTCAAAGGAGTCAGGTTTCATCCCCAGTACCCACATGGCAGGAAT formed on a 7000 AB Detection System using the following Taqman probes, according to the manufacturers instructions GGCTATCTGTAATTCCAGTTCTGATGGATCTGGTGAGTCTGACGCCCTCT (Applied BioSystems). TCTGGCCCCCACAGGCATTGCGCTGCACAGACAGGCTGGCAGAACACCCC

ACACATAAAACAATAAAGGAATCTTTAAAAAAAAAAAAAAGTCTAAAGAA Gapdh: Mm99999915 g1 Prdm14: Mim()1237814 m1 US 2017/O 130247 A1 May 11, 2017 14

-continued for all data sets. For the histone modification H3K27me3 whose signal tends to be broad across large genomic regions, Sco5a1: Mm00556042 m1 we used MACS (Zhang et al., 2008) with the parameter “-p Pou5fl: Mmo0658129 gh 1e-09 -no-lambda -no-model. UCSC Genome Browser H2-Q10: Mmo1275264 g1 Tcfl9: Mm00508531 m1 (Kent et al., 2002) tracks were generated using MACS Mmu-mir-292b: Mmo3307733 pri wiggle outputs with parameters “-w -S-space=50. Nirp12: Mmo1329688 m1 Myadm: Mmo1329822 m1 AUO18091: MmO1329669 m1 SMC1 ChIP-Seq Nanog: Mmo2019550 s1 Dppa3: Mm01184.198 g1 (0208. Enrichment Heatmap FIG. 1B, S1A, and S1B Tagfl: Mmo3024051 g1 shows the average ChIP-seq read density (r.p.m./bp) of GmS90: MmO1250263 m1 different factors at the indicated sets of regions. The average Lirrc2: Mm.01250173 m1 Rtp3: Mmo0462169 m1 ChIP-seq reads in 50 bp bin were calculated and drawn using Tcfap2e: Mmo1179789 m1 bamToGFF (https://github.com/BradnerLab/pipeline). In PSmb2: Mm00449477 m1 FIG. 1B, +/-5 kb from the center of the SMC1-enriched Nicdn: MnOO449525 m1 region was interrogated. In Figure S1A, the enriched regions Sox2: Mm03053810 S1 Pax6: Mim()0443.081 m1 of OSN, MED1, and MED12 were merged together if GataG: MmO0802636 m1 overlapping by at least 1 bp. For each of the merged regions, Sox17: Mm00488363 m1 +/-5 kb from the center of the merged region was interro gated. On Figure S1B, +/-5 kb from the center of the CTCF 0184 Based on RNA-seq data (Shen et al., 2012), the enriched region was interrogated. genes are expressed at the following levels prior to deletion of the CTCF site: Gene Sets and Classification of Gene Transcriptional State 0185. Pou5fl: 79.4 RPKM (rank among 24,827 Refseq in ESCs transcripts: 232, top 1%) 0209 All gene-centric analyses in ESCs were performed 0186 Prdm14: 2.21 RPKM (rank: 9,745, 39th %) using mouse (mm.9/NCBI37) RefSeq annotations down 0187. Slco5a1: 0.93 RPKM (rank: 12,277, 50th %) loaded from the UCSC genome browser (genome.ucsc.edu). 0188 miR-295: 18.9 RPKM (rank: 1,902, 8th %) For counting purposes and for assignment of enhancers to (0189 H2-Q10: 0.48 RPKM (rank: 13,782, 56th %) target genes (Table S2A-C), we collapsed multiple identical 0.190 Tcf19: 1.03 RPKM (rank: 12,011, 49th %) TSS into one gene level TSS. Genes were separated into (0191 Nlrp12: 0.06 RPKM (17.108, 69th %) classes of activity as follows: A gene was defined as active (0192 AU018091: 17.1 RPKM (rank: 2,150, 9th %) if an enriched region for either H3K4me3 or RNA Pol II was (0193 Myadm: 14.6 RPKM (mean of multiple splice located within +/-2.5 kb of the TSS and lacked an enriched isoforms) (rank: 2610, 11th %) region for H3K27me3 therein. H3K4me3 is a histone modi (0194 Dppa3: 25 RPKM (rank: 1,320, 5th %) fication associated with transcription initiation (Guenther et (0195 Tagfl: 92 RPKM (rank: 167, top 1%) al., 2007). A gene was defined as Polycomb-occupied if an (0196) Lrrc2: 1.2 RPKM (rank: 10,292, 42nd %) enriched region for H3K27me3 (representing Polycomb (0197) Rtp3: 0.01 RPKM (rank: 14,587 59th) complexes) but not RNA Pol II was located within +/-2.5 kb (0198 Sox2: 122 RPKM (rank: 100, top 1%) of the TSS. H3K27me3 is a histone modification associated (0199 Nanog: 122 RPKM (rank: 99, top 1%) with Polycomb complexes (Boyer et al., 2006; Lee et al., 0200 Pax6: 0.07 RPKM (rank: 16,941, 68th %) 2006). A gene was defined as silent if H3K4me3, 0201 Gatao: 0.25 RPKM (rank: 14,981, 60th %) H3K27me3, or RNA Pol II enriched regions was absent 0202 Sox17: 0.15 RPKM (rank: 15,754, 64th %) from +/-2.5 kb of the TSS. Remaining genes to which we 0203 Psmb2: 85 RPKM (rank: 203, top 1%) were unable to assign a state were left as unclassified. 0204 Tcfap2e: 0.19 RPKM (rank: 15.402, 62nd %) Overall, there were 15,312 unique active TSSs, 1,091 unique 0205 Ncdn: 3.19 RPKM (rank: 8,388, 24th %) Polycomb-occupied TSSs, 8,477 unique silent TSSs, and 616 unclassified TSSs in mouse ES cells. ChIP-Seq Illumina Sequencing and Library Generation 0206 Purified DNA from a H3K27me3 ChIP was used to Defining Active Enhancers in ESCs prepare a library for Illumina sequencing. The library was 0210 Co-occupancy of ESC genomic sites by the OCT4, prepared following the Illumina TruSeq DNA Sample Prepa SOX2, and NANOG transcription factors is highly predic ration v2 kit protocol as previously described (Whyte et al., tive of enhancer activity (Chen et al., 2008) and Mediator is 2012). typically associated with these sites (Kagey et al., 2010). We Bioinformatics Analysis—ChIP-seq Data Analysis first pooled the reads of ChIP-seq profiles of transcription factors OCT4, SOX2, and NANOG, which were performed 0207 All ChIP-Seq data sets were aligned using Bowtie in parallel, to create a merged “OSN' ChIP-seq experiment (version 0.12.2) (Langmead et al., 2009) to build version (Whyte et al., 2013). These reads were processed by MACS MM9 of the mouse genome with parameter -k 1 -m 1 -n 2. to create an OSN binding profile for visualization. To define Data sets used in this manuscript can be found in Table S6. active enhancers, we first identified enriched regions for the We used the MACS version 1.4.2 (model-based analysis of merged “OSN' ChIP-seq read pool, and for both Mediator ChIP-seq) (Zhang et al., 2008) peak finding algorithm to complex components MED1 and MED12 using MACS. identify regions of ChIP-seq enrichment over input DNA Then we used the union of these five sets of enriched control. A p value threshold of enrichment of 1e-09 was used ChIP-Seq regions that fell outside of promoters (e.g., a US 2017/O 130247 A1 May 11, 2017

region not overlapping with t2.5 kb region flanking the of DNA ends from a single chromatin fragment in the RefSeq transcriptional start sites) as putative enhancers. ChIA-PET procedure. In contrast, chimeric PETs with het erodimeric linkers did not show an inverse relationship with SMC1 ChIA-PET Processing genomic distance (Figure S1E. Table S1A). 0211 All ChIA-PET datasets were processed with a Creation of High-Confidence ChIA-PET Interactions method adapted from a previous computational pipeline (Li et al., 2010). The raw sequences were analyzed for linker 0214. To identify long-range chromatin interactions, we barcode composition and separated into non-chimeric PETs first removed intrachromosomal PETs of length <4 kb with homodimeric linkers (AA or BB linkers) derived from because these PETs may originate from self-ligation of DNA specific ligation products, or chimeric PETs (AB linkers) ends from a single chromatin fragment in the ChIA-PET with heterodimeric linker derived from nonspecific ligation procedure (Figure S1E). We next identified PETs where each products. We trimmed the PETs immediately before a per end overlapped with a different PET peak (overlap of at least fect match of the first 10 nt of the linker sequences (Linker 1 bp). A with CTGCTGTCCG: Linker B with CTGCTGTCAT). 0215 Operationally, these PETs were defined as putative After removing the linkers, only the 5' ends of the trimmed interactions. Applying a statistical model based upon the PETs of at least 27 bp were retained, because the restriction hypergeometric distribution identified high-confidence enzyme EcoP151 cuts 27 bp away from its recognition interactions, representing high-confidence physical linking Sequence. between the PET peaks. Specifically, we first counted the 0212. The sequences of the two ends of PETs were number of PETs originating from each PET peak. We then separately mapped to the mm9 mouse genome using the asked, given the numbers of PETs originating from any two bowtie algorithm with the option "-k 1 -m 1 -v 1 (Lang PET peaks, what was the likelihood of seeing the observed mead et al., 2009). These criteria retained only the uniquely number of PETs linking the two PET peaks, using a hyper mapped reads, with at most a single mismatch for further geometric distribution to generate a p value for each poten analysis. Aligned reads were paired with mates using read tial interaction. To correct for multiple hypothesis testing, identifiers and, to remove PCR bias artifacts, were filtered we derived a background distribution for p-values of inter for redundancy. PETs with identical genomic coordinates actions through random shuffling of the links between PET and strand information at both ends were collapsed into a ends. Using this background distribution, we controlled the single PET. The PETs were further categorized into intrac number of false positives in our interaction set by setting a hromosomal PETs, where the two ends of a PET were on the p-value cutoff threshold such that only the top 1% of same chromosome, and interchromosomal PETs, where the simulated interactions from the background dataset would two ends were on different chromosomes. The two ends of be called significant. This threshold, which we term the false all non-chimeric PETs were used to call PET peaks that positive likelihood in figure legends, was then applied to the represent local enrichment of the PET sequence coverage by actual data. This method did not make any assumption of the using MACS 1.4.2 (Zhang et al., 2008) with the parameters distribution of p-values as the Benjamini-Hochberg proce “-p 1e-09 -no-lambda -no-model -keepdup=2. dure (Benjamini, 1995); both methods for multiple hypoth esis testing yielded similar number of interactions (Noble, Chimeric Versus Non-Chimeric PET Quality Assessment 2009). For each of the two SMC1 ChIA-PET replicates, two independent PETs were required to call high-confidence 0213 Chimeric PETs with heterodimeric linkers can be interactions between pairs of interacting sites (Table S1C, used to estimate the degree of noise in the ChIA-PET S1D (not shown); merged data in Table S1E). For the dataset. 7% of paired-end ligations involved heterodimeric merged SMC1 ChIA-PET dataset, non-chimeric PETs from linkers (AB linkers Table S1A). Since the frequency of two replicates were pooled together and three independent ligations involved heterodimeric linkers (AB linkers) gave PETs were required to call high-confidence interactions an estimate of non-specific homodimeric ligations (AA or (Table S1E). BB linkers), we estimated that less than 14% of total homodimeric ligations (AA and BB linkers) were nonspe Saturation Analysis of ChIA-PET Library cific. We also counted the chimeric PETs that overlapped with PET peaks at both ends by at least 1 bp. These chimeric 0216) To determine the degree of saturation within our PETs represented “non-specific' chromatin interactions. We ChIA-PET library (Figure S1H), we modeled the number of found that more than 99.8% “non-specific’ chromatin inter sampled genomic positions as a function of sequencing actions derived from chimeric PETs overlapping with PET depth by the Michaelis-Menten model. Intrachromosomal peaks had only 1 chimeric PET, 0.1% “nonspecific’ inter PETs with a distance span above our self-ligation cutoff of actions had 2 chimeric PETs. We thus used a 3 PET cut-off 4 kb were subsampled at varying depths, and the number of for our high-confidence interactions (Figure S1F). Since unique genomic positions (defined as the start and end contact frequency is expected to inversely scale with coordinates of the paired PETs) that they occupy were genomic distance, we examined the relationship between counted. Model fitting using non-linear least-squares regres PET frequencies over genomic distance between the two sion suggested that we have sampled approximately 70% of ends of intrachromosomal PETs. The frequency of non the available intrachromosomal PET space, encompassing chimeric PETs with homodimeric linkers was plotted over 2.22/3.17 million positions (Figure S1H). genomic span in increments of 100 bp (Figure S1E). The 0217 We considered whether ChIA-PET data limitations scatter plot suggested two populations within intra-chromo might limit detection of longer range interactions. If sparse somal PETs and showed that the vast majority of these PETs ness of data were a significant problem, resulting in under were within 4 kb (Figure S1E). We thus used a 4 kb cutoff calling of long-range interactions, we would likely miss to remove those PETs that may originate from self ligation previously detected long-range interactions. Instead, we US 2017/O 130247 A1 May 11, 2017

detect previously known long-range interactions, e.g. the silent gene (absence of H3K4me3, RNA Pol II and interaction between Sonic Hedgehog (Shh) and its enhancer H3K27me3 enriched regions). in the intron of the nearby Limbr1 gene (1 Mb away), 0223) We identified 216 enhancer-promoter interactions interactions between the HoxD gene cluster and its distal that involved super-enhancers (Table S2B), as defined in regulatory sequences (>300 kb away), and interactions (Whyte et al., 2013). The high-confidence enhancer-pro between the HoxA gene cluster and its distal regulatory moter interactions were used to assign Super enhancers and sequences (>500 kb away) (Lehoczky et al., 2004; Lettice et typical enhancers to their target genes (Table S2B, S2C). al., 2003: Spitz et al., 2003). Multiple enhancer constituents that are in close proximity can be computationally Stitched together into enhancer Reproducibility Analysis of SMC1 ChIA-PET Replicates regions (true for typical and Super-enhancers) as described 0218 Saturation analysis suggested that each of the two previously (Hnisz et al., 2013: Whyte et al., 2013). SMC1 ChIA-PET replicates sampled only ~50% of the 0224. We identified high confidence interactions overlap available intrachromosomal PET space (data not shown). ping with a Super-enhancer or typical enhancer region at one We thus investigated the reproducibility of SMC1 ChIA end and a promoter (+/-2.5 kb of a TSS) at the other end PET replicates by examining how often high-confidence (Table S2B, S2C). For 151 super-enhancers with sufficient interactions from one of the two SMC1 ChIA-PET replicates interaction data, we found that 83% of enhancer assignments were supported by PET interactions from the other replicate. to the nearest active gene (including Polycomb-occupied Operationally, we counted the percentage of high-confi genes) were confirmed/supported by high-confidence inter dence interactions from one replicate whose individual end actions. reads overlapped with those from high-confidence interac 0225. For typical enhancers with 1477 sufficient interac tions identified in the other replicate by at least 1 bp (Figure tion data, we found that 87% of enhancer assignments to the S1D). nearest active gene (including Polycomb-occupied genes) 0219. To compare the replicates genome-wide interac were confirmed/supported by high-confidence interaction tion frequency (Figure S1C), inter-chromosomal PETs and data. intra-chromosomal PETs below the self-ligation cutoff (4 kb) were filtered. Each chromosome was partitioned into 10 Heatmap Representation of High-Confidence ChIA-PET kb bins and 21 symmetric two-dimensional matrices (all Interactions at Topologically Associating Domains (TADS) binsxall bins) were constructed for each replicate. These 0226 Genome-wide average representations of ChIA matrices were populated such that bin ai, represented the PET interactions at TADs were created by mapping high number of PETs in that replicate with one end in bin i and confidence ChIA-PET interactions across TADs (Dixon et the other in binj. PET counts were separately normalized by al., 2012) (FIG. 2D). All 2.200 TADs plus their upstream and the number of mapped reads in each replicate as well as the downstream flanking regions (10% of the size of the bin Size:1000. domain) were aligned and each split into 60 equally-sized 0220. This resulted in an RPKM-like metric for all bins bins. To calculate interaction density in each TAD, we first in both matrices. Figure S1C represents the relationships filtered high-confidence interactions by requiring they were between each replicate where the X axis represents bin aij completely contained within the genomic region of the TAD in replicate 1 and the Y axis represents bin aij in replicate and its flanking regions defined above. 2. This relationship was also analyzed using the Pearson r. 0227 We next counted the interaction frequency between any two bins in each TAD to produce a 60 by 60 interaction Assignment of Interactions to Regulatory Elements matrix using a method as previously described in Dixon et 0221) To identify the association of long-range chromatin al., 2012 The numbers in the interaction matrices represent interactions to different regulatory elements, we assigned the interaction frequencies at the diagonals originating from two PET peaks of interactions to different regulatory elements, bins on the X- and y-axis. Average interaction frequencies including active enhancers, promoters (+/-2.5 kb of the across 2,200 TAD interaction matrices were calculated. The Refseq TSS), and CTCF ChIP-seq binding sites. Operation upper triangular matrix of the average interaction frequen ally, an interaction was defined as associated with the cies was displayed in the units of interactions per bin in FIG. regulatory element if one of the two PET peaks of the 2D. interaction overlapped with the regulatory element by at least 1 base-pair. Definition of Super-Enhancer Domains and Polycomb Domains Assignment of Enhancers to Genes 0228 Typical enhancer and Super-enhancer regions in 0222 Our analysis identified 2.921 high-confidence murine embryonic stem cells were described previously interactions involving an enhancer (contains an OCT4/ (Hnisz et al., 2013: Whyte et al., 2013), and their genomic SOX2/NANOG or MED1 or MED12 enriched region and is coordinates were downloaded (Table S2B, S2C). The 231 not located within +/-2.5 kb of an annotated TSS) and a Super enhancers were assigned to genes with a combination promoter (+/-2.5 kb of an annotated TSS) (FIG. 1D, Table of ChIA-PET interactions and proximity to their nearest S1E). Each high-confidence interaction, as defined above, is active transcriptional start sites (TSSs). We first used high required to be connected by three PET peaks. A large confidence SMC1 PET interactions (FDR 0.01, 3 PETs) majority (81%) of these enhancer-promoter interactions between super-enhancers and TSS regions (+/-2.5 kb of a (2071/2921 interactions) involved an active gene TSS) to identify their target genes. (H3K4me3 or RNA Pol II but not H3K27me3 enriched 0229 When super-enhancers did not have PET interac regions), while 302 interactions involved a Polycomb-occu tions to any TSS regions, they were assigned the nearest pied gene (H3K27me3) and 229 interactions involved a active TSSs (including Polycomb occupied genes) by prox US 2017/O 130247 A1 May 11, 2017 imity. Super-enhancers and the TSS regions (+/-2.5 kb of a containing a total of 349 boundary CTCF-CTCF PET inter TSS) of their target genes are considered as SE-gene units. actions and 380 Polycomb-associated genes (Table S5A, All 231 Super enhancers were assigned to target genes with S5B). this method. This approach resulted in a total of 302 SE-gene Support for SD and PD Structures from Published Datasets units because a SE occasionally interacted with multiple 0235. The existence of Super-enhancer Domains and genes. Polycomb Domains was supported by evidence from pub 0230 We next identified SMC1 PET interactions lished CTCF ChIA-PET datasets (GSE28247) (Handoko et between two CTCF-enriched regions (regardless of whether al., 2011). We applied our ChIA-PET processing method to these CTCF regions were at promoters or enhancers) that the published CTCF ChIA-PET dataset to identify unique encompass these SE-gene units, which we called Super PETs. We then counted the instances where a high-confi enhancer domains—we call these regions “CTCF-CTCF dence CTCF-CTCF boundary interaction from our ChIA PET interactions. The CTCF-CTCF PET interactions defin PET dataset showed a minimum 80% reciprocal overlap ing Super-enhancer domains were required to encompass the with the span of a unique PET from the CTCF ChIA-PET dataset, i.e. 80% of a high-confidence SD boundary inter TSS regions (+/-2.5 kb of a TSS) and the Super enhancer for action region is in common with a CTCF ChIA-PET unique each SE-gene unit. When multiple nested CTCF-CTCF PET PET and vice versa. To accomplish this, we used BEDtools interactions encompassed a SE-gene unit, we used the (https://github.com/arq5x/bed tools2) intersect with param smallest CTCF-CTCF PET interactions for simplicity. eters -f 0.8 -r -u. 0231 We identified 193 Super-enhancer Domains (SDs) 0236 We found that 34% (6770/20080) of our CTCF containing a total of 191 Super-enhancers. We noted that the CTCF interactions were confirmed by a unique PET within boundaries of Super-enhancer are sensitive to the algorithm the CTCF ChIA-PET dataset, 33% (65/197) of our SD that computationally defines Super enhancers. For 4 Super boundary interactions were confirmed by a unique PET enhancers, one Super-enhancer constituent out of multiple within the CTCF ChIAPET dataset, and 33% (115/349) of constituent enhancers that define the Super enhancers fall our PD boundary interactions were confirmed by a unique outside of the CTCF-CTCF PET interactions. These 4 PET within the CTCF ChIA-PET dataset (Table S3A). Most CTCF-CTCF PET interactions encompass the target gene Super-enhancer Domains and Polycomb Domains are dis TSS regions (+/-2.5 kb of a TSS) and more than 50% of the tinct from the previously described Topologically Associat genomic space covered by the Super-enhancer. Therefore, ing Domains (TADS). we qualified these 4 CTCF-CTCF PET interactions as 0237 We compared Super-enhancer Domains and Poly Super-enhancer Domains. comb Domains to TADs by counting the instances where a 0232. Thus, we identified a total of 197 Super-enhancer Super-enhancer Domain or a Polycomb Domain showed a Domains (SDs) containing a total of 197 boundary CTCF minimum 80% reciprocal overlap with a TAD. 3% (5/197) CTCF PET interactions and 195 super-enhancers (Table of our SDs and 4% (13/349) of our PD have an 80% S4A, S4B). For the ~15% super-enhancers that did not reciprocal overlap with a TAD (Dixon et al., 2012). 8% qualify for occurrence within a SD by using the high (16/197) of our SDs and 9% (30/349) of our PD have an 80% confidence ChIA-PET data, the interaction dataset (not the reciprocal overlap with a TAD (Filippova et al., 2014) (Table high confidence data) shows that all but one of these S3A). super-enhancers are located within CTCF-CTCF loops co 0238. The existence of enhancer-promoter and enhancer bound by cohesin. enhancer interactions was Supported by evidence from pub lished RNA Pall ChIA-PET datasets (Kieffer-Kwon et al., 0233 We also performed the same computational analy 2013). We applied our ChIA-PET processing method to the ses for the 8,563 typical enhancers. We found that only 48% published Pol2 ChIA-PET dataset to identify unique PETs. (4128/8563) typical-enhancers are contained in CTCF We then counted the instances where a high-confidence CTCF topological structures similar to SDs. Developmental enhancer-promoter or enhancer-enhancer interaction from regulators in embryonic stem cells frequently exhibit our Smc1 ChIA-PET dataset showed a minimum 80% extended binding of Polycomb complex at their promoters reciprocal overlap with a unique PET from the Pol2 ChIA spanning 2-35 kb from their promoters (Boyer et al., 2006; PET dataset, e.g. 80% of an enhancer-promoter interaction Lee et al., 2006). We thus focused on those Polycomb region is in common with a Pol2 ChIA-PET unique PET and occupied TSSs that showed enrichment of H3K27me3 span vice versa. We found that 82% (2,402/2.921) of our ning greater than 2 kb in size. This distance cutoff was based enhancer-promoter interactions were confirmed by a unique on analyses performed in (Lee et al., 2006). We noted that PET within the Pol2 ChIA-PET dataset, and 73% (1,969/2, ~60% H3K27me3 regions called by MACS had neighboring 700) of our enhancer-enhancer interactions were confirmed H3K27me3 regions within 2 kb. In order to accurately by a unique PET within the Pol2 ChIA-PET dataset (Table capture the large genomic regions that show enrichment of S3A). H3K27me3 signal, we first merged the H3K27me3 regions 0239. Several types of structural domains have been that were within 2kb of each other. 546 genes, including 203 previously described, and we expect our interactions to encoding transcription factors, showed enrichment of occur largely within their boundaries. Thus, we determined H3K27me3 spanning greater than 2 kb at their promoters. how many of our interactions spanned a boundary. Topo 0234 We next identified high confidence CTCF-CTCF logically Associating Domains (TADs) (Dixon et al., 2012) PET interactions that encompassed the H3K27me3 regions were determined using Hi-C in mouse ESCs; 6% (1,354/23, of these 546 genes at promoters. When multiple nested 739) of high-confidence intrachromosomal cohesion-medi CTCF-CTCF PET interactions encompassed the H3K27me3 ated interactions cross a TAD boundary. LOCK (large orga regions, we took the smallest CTCFCTCF PET interactions nized chromatin K9 modification) domains were determined for simplicity. We identified 349 Polycomb Domains (PDs) using ChIP data (Wen et al., 2009): 4% (1,053/23,739) of US 2017/O 130247 A1 May 11, 2017

high-confidence, intrachromosomal cohesin-mediated inter an interaction were incremented by 1. Per row counts were actions cross a LOCK boundary. Lamin-associated domains normalized by dividing each bin count by the row maximum (LADS) were determined using DamID (Meuleman et al., and displayed in Heatmaps in FIGS. 3D and 5D. 2013); 5% (1,180/23,739) of high confidence intrachromo Somal cohesin-mediated interactions cross a LAD boundary Definition of Putative Chromatin Insulator Elements at the (Table S3A). Boundaries of Polycomb Domains 0244 An entropy-based measure of Jensen-Shannon Meta Representations of ChIP-Seq Occupancy at Divergence (JSD) was adopted to identify putative SMC1 Super-Enhancer Domains and Polycomb Domains and CTCF-bound chromatin insulator elements at PD 0240 Genome-wide average “meta’ representations of domain boundaries (FIG. 5E). We divided 20 kb regions ChIP-seq occupancy of different factors were created by centered on CTCF enriched regions within SDs or PDs into mapping ChIP-seq read density to different sets of regions 100 equally-sized bins. We used H3K27me3 and SUZ12 (FIG. 3C, FIG.5C). All regions within each set were aligned ChIP-seq profiles to identify putative insulator elements at and the average ChIP-Seq factor density in each bin was PD boundaries. For each 20 kb region, the average ChIP-seq calculated to create a meta genome-wide average in units of read density within each bin was calculated and the density rpm/bp. For Super-enhancers, each Super enhancer or their in each bin was divided by the sum of the row so the new corresponding flanking region (+/-3 kb) was split into 100 normalized vector sums to 1. Since we expect high ChIP-seq equally-sized bins. This split all Super-enhancer regions, signal at one side of insulator elements and low ChIP-seq regardless of their size, into 300 bins. For the target genes signal at other side of insulator elements, we defined two within SDs or PDs, we created three regions: upstream, gene vectors to represent the chromatin patterns at insulator body and downstream. 80 equally-sized bins divided the elements at the left or right borders of PDs: one vector has 2000 to 0 promoter region, 200 equally-sized bins divided 500s followed by 501s, and the other has 50 1s followed the length of the gene body, and 80 equally-sized bins by 500s. These vectors were normalized so their sum was divided the 0 to +2 kb downstream region. For SMC1 and 1. CTCF ChIP-Seq binding sites at the SD, PD, and TAD 0245 We next used JSD as described in (Fuglede and borders, flanking regions (+/-2 kb) around the center of Topsoe, 2004) to quantify the similarity between normalized CTCF ChIP-Seq binding sites were aligned and split into 40 ChIP-seq patterns and the two pre-defined patterns, which equally-sized bins. results in a similarity score between each normalized ChIP 0241 Heatmap representations of ChIP-seq read density seq vector and the ideal vectors described above. We took of different factors were created by mapping the reads within the top 15 percent of our 20 kb regions ranked by their Super-enhancers and/or their target genes across Super-en similarity score and extracted those that were at the bound hancer domains (FIG. 3E). We first filtered reads for those aries of Polycomb Domains (PD). For robustness, only PD contacting (>-1 bp) super-enhancers and/or their target border regions whose average ChIP-seq signal (H3K27me3) genes. Then we created three types of regions: SD and their within the 20 kb window was above the 60 percentile of all corresponding flanking regions (+/-10 kb). We divided the CTCF enriched regions at the side within the domain and upstream and downstream flanking regions into 10 equally below 50 percentile of all CTCF enriched regions at the side sized bins each. We divided the SD into 50 equally-sized outside of the domain were considered as putative chromatin bins. The average filtered ChIP-seq read density (r.p.m./bp) insulator elements. of different factors in each bin was calculated and drawn. 0246 FIG. 5E show normalized ChIP-seq density at these putative chromatin insulator elements by standard Heatmap Representation of High-Confidence ChIA-PET Z-transform across all CTCF enriched regions. Interactions Superenhancer Domains and Polycomb Domains Conservation of CTCF Binding Across Cell Types 0242 Heatmap representations of ChIA-PET interactions 0247 CTCF peaks in 18 tissues/cell types from were created by mapping high-confidence ChIA-PET inter ENCODE were downloaded from the UCSC table browser actions across Super-enhancer Domains (SD) and Polycomb (http://genome.ucsc.edu/cgibin/ Domains (PD), which are defined above. We created three hgFileUidb-mm9&g wgEncodeLicrTfbs). types of regions: upstream, SD or PD, and downstream. 0248 We restricted our analysis to autosomal CTCF Upstream and downstream regions are 20% of the SD's or sites, because these 18 cell types could be derived from mice PD's length each. We divided the upstream and downstream of different sex or strains. We first took the intersection of regions into 10 equally-sized bins each. We divided the SD our autosomal CTCF peaks in murine V6.5 ESC 129 or PD into 50 equally-sized bins. To calculate interactions in 057BL/6 line and autosomal CTCF peaks in the murine ESC each bin, we filtered high-confidence interactions in two Bruce4 line from ENCODE to account for differences in ways. 1) We required high-confidence interactions to have at cells and experimental technique. We next quantified how least one end in the interrogated region. This removed frequently these autosomal CTCF peaks from ESCs were interactions that are anchored outside of our region of occupied by CTCF ChIP-Seq peaks in 18 tissues/cell types interest. 2) We removed interactions that are not related to (including ESC Bruce4 cells) from ENCODE. The histo the internal structure of the domain. This removed interac gram of CTCF occupancy across 18 tissues/cell types were tions that have one end at an SD or PD border PET peak and plotted in FIG. 6C. the other end outside of the SD or PD. Super-Enhancers in NPCs 0243 We considered the whole span of each filtered high-confidence ChIA-PET interaction. The density of such 0249 Super-enhancers were identified in mouse neural spans in each bin was calculated, where all bins contacting progenitor cells (NPCs) using ROSE (https://bitbucket.org/ US 2017/O 130247 A1 May 11, 2017

young computation/rose). This code is an implementation - Continued of the method used in (Hnisz et al., 2013; Loven et al., 2013). Target, SEQ 0250 Briefly, regions enriched in H3K27ac signal were Region, ID identified using MACS with background control, -keep primer name Sequence (5'-3') NO dup auto, and -ple-9. These regions were Stitched together if they were within 12.5 kb of each other and enriched SE, region 3, CAGATTTTTTATTTCCTTCAGTTCTGTG regions entirely contained within +/-2 kb from a TSS were excluded from stitching. Stitched regions were ranked by H2-Q10, AGGATGGCTCAGCGGTTAAG H3K27ac signal therein. promoter, 0251 ROSE identified a point at which the two classes of H2O10 F enhancers were separable. Those Stitched enhancers falling SE region, AGGGCTCACCTTCACGTCAAGTT above this threshold were considered super-enhancers. H2-Q10, Taqman CGGCCTGTCTACTTTAGCCTCAGACTCCA 5C CTCF-CTCF Interactions in NPCs probe, H2O10 probe 0252) Phillips-Cremins et al. performed 5C at 7 genomic loci (Phillips-Cremins et al., 2013). We filtered for statisti , Actin-F GGGAGTGACTCTCTGTCCATTCA cally significant 5C interactions in mouse NPC by requiring a p value for both replicates <0.05, resulting in 674 inter Actin, Actin-R ATTTGTGTGGCCTCTTGTTTG. A actions. We filtered for CTCF-CTCF interactions by requir Actin, Taqman TCCAGGCCCCGCGTGTCC ing an overlap with a CTCF ChIPSeq enriched region in probe, Actin NPC on both ends resulting in 32 CTCF-positive 5C inter probe actions. 34% (11/32) 0253 CTCF5C interactions in NPCs have an 80% recip 0257 F, and R denote forward and reverse primers, rocal overlap with a SMC1 ChIA-PET interactions in mouse respectively. ESCs (Table S3B). ChIA-PET 3C Assays 0258. In brief, murine ESCs (up to 13x10 cells) were treated with 1% formaldehyde at room temperature for 10 0254 For each sample, 2x107 ESCs cells were cross min and then neutralized using 0.2 M glycine. The cross linked with 1% formaldehyde for 20 min at RT. The reaction linked chromatin was fragmented by Sonication to size was quenched by the addition of 125 mM glycine for 5 min lengths of 300-700 bp. The anti-SMC1 antibody (Bethyl, at RT. Crosslinked ESCs were washed with PBS and resus A300-055A) was used to enrich SMC1-bound chromatin pended in 10 ml lysis buffer (10 mM Tris-HCl, pH 8.0, 10 fragments. A portion of ChIP DNA was eluted from anti mM NaCl, 0.2% NP40 and proteinase inhibitors) and lysed body-coated beads for concentration quantification and for with a Dounce homogenizer. Following BglII digestion enrichment analysis using quantitative PCR. For ChIA-PET overnight, 3C-ligated DNA was prepared as previously library construction, ChIP DNA fragments were end described (Lieberman-Aiden et al., 2009). repaired using T4 DNA polymerase (NEB) and ligated to 0255. The 3C interactions at the miR-290-295 and either linker A or linker B. After linker ligation, the two Pou5fl loci (Figure S4A, S4B) were analyzed by quantita samples were combined for proximity ligation in diluted tive real-time PCR using custom Taqman probes as previ conditions. Following proximity ligation, the paired-end tag ously described (Xu et al., 2011). The amount of DNA in the (PET) constructs were extracted from the ligation products qPCR reactions was normalized across 3C libraries using a and the PET templates were subjected to 50 350 paired-end custom Taqman probe directed against the Actb locus. sequencing using Illumina HiSeq 2000. Primer sequences are listed below. 0256 Target Region Primer Name Sequence (5'-3') ChIA-PET Library Construction (0259 ChIA-PET was performed as previously described Target, SEQ (Chepelev et al., 2012: Fullwood et al., 2009; Goh et al., Region, ID 2012: Li et al., 2012). Briefly, ES cells (up to 1x10 cells) primer name Sequence (5'-3') NO were treated with 1% formaldehyde at room temperature for N1rp12, CACATCTTCAAAGCAAACACTATTGTT 20 min and then neutralized using 0.2M glycine. The cross promoter, linked chromatin was fragmented by Sonication to size N1irp12 R lengths of 300-700 bp. The anti-SMC1 antibody (Bethyl, A300-055A) was used to enrich SMC1-bound chromatin N1rp12, Taqman TCTCCTACCCATTGCTTCTCTGCTACC TOC fragments. A portion of ChIP DNA was eluted from anti probe, body-coated beads for concentration quantification and for N1irp12 Probe enrichment analysis using quantitative PCR. SE, region 1, TTCCTGGAACCTGGGCAA 0260 For ChIA-PET library construction ChIP DNA N1irp12 eF1 fragments were end-repaired using T4 DNA polymerase SE, region 2, TGATACAGCACAGCTTTCCTTCA (NEB). ChIPDNA fragments were divided into two aliquots N1irp12 eF2 and either linker A or linker B was ligated to the fragment ends. The two linkers differ by two nucleotides which are US 2017/O 130247 A1 May 11, 2017 20 used as a nucleotide barcode (Linker A with CG: Linker B Results of Experimentals with AT) (Table S1A). After linker ligation, the two samples were combined and prepared for proximity ligation by Cohesin ChIA-PET in ESCs diluting in a 20 ml Volume to minimize ligations between 0267. The organization of mammalian chromosomes different DNA-protein complexes. The proximity ligation involves structural units with various sizes and properties, reaction was performed with T4 DNA ligase (Fermentas) and cohesin, a structural maintenance of chromosomes and incubated without rocking at 22 degrees Celsius for 20 (SMC) complex, participates in DNA interactions that hours. include enhancer-promoter loops and larger loop structures 0261 During the proximity ligation DNA fragments with that occur within topologically associating domains (TADs) the same linker sequence were ligated within the same (FIG. 1A). ESC ChIP-seq data indicate that ~40% of chromatin complex, which generated the ligation products cohesin-occupied sites involve active enhancers and pro with homodimeric linker composition. However, chimeric moters, ~3% involve genes with polycomb modifications, ligations between DNA fragments from different chromatin and ~50% involve CTCF sites that are not associated with complexes could also occur, thus producing ligation prod enhancers, promoters, or polycomb-occupied sites (FIG. 1B ucts with heterodimeric linker composition. These heterodi and Figures S1A and S1B available online). We employed meric linker products were used to assess the frequency of cohesin ChIA-PET to further investigate the relationship nonspecific ligations and were then removed bioinformati between control of the ESC pluripotency program and cally. control of local chromosome structure. We selected cohesin 0262. As shown in Figure S1E, all heterodimeric linker because it is a relatively well-studied SMC complex that is ligations, giving rise to chimeric PETs, are by definition loaded at enhancer-promoter loops and can thus identify nonspecific. Because random intermolecular associations in those interactions and can also migrate to CTCF sites and the test tube are expected to be comparable for linkers A and thus identify those interactions as well Kagey et al., 2010; B, the frequency of random homo and heterodimeric linker Parelho et al., 2008; Rubio et al., 2008; Schaaf et al., 2013; ligations should also be equivalent. In our SMC1 ChIA-PET Wendt et al., 2008). library, only 7% of pair-end ligations involved heterodimeric 0268. The ChIA-PET technique was used because it linkers (Table S1A). Thus, we estimate that less than 14% of yields high-resolution (~4 kb) genome-wide interaction total homodimeric ligations are nonspecific. data, which is important because most loops involved in transcriptional regulation are between 1 and 100 kb (Gibcus 0263. Following proximity ligation, samples were treated and Dekker, 2013). We hoped to extend previous findings with Proteinase K and DNA was purified. An EcoP15I that mapped interactions among regulatory elements across (NEB) digestion was performed at 37 degrees Celsius for 17 portions of the ESC genome (Denholtz et al., 2013: Phillips hours to linearize the ligated chromatin fragments. The Cremins et al., 2013; Seitan et al., 2013) and gain a detailed chromatin fragments were then immobilized on Dynabeads understanding of the relationship between transcriptional M280 Streptavidin beads. An End-Repair reaction was per control of ESC identity genes and control of local chromo formed (Epicentre #ER81050), then As were added to the some structure. To identify interactions between cohesin ends with Klenow treatment by rotating at 37 degrees occupied sites, we generated biological replicates of SMC1 Celsius for 35 minutes. Next, Illumina paired-end sequenc ChIA-PET data sets in ESCs totaling ~400 million reads ing adapters were ligated on the ends and 18 cycles of PCR (Table S1A). The two biological replicates showed a high was performed. The Paired-End-Tag (PET) constructs were degree of correlation (Pearson’s re-0.91, Figures S1C and extracted from the ligation products and the PET templates S1D), so we pooled the replicate data and processed it using were subjected to 50x50 paired-end sequencing using Illu an established protocol (Li et al., 2010), with modifications mina HiSeq 2000. SMC1 ChIA-PET was performed as described in the Extended Experimental Procedures (Figure previously described (Chepelev et al., 2012: Fullwood et al., S1 and Table S1A). The data set contained ~19 million 2009; Goh et al., 2012: Li et al., 2012). unique paired-end tags (PETs) that were used to identify PET peaks (FIG. 1C). Interactions between PET peaks were Data Analysis identified and filtered for length and significance (FIG. 1C, S1E, and S1F, Table S1B, and Extended Experimental 0264 ChIA-PET data analysis was performed as previ Procedures). The analysis method produced 1,234,006 ously described (Li et al., 2010), with modifications cohesin-associated DNA interactions (FIG. 1C and Table described in the Extended Experimental Procedures. The S1B). The vast majority (92%) of these interacting cohesin high-confidence interactions for the two biological replicate occupied sites occurred at enhancers, promoters, and CTCF SMC1 ChIAPET experiments and for the merged data set binding sites, consistent with the known roles of cohesin at are listed in Tables S1C, S1D (not shown) but merged into these regulatory elements (FIG. 1D). Table S1E, respectively. All data sets used in this study are 0269 Genomic data of any type are noisy, and our listed in Table S6. confidence in the interpretation of DNA interaction data is improved by identifying PETs that represent independent events in the sample and pass statistical significance tests. Example 2: Accession Numbers For this reason, we generated a high-confidence interaction 0265 Raw and processed sequencing data were depos data set (described in Extended Experimental Procedures) ited in GEO under accession number GSE57913 (http:// by requiring that at least three independent PETs support the www.ncbi.nlm.nih.gov/geo/). identified interaction between two PET peaks. The high confidence data set consisted of 23,835 interactions that 0266 The GEO accession ID for aligned and raw data is were almost entirely intrachromosomal (99%) and included GSE57913 (www.ncbi.nlm.nih.gov/geo/). 2.921 enhancer-promoter interactions, 2,700 enhancer-en US 2017/O 130247 A1 May 11, 2017

hancer interactions, and 7.841 interactions between non contain one or two genes (Tables S4A and S4C). It was enhancer, non-promoter CTCF sites (FIGS. 1C, 1D, S1G, evident that there were cohesin-associated interactions and S2 and Table S1B). Unless stated otherwise, the high between individual enhancer elements (constituents) of confidence data set was used for further quantitative analy Super-enhancers as well as interactions between Super-en sis. We used the interaction data sets to create a table of hancers and the promoters of their associated genes (Figures enhancer-promoter assignments for ESCs (Tables S2A S3A-S3J). S2C). 0274 Indeed, the results suggest that super-enhancer (0270. We found that the interaction data supported 83% constituents have cohesin-associated interactions with one of Superenhancer assignments to the proximal active gene another (345 interactions) even more frequently than they do and 87% of typical enhancer assignments to the proximal with their associated genes (216 interactions). The SDs active gene (Tables S2B and S2C), with approximately half contain high densities of pluripotency transcription factors, of the remainder assigned to the second most proximal gene. Mediator, and cohesin, together with histone modifications The interaction data most frequently assigned Super-enhanc associated with transcriptionally active enhancers and genes ers and typical enhancers to a single gene, with 76% of (FIG. 3C). It was notable that the majority (82%) of inter Super-enhancers and 84% of typical enhancers showing actions within SDs do not cross the CTCF sites at SD evidence of interaction with a single gene. Prior studies have borders (FIG. 3D) and that the majority of Mediator, Pol2, Suggested that there can be more frequent interactions and H3K27ac signal associated with Super-enhancers and between enhancers and genes (Kieffer-Kwon et al., 2013; their associated genes occurs inside of the CTCF sites at SD Sanyal et al., 2012; Shen et al., 2012); our high-confidence borders (FIG. 3E). data are not saturating and do not address the upper limits of 0275. The cohesin ChIA-PET interaction data and the these interactions (Figure S1H and Extended Experimental distribution of the transcription apparatus Suggest that the Procedures). interacting cohesin-occupied CTCF sites tend to restrict the 0271 The catalog of enhancer-promoter assignments interactions of Super-enhancers to those genes within the provided by these interaction data should prove useful for SD. future studies of the roles of ESC enhancers and their associated factors in control of specific target genes. The majority of cohesin ChIA-PET interactions did not cross the Super-Enhancer Domain Function boundaries of previously defined TADs (Dixon et al., 2012: 0276 Because super-enhancers contain an exceptional Filippova et al., 2014; Meuleman et al., 2013: Wen et al., amount of transcription apparatus and CTCF has been 2009) (FIG. 2 and Table S3A). FIG. 2A shows a represen associated with insulator activity (Essafi et al., 2011; Han tative example of a TAD, in which the majority (96%) of doko et al., 2011: Ong and Corces, 2014; Phillips and interactions occur within the domain. As expected from Corces, 2009; Phillips-Cremins and Corces, 2013), we pos previous studies, the TAD boundaries are enriched for tulated that SD structures might be necessary for proper cohesin and CTCF and thus cohesin ChIA-PET peaks (FIG. regulation of genes in the vicinity of these structures. To test 2B). Genome-wide analysis shows that 88% of all interac this model, we investigated the effect of deleting SD bound tions are contained within TADs (FIG. 2C) and are some ary CTCF sites on expression of genes inside and immedi what enriched near the boundaries of TADs (FIG. 2D). The ately outside of SDs (FIG. 4). majority of cohesin ChIA-PET interactions did not cross 0277 For this purpose, we studied five SDs whose super lamin-associated domains (LADs), which are associated enhancer-associated genes play key roles in embryonic stem with repression at the nuclear periphery, or LOCK domains, cell biology (miR-290-295, Nanog, Todgf1, Pou5f1 Oct4. which are large regions of chromatin marked with histone and Prdm14). In all cases, we found that deletion of a CTCF H3K9 modifications (Table S3A) (Meuleman et al., 2013: site led to altered expression of nearby genes. In four out of Wen et al., 2009). These results are consistent with proper five cases, deletion of a CTCF site led to increased expres ties previously described for TAD, LAD, and LOCK domain sion of genes immediately outside the SDs, and in three of Structures. five cases, deletion of a CTCF site caused changes in expression of genes within the SDs. The miR-290-295 locus, Super-Enhancer Domain Structure which specifies miRNAs with roles in ESC biology, is 0272 Super-enhancers drive expression of key cell iden located within an SD (FIG. 4A). The miR-290-295 SD tity genes and are densely occupied by the transcription contains no other annotated gene, and the closest gene that apparatus and its cofactors, including cohesin (Dowen et al., resides outside this SD is Nlrp12, located -20 kb down 2013; Hnisz et al., 2013). Analysis of high-confidence Stream of miR-290-295. CRISPR-mediated deletion of a cohesin ChIA-PET interaction data revealed a striking fea boundary CTCF site (C1) at the miR-290-295 locus caused ture common to loci containing Super-enhancers and their an ~50% reduction in the miR-290-295 pri-miRNA tran associated genes (FIG. 3). This feature consisted of a super script and an 8-fold increase in transcript levels for Nlrp12 enhancer and its associated gene located within a loop (FIG. 4A). The CTCF deletion had no effect on expression connected by two interacting CTCF sites co-occupied by of two genes located further away, AUO18091 and Myadm cohesin (FIGS. 3A, 3B, and S3A-S3J). The vast majority of (FIG. 4A). ESC super-enhancers (84%) are contained within these 0278. These results indicate that normal expression of the structures, which we call Super-enhancer domains (SDs) miR-290-295 primiRNA transcript is dependent on the (FIG. 3B, Tables S4A and S4B, and Extended Experimental CTCF boundary site and furthermore that genes located Procedures). immediately outside of this SD can be activated when the 0273. In contrast, only 48% of typical enhancers were SD CTCF boundary site is disrupted. The Nanog gene, found to occur within comparable loops between two CTCF which encodes a key pluripotency , is sites. The 197 SDs average 106 kb and most frequently located within an SD shown in FIG. 4B. The Nanog SD US 2017/O 130247 A1 May 11, 2017 22 contains no other annotated gene, and the closest upstream studies have shown that miR-290-295 null ESCs show an gene that resides outside this SD is Dppa3, which is located increased propensity to differentiate into ectodermal lin ~50 kb upstream of Nanog. eages at the expense of endoderm (Kaspi et al., 2013). In (0279 CRISPR-mediated deletion of the boundary CTCF summary, the loss of CTCF sites at the boundaries of SDs site C1 of the Nanog SD led to a ~40% drop in Nanog can cause a change in the level of transcripts for Superen transcript levels (FIG. 4B). In this case, there was no hancer-associated genes within the SD and frequently leads significant change in the level of the Dppa3 transcript (FIG. to activation of genes near these CTCF sites. These results 4B). These results indicate that normal expression of the indicate that the integrity of SDs is important for normal Nanog transcript is dependent on the C1 CTCF site. The expression of genes located in the vicinity of the SD, which Tdgfl gene, which encodes an epidermal growth factor can include genes that are key to control of cell identity. essential for embryonic development, is located within an SD (FIG. 4C). In this SD, it is possible that the Super Polycomb Domains enhancer regulates both the Tdgfl and Lrrc2 genes and this 0283 Maintenance of the pluripotent ESC state requires Tdgf1/Lrrc2 SD also contains the Rtp3 gene. The closest that genes encoding lineage-specifying developmental regu gene that resides outside this SD is Gm590, which is located lators are repressed, and these repressed lineage-specifying 30 kb downstream of Tdgfl. CRISPR-mediated deletion of genes are occupied by nucleosomal histones that carry the a boundary CTCF site (C1) of the Tdgfl/Lrrc2 SD had little polycomb-associated mark H3K27me3 (Margueron and effect on Tagfl and Rtp3 transcript levels but had a modest Reinberg, 2011; Young, 2011). The mechanisms responsible effect on Lrrc2 transcript levels and caused a nearly 10-fold for maintaining the H3K27me3 mark across short spans of increase in the levels of GmS90 transcripts (FIG. 4C). The regulatory regions and promoters of repressed genes are not Pou5fl gene, which encodes the pluripotency transcription well understood, although CTCF sites have been implicated factor OCT4, is located within an SD (FIG. 4D). The Pou5f1 (Cuddapah et al., 2009; Schwartz et al., 2012; Van Bortle et SD contains no other annotated gene. al., 2012). 0280 We were not able to obtain a bi-allelic CRISPR 0284 Analysis of the H3K27me3-marked genes revealed mediated deletion of a boundary CTCF site despite multiple that they, like the Super enhancer-associated genes, are attempts, but we did obtain a mono-allelic deletion of the typically located within a loop between two interacting boundary CTCF site C1 (FIG. 4D). This mono-allelic dele CTCF sites co-occupied by cohesin (FIGS. 5A, 5B, and tion had little effect on the levels of Pou5f1 transcripts but S5A-S5J and Table S5A). These polycomb domain (PD) increased the levels of transcripts for H2-Q10, the gene structures share many features with the Super enhancer closest to the deleted boundary, by ~2.5-fold (FIG. 4D). domains. The majority (70%) (380/546) of polycomb-asso Transcription of the gene closest to the uninterrupted bound ciated genes occur in PD structures. PDs average 112 kb and ary of the Pou5f1 SD, Tcf19, was unaffected by the C1 generally contain one or two genes (Table S5B). The PDs deletion. The Pram 14 gene, which encodes a pluripotency contain exceptionally high densities of the polycomb pro transcription factor, is located within an SD (FIG. 4E). The teins EZH2 and SUZ12 and the associated histone modifi Prdm14 SD contains no other annotated gene, and the closest downstream gene that resides outside this SD is cation H3K27me3 (FIG.5C). Slco5a1, which is located 100 kb downstream of Prodm14. (0285. The majority (78%) of cohesin ChIAPET interac The Prdm14 SD has two neighboring cohesin-associated tions originating in PDs occur within the PD boundaries CTCF sites at one boundary: CRISPR-mediated deletion of (FIG. 5D). Furthermore, the polycomb mark H3K27me3 a single boundary CTCF site (C1) had no effect on expres tends to be retained within the PD (FIG. 5E). (0286 We postulated that the CTCF boundaries that form sion of Pram14 or Slco5a1, but deletion of both CTCF sites PD structures might be important for repression of the (C1 and C2) at that boundary caused an ~4.5-fold increase polycomb-marked genes within the PD and investigated the in expression of Slco5a1 (FIG. 4E). effect of deleting boundary CTCF sites on a PD containing 0281 We tested whether the super-enhancers from dis Tcfap2e to test this idea (FIG.5F). CRISPR-mediated dele rupted SD structures show increased interaction frequencies tion of one of the boundary CTCF sites (C1) of the Tcfap2e with the newly activated genes outside the SD by using 3C. PD caused a 1.7-fold increase in transcript levels for Tcfap2e At two loci where loss of an SD boundary CTCF site led to (p<0.05) and no significant change in transcript levels for significant activation of the gene outside the SD (miR-290 295 and Pou5fl), we performed quantitative 3C experiments nearby genes within or outside of the PD. to measure the contact frequency between the Super-enhanc (0287. CRISPR-mediated deletion of the other boundary ers and the genes immediately outside of SDs in wild-type CTCF site (C2) caused a 4-fold increase in the expression of cells and in cells where the SD boundary CTCF site was Tcfap2e (p<0.001) and had little effect on adjacent genes. deleted. In both cases, loss of the CTCF site led to an These results suggest that the integrity of the CTCF bound increase in the contact frequency between the Super-enhanc aries of PDs is important for full repression of H3K27me3 ers and the genes immediately outside of SDs that were occupied genes. newly activated (Figures S4A and S4B). Insulated Neighborhoods in Multiple Cell Types 0282 We investigated whether altered SD boundaries that affect cell identity genes cause ESCs to express markers 0288 A previous study suggested that DNA loops medi consistent with an altered cell state. Indeed, we found that ated by cohesin and CTCF tend to be larger and more shared ESCs lacking the miR-290-295 boundary CTCF site C1 among multiple cell types than DNA loops associated with exhibit increased expression of the ectodermal marker Pax6 cohesin and Mediator, which represent enhancer-promoter and decreased expression of the endodermal lineage markers interactions that may be cell type specific (Phillips-Cremins Gatao and Sox 17, suggesting that loss of the SD structure is et al., 2013). This led us to postulate that: (1) the interacting sufficient to affect cell identity (Figure S4C). Previous CTCF structures of SDs and PDS may be common to US 2017/O 130247 A1 May 11, 2017 multiple cell types and (2) the acquisition of Super-enhanc 0293. This supports the view that the interacting CTCF ers and polycomb binding within these common domain structures of ESC SDS may be common to multiple cell structures will vary based on the gene expression program of types. If the CTCF boundaries of ESC SDs and PDs are the cell type (FIG. 6A). common to many cell types, we would expect that the 0289. To test this model, we compared the SDs identified binding of CTCF to the SD and PD boundary sites observed in ESCs to comparable regions in neural precursor cells in ESCs will be conserved across multiple cell types. (NPCs) for which 5C interaction data was available for 0294. To test this notion, we examined CTCF ChIP-seq specific loci (Phillips-Cremins et al., 2013). peaks from 18 mouse cell types and determined how fre 0290 We found, for example, that the Nanog locus SD quently CTCF binding occurred across these cell types (FIG. observed in ESCs with ChIA-PET data was also detected by 6C). When all ESC CTCF ChIP-seq peaks were included in 5C data in NPCs (FIG. 6B). In NPCs, the Nanog gene is not the analysis, we found that there was fairly even distribution expressed, and no Super-enhancers are formed at this locus of the data into bins representing one or more cell types (FIG. 6B). Similarly, there is evidence for a common struc (FIG. 6C). In contrast, CTCF peaks co-bound by cohesin, ture involving CTCF sites bounding the Olig1/Olig2 locus in which included those at SD and PD borders, were observed both ESCs and NPCs (FIG. 6B). more frequently in bins representing a larger fraction of the 0291. In this domain, the Olig1/Olig2 genes are not cell types (FIG. 6C). These results indicate that the CTCF active and no Super-enhancers are formed in ESCs, whereas boundary sites of ESC SDs and PDs are frequently occupied there are three super-enhancers in NPCs, where these genes by CTCF in multiple cell types and, together with the are highly expressed (FIGS. 6B and S6). analysis of interaction data for NPCs described above, 0292 For regions where 5C interaction data in NPCs and support the idea that CTCFCTCF interaction structures may ChIA-PET interaction data in ESCs could be compared, a often be shared by ESCs and more differentiated cell types. total of 11 out of 32 interactions between CTCF sites identified in NPCs were supported by interaction data in TABLES ESCs (Table S3B), which is impressive given the sparsity of 0295 The following Tables are referenced throughout the interaction data. specification. TABLE S1A. ChIA-PET linker sequences and mapping statistics

Name Sequence

Linker A 3- GAC GAC AGG CTA T (biotin) AG CGC CGG-5 s' - CTG CTG TCC GAT ATC GC-3'

Linker B 3- GAC GAC AGT ATA T (biotin) AG CGC CGG-5 s' - CTG CTG TCA. TAT ATC GC-3'

mESC Smc1 Biological Biological ChIA-PET replicate 1 replicate 2

Total reads 221 653, 525 176, 7 O5,499 chimeric 16, 137,37O 7% 10,348, 925 6% non chimeric 49,993, 038 23% 58, 657, 541 33% too short 81, 5 O1, 625 37% 80, 181, 174 45% (<27 nt after trimming)

ambiguous 74, O21, 492 33% 27, 517,859 16%

PET1 PET2 PET1 PET2

Trimmed 49,993, O38 49,993, O38 58,657, 541 58,657, 541 Total aligned 36, 439, 501 36, 149, 18O 42,213, 793 41,932, 15 O

Joined Mate 28,988, 477 32, 284, 454 Pairs

Unique Intra- 7, O25,799 3, 418, 723 chromosomal Mate Pairs

Unique Inter- 4, 349, 4 O7 4, 648, 962 chromosomal Mate Pairs US 2017/O 130247 A1 May 11, 2017 24

TABLE S1A.- continued ChIA-PET linker sequences and mapping statistics Liigations

intra- 1,239, O23 98O, 462 chromosomal (>4 kb)

interactions

interactions 22 O25 1754 O (PET2, FDR < 0.01)

intra- 21 630 17, O86 chromosomal interactions

ter- 395 454 chromosomal interactions chimeric

Trimmed 16, 137,370 16, 137 370 10,348,925 10,348,925 Total aligned 11, 597, O24 11, 438, 519 7, 443, 723 7, 38 O 674

Joined Mate 8,322, 581 5,309,975 Pairs

Unique Intra- 188, 5O1 7 O 398 chromosomal Mate Pairs

Unique Inter- 29 O1, 591 1, 167, 445 chromosomal Mate Pairs

TABLE S1B TABLE S1B-continued Frequencies of PETs and interactions at various thresholds Frequencies of PETs and interactions at various thresholds Low Confidence High Confidence Inter-chromosomal Intra-chromosomal Inter:Intra Ratio Unique Intra- and inter- Intra-chromosomal intra- chromosomal Interactions i PE 1,062 2. fa, chromosomal Interactions (>4 kb, 3 -- S s PETs (>4> kb) PET, FDR 0.01) Overlap with SMC1 ChIP- High Confidence Total 10,444,522 1,234,006 23,739 Seq peaks Interactions (>4 kb, 3 % of High confidence Enhancer- 147,270 148,080 2,921 (1e–05) PET, FDR 0.01) Interactions Promoter Enhancer- 2,252,066 104,394 2,700 Both End 21,195 89.3% Enhancer 1 End 2456 10.3% CTCF-CTCF 2,306,128 262,149 7,841 None 88 O.4%

TABLE S2E Super-enhancer to gene assignments

Proximal active genes Interacting genes (SE within (SMC1 PET Nearest Chr Start End SEID SE rank 4 kb of TSS) interactions) active genes chr1 36O7O190 36074608 INT STITCHED 100 215 . HS6St1 HS6St1 chr1 36111164 36118698 INT STITCHED 101 152 . HS6St1 HS6St1 chr1 72260528 72261272 INT STITCHED 230 189 Mreg Mreg chr1 72839563 72858.199 INT STITCHED 237 99 . Igfbp2 Igfbp2 chr1 91766947 91773527 INT STITCHED 315 231 Gbx2 Gbx2 US 2017/O 130247 A1 May 11, 2017 25

TABLE S2B-continued Super-enhancer to gene assignments

Proximal active genes Interacting genes (SE within (SMC1 PET Nearest Start End SEID SE rank 4 kb of TSS) interactions) active genes 120538712 120545414 INT STITCHED 368 96 Tcfcp211 Tcfcp211 120971968 120973737 INT STITCHED 372 73 Gli2 Gli2 1212O1424 1212O2481 INT STITCHED 374 170 Inhbb Inhbb 121295.085 121296031 INT STITCHED 376 137 Inhbb Inhbb 138841643 138850970 INT STITCHED 466 1S6 NrSa2 NrSa2 NrSa2 168054897 168073079 INT STITCHED 556 97 Dusp27, Gpa33 Dusp27 1692O1106 1692.20423 INT STITCHED 559 190 Uck2 Uck2 1828.18684 182819554 INT STITCHED 610 59 Lefty2 Lefty2 1828S4521 1828.64307 INT STITCHED 611 S4 Lefty1 Lefty2, Lefty1 Tmem63a, Lefty1 183948.212 183961841 INT STITCHED 615 123 Enah Enah 205746O2 205.91747 INT STITCHED 746 25 Etl4 Etl4 320O8891 32030736 INT STITCHED 812 31 Bat? 33282O29 3330O860 INT STITCHED 817 56 Zbtb34 Zbtb34 71488O13 71494617 INT STITCHED 928 115 Gm1631 Gm1631 152552277 152563676 INT STITCHED 1198 178 Id1 162877048 162893.236 INT STITCHED 1257 18 16598.1373 16598.3444 INT STITCHED 1279 121 168589688 16861717OINT STITCHED 1300 95 Sall4 34544904 34553511 INT STITCHED 1480 34 Sox2 Mr1897 34633687 3466O705 INT STITCHED 1482 Mir1897, Sox2 8837S442 88380O83 INT STITCHED 1607 193 SSr2 954SSO34 95468269 INT STITCHED 1626 Mc1 9638O383 96382115 INT STITCHED 1629 69 TXnip, Ankrd34a, Polr3gl chr3 96479158 96484864 INT STITCHED 1630 116 Ankrd35 Ankrd35, Ankrd35 Nudit17 129247012 12926 1362 INT STITCHED 1732 112 Elové Elov6 118743867 118745786 INT STITCHED 2152 23 Slc2a1 125211671 125223450 INT STITCHED 2192 143 Grik3 Grik3 1373294.36 1373.57766 INT STITCHED 2268 108 Alpl Alpl Alpl 138OOOSS4 138OO6368 INT STITCHED 2273 228 Camk2n1 Camk2n1 14112O768 141126477 INT STITCHED 2292 39 Gm694, B330016D10Rik, Fblim1, Tmem82, AISO7597 hr4 47459254 47.463850 INT STITCHED 2317 196 Agtrap rS 539.33177 53947327 INT STITCHED 2510 71 Rbp rS 65255735 65256794. INT STITCHED 2535 84 rS 13758941 13775389 INT STITCHED 2712 64 MirA69 MirA69 rS 1684.5764 16860853 INT STITCHED 2736 153 Hspb8, 2410137F16Rik rS 18884660 18896412 INT STITCHED 2745 35 Med13 rS 23S84659 23590728 INT STITCHED 2770 41 Rhof Rhof rS 35417523 35421698 INT STITCHED 2830 142 Cldn4 hrö 39370384 39371286 INT STITCHED 3044 225 Mkrn1 hrö 9164O161 91661247 INT STITCHED 3217 124 hrö 22290093 22293017 INT STITCHED 3342 13 Phc1 2264.0118 22657871 INT STITCHED 3348 32 Nanog, Nanog, Nanogpd Nanogpd 22714316 2272O862 INT STITCHED 3349 40 Slc2a3 42458,188 42461905 INT STITCHED 3429 222 Ldhb 4304.7309 43065758 INT STITCHED 3437 98 573O41909Rik 45223.385 45225674 INT STITCHED 3450 230 Kras 3193004 3218183 INT STITCHED 3467 LOC10O303645, LOC10O303645, Mir291b, Mir291b, Mir293, Mir293, Mir290, Mir290, Mir291a, Mir291a, Mir292, Mir292, Mir294, Mir294, Mir295 Mir295 chrt 13599.334 1360O325 INT STITCHED 3481 172 Zbtb45 Zbtb45, Trim28 chrt 30982397 30983.339 INT STITCHED 3523 154 Capins 1 Cox7a.1 Capns1

US 2017/O 130247 A1 May 11, 2017 27

TABLE S2B-continued Super-enhancer to gene assignments

Proximal active genes Interacting genes (SE within (SMC1 PET Nearest Chr Start End SEID SE rank 4 kb of TSS) interactions) active genes chr17 3563921.1 35642435 INT STITCHED 7784 186 Poll.Sf1 Tcf19, Cchcr1 POSf1 chr17 371102O2 37134996 INT STITCHED 7792 12 2410137M14Rik, Znrd1, Znrdlas, 2410137M14Rik Zip57, 2410137M14Rik, H2-MS Trim40, Gabbr1, H2-M5, Zip57, Ofr0 chr17 45593477 45596SO3 INT STITCHED 7812 37 . Spats 1 Spats 1 chr17 47640414 47649043 INT STITCHED 7822 141 Cend3, Taf3 Ccnd3, Bysl, Ccnd3 Med20 chr17 71.213804 71222433 INT STITCHED 7887 42 Tgif1 Tgif1 chr17 71.241991 71250610 INT STITCHED 7888. 203 Tgif1 Tgif1 chr18 75504155 75505202 INT STITCHED 8260 198 Smadf Smadf chr18 7552O332 75527277 INT STITCHED 8261 91 Smadf Smadf Smadf chr19 S835881 S847014 INT STITCHED 8324 16 Neat1 Malat1, Neat1 Slc25a45, Firmd8 chr19 21858770 21866770 INT STITCHED 8378 103 Tmem2 Tmem2 Tmem2 chr19 232O7455 23208806 INT STITCHED 8386 82 . Klf), Mir1192 KIf chrX SOO98631 SO11.4110 INT STITCHED 86.29 150 Mir18b, Kis2 Mir19b–2, Kis2, Mir2Ob, Mir92-2, Mir106a, Mir363

TABLE S2C Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET C l Start end TEID TE rank 4 kb of TSS) interactions)

l 4845264 4846195 NT STITCHED 4 4358 Tcea1 l 969.1815 9692699 NT STITCHED 22 2S43 Mybl1 l 10O21675 10O22274 NT STITCHED 25 4284 Csppl, Copsis l 12681792 12682608 NT STITCHED 40 2368 Sulf1 l 12734270 1273.5431 NT STITCHED 41 521 Sulf1 l 136SO476 136511 S6 NT STITCHED 48 726 Lactb2 l 14302611 14303160 NT STITCHED 54 5331 Eyal l 20944720 20945.563 NT STITCHED 66 3579 Efhc1 l 34O63557 34O64527 NT STITCHED 87 1218 Dst l 3629.6870 3629.7624 NT STITCHED 102 4246 l 38960832 38961217 NT STITCHED 122 5874 l 39959153 3996O298 NT STITCHED 127 3253 l 52O65557 S2O66452 NT STITCHED 153 3287 l 53.009978 53.010966 NT STITCHED 156 499 l SS188460 SS1891.93 NT STITCHED 165 3945 l S8045669 58046468 NT STITCHED 177 7085 l 59.543982 S9544804 NT STITCHED 185 1108 l 63256644 63260998 NT STITCHED 199 729 l 6348853S 6348.9242 NT STITCHED 201 2410 l 66746373 66747073 NT STITCHED 217 2899 l 71692414 71697825 NT STITCHED 227 505 Fin1 l 71719790 71720485 NT STITCHED 228 1904 Fin1 l 72359403 7236.0070 NT STITCHED 234 1813 l 72873970 72875037 NT STITCHED 238 905 l 74354703 74355378 NT STITCHED 245 5971 Tmbim1 l 75477134 75477769 NT STITCHED 250 5588 Chpf, Tmem198 l 7SS08284 755O8949 NT STITCHED 251 1914 Obsl1, Inha l 82816971 828.17766 NT STITCHED 274 2678 l 8792O884 8792.1540 NT STITCHED 284 3696 Spata3 l 87959141 879.59957 NT STITCHED 285 4733 PSmd1 l 882O4615 882O8596 NT STITCHED 290 250 B3gnt7

US 2017/O 130247 A1 May 11, 2017 29

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr2 28.976424 2897 6978 INT STITCHED 783 3537 Setx Setx chr2 291OO520 2.91.01142 INT STITCHED 785 2171 Nting2 Nting2 chr2 29484671 79487572 INT STITCHED 790 375 . Rapgef1 Rapgef1 chr2 29675439 29676619 INT STITCHED 793 1058 Urm1 Urm1 chr2 29917303 2991.7729 INT STITCHED 796 1358 Set Set chr2 29990414. 29.991153 INT STITCHED 797 2134 Tbclc13 Tbclc13 chr2 31427293 31428087 INT STITCHED 808 635 Fubp3 Prdm12 Fubp3 chr2 31736153 31736754 INT STITCHED 810 SO15 Lanc3 Lamc3 chr2 33738008 33739000 INT STITCHED 823 4826 Fam125b Fam125b chr2 38780488 38781118 INT STITCHED 83.9 3934 Olfml2a, Wr38 Nréa1 Nréa1 chr2 44959885 4496055OINT STITCHED 849 4033 Gm.13476 Gm13476 chr2 49304139 49305.084 INT STITCHED 856 73.63 Epc2 Epc2 chr2 50894.125 50905941 INT STITCHED 861 897 . Rind Rind3 chr2 51926608 51.928267 INT STITCHED 865 280 Rifl Rifl chr2 60213132 60223107 INT STITCHED 884 419 Ly75 Ly75 chr2 62246033 62246887 INT STITCHED 895 3985 Dpp4 Dpp4 chr2 65679605 656873O2 INT STITCHED 900 745 Csmp3 Csrnp3 chr2 71556963 71557704 INT STITCHED 930 400 Gm1631 Gm1631 chr2 71704256 71705255 INT STITCHED 932 380 Pok1 Pok1 Pdk1 chr2 72823322 72824.095 INT STITCHED 936 3178 Sp3 Sp3 chr2 75544.463 75545103 INT STITCHED 951 2667 Nfe22 Nfe22 chr2 76238133 76238918 INT STITCHED 958 3707 Osbp16 Osbp 16 chr2 79269600 7927O299 INT STITCHED 973 2411 Cerkl Neurod1 Cerkl chr2 83550377 83.559168 INT STITCHED 980 1203 Itgav Itgav chr2 84677181 84.679035 INT STITCHED 986 860 Sc43a1 Sc43a1 chr2 91544344 91548597 INT STITCHED 993 3127 Atgl3, Atg13 Harbi1 chr2 9177253O 917731.98 INT STITCHED 994 1226 Mdk Chrma. Mdk chr2 94103254. 94103817 INT STITCHED 1002 2880 Miró70 Miró70 chr2 O2O23859 102024282INT STITCHED 1013 6628 Lolrad3 Ldrad3 chr2 O2243909 102243994 INT STITCHED 1014 8718 Trim44 Trimal4 chr2 03632070 103632867 INT STITCHED 1022 3964 Caprin1 Caprin1 chr2 O3938892 103939697 INT STITCHED 1024. 4914 ColS9a Cd59a chr2 O4428555 10442.9275 INT STITCHED 1026 6129 Cstf3 Cstf3 chr2 O4657804 104666793 INT STITCHED 1028 1623 Qser1 Qser1 chr2 O4695622 104696627 INT STITCHED 1029 3340 Prrg4 Prrg4 chr2 O5237758 105249007INT STITCHED 103 1507 Rcn 1 Rcn1 chr2 O9753059 109753825 INT STITCHED 1048 2198 Gm13939, Gm13939 Lgr4 chr2 14003762 114004516 INT STITCHED 1054 2091 Aqr Aqr chr2 15885482 115886413 INT STITCHED 106 3063 Meis2 311.0099EO3Rik, Meis2 281040SF15Rik chr2 16945256 116946216 INT STITCHED 1066 1301 Spred1 Spred1 chr2 19615493 119616460 INT STITCHED 1079 1269 Rpap1 Tyro3 Rpap1 chr2 20069560 1 20070347 INT STITCHED 108 4738 Pla2.g4e Pla2.g4e chr2 21095895 121096949 INT STITCHED 1083 5489 Trp53bp1 Trp53bp1 chr2 21303040 121303595 INT STITCHED 1084 4080 Mfap1b Mfap 1b chr2 21628647 121629289 INT STITCHED 1086 6300 Frmd5 Firmd5 chr2 2219566O 122196747 INT STITCHED 109 1739 Slf Shif chr2 25973476 125973812 INT STITCHED 1098 7151 Dtwd Dtwd chr2 27664.828 127665772 INT STITCHED 1105 2730 . Bub1 Bub1 chr2 28638582 12863915OINT STITCHED 111 3886 Tmem87b Tmem87b chr2 28887217 128888.053 INT STITCHED 1113 6432 Tt Ttl chr2 2.9412877 129414126 INT STITCHED 1117 2016 Sirpa Sirpa chr2 295 21285 129528427 INT STITCHED 1118 967 Pdyn Polyn chr2 29623870 129626180 INT STITCHED 1119 1155 Stk35, Stk3S 4932416HOSRik chr2 31058146 131070063 INT STITCHED 1125 1091 Mays Mays chr2 405O2439 140503025 INT STITCHED 1155 3559 FIrt3 Frt3 chr2 45614651. 145615227 INT STITCHED 1167 5472 Rin2 Rin2 chr2 46837021 146837769 INT STITCHED 1177 2634 Xrn2 Xrn2 chr2 48700392 14870 1063 INT STITCHED 1185 1716 Cst3 9230104L09Rik Cst3 chr2 50482907 150491844 INT STITCHED 1190 744 AcSS1 AcSS1 chr2 50863293 150863974 INT STITCHED 1193 7212 Nanp Nanp chr2 52048814 152049781 INT STITCHED 1197 4239 Csink2a1 CSnk2a1 chr2 52932871 152.933554 INT STITCHED 1200 1971 Hek Hick chr2 53172351 153173278 INT STITCHED 1202 2564 ASX11 ASX1 chr2 534673.02 153469325 INT STITCHED 1208 3527 Dnimt3b Dnmt3b chr2 54905247 154906205 INT STITCHED 1212 5226 Ahcy Ahcy US 2017/O 130247 A1 May 11, 2017 30

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr2 56251118 156251931 INT STITCHED 1218 1519 Epb4.111 Epb4.111 chr2 56667667 156668409 INT STITCHED 1223 1808 Tgif2 Tgif2 chr2 57243952 157244973 INT STITCHED 1229 337 Src. Ghrh, Src Src. chr2 57978463 157980974 INT STITCHED 1233 239 Tgm2 Tgm2 chr2 58614658 158615457 INT STITCHED 1238 4450 Dhx35 Dix35 chr2 62753625 162754468 INT STITCHED 1255 7920 Srsf6 Srsfö chr2 63890381 163891019 INT STITCHED 1261 6983 Tomm34 Tomm34 chr2 64277250 164277.825 INT STITCHED 1263 4381 . Sdc4 5dc4 chr2 64649260 16464988OINT STITCHED 1265 1164 ZSwim1, Zswim1 17OOO2OOO7Rik chr2 64680558 16468.1078 INT STITCHED 1266 5540 Pltp Pltp chr2 65682802 165683641 INT STITCHED 1273 1421 . Zmynd8 Zmynd8 chr2 65700 123 165704304 INT STITCHED 1274 2229 Zmynd8 Zmynd8 chr2 657 19928 165725066 INT STITCHED 1275 557 . Zmynd8 Zmynd8 chr2 66O19666 166020678 INT STITCHED 1280 431 Sulf2 Sulf2 chr2 66884O75 166898489 INT STITCHED 1288 240 Snord 12, Kcnb1 Snord 12 1500012EO1 Rik, Znfx1 chr2 67248382 1672493OOINT STITCHED 1294 2679 Slc9a8 Slc9a8 chr2 67355489 167366175 INT STITCHED 1295 652 Snail Snail chr2 67979755 167980437 INT STITCHED 1297 1110 . Adnp Adnp chr2 68039253 1680401.34 INT STITCHED 1299 6482 . Adnp Adnp chr2 68664O93 168667592INT STITCHED 1301 675 Atp9a, Sall4 Sall4 chr2 68784552 168785568 INT STITCHED 1302 1060 Zip64 Z?p64 chr2 74082837 174087567 INT STITCHED 1339 943 Mir296 Mir296 chr2 74113130 174113943 INT STITCHED 1340 2341 Gnas Gnas chr2 80443435 180445072 INT STITCHED 1358 534 Dido1, Sc17a9 Dido1 231 OOO3C93Rik chr2 80462O24 180475727 INT STITCHED 1359 329 Sc17a) Slc17a9, Slc17a9 Dido1 chr2 806874.47 180688955 INT STITCHED 1362. 2111 Nkaina. Birc7, Arfgap1 Nkaina. chr2 80947699 180948243 INT STITCHED 1364 3425 BC051628 Ppdpf BCOS1628 chr2 80979223 18098.0098 INT STITCHED 1365 7507 BCOO6779 BCOO6779 chr2 81056733 181067845INT STITCHED 1366 1346 Rtel1 Rtel1 chr3 7366490 7367336 INT STITCHED 1376 8686 Pkia Pkia chr3 8509668 851 1844 INT STITCHED 1381 318 Stinn? Stmin2 chr3 8963969 8965O13 INT STITCHED 1382 3662 Tpd52 TS2 chr3 9008384 9009250 INT STITCHED 1384 1165 Tpd52 TS2 chr3 9243943 9244611 INT STITCHED 1386 4.596 Zbtb10 Zbtb10 chr3 96.10853 9611748 INT STITCHED 1391 1939 Zip704 CO3.0034L19Rik Zfp704 chr3 103.53887 103.54547 INT STITCHED 1395 5545 Zfand 1 Zfand 1 chr3 14863O38 14863599 INT STITCHED 1406 1714 Car3 Car3 chr3 18523768 18524685 INT STITCHED 1418 1649 . Cyp7b1 Cyp7b1 chr3 197971.16 19797596 INT STITCHED 1423 7359 4632415LOSRik . 463241SLOSRik chr3 19957995 19958715 INT STITCHED 1425 3002 Hitf Htf chr3 21834336 21835378 INT STITCHED 1430 708 . Tbl1Xr1 Tbl1Xr1 chr3 21964172 21975748 INT STITCHED 1433 312 Tb1xr1 Tbl1Xr1 chr3 276O2172 27603039 INT STITCHED 1447 1051 Findc3b Findc3b chr3 27834469 2783.6247 INT STITCHED 1449 2019 Pld1 C chr3 30545417 30547747 INT STITCHED 1462 694 Lrriq4 Lrriq4 chr3 308.99964. 30917089 INT STITCHED 1468 432 Pirkci. Pkci. chr3 30988O14 30988781 INT STITCHED 1469 1571 Skil Skill chr3 53261382 53262179 INT STITCHED 1518 245O 281 0046LO4Rik, . Nhrc3 Nhlrc3 chr3 53453757 53454852INT STITCHED 1521 1227 Frem2 Frem2 Frem2 chr3 53672950 53673878 INT STITCHED 1522 3451 Ufm1 Ufim1 chr3 57536429. 57536949 INT STITCHED 1530 4492 Rinfl3 Rnf13 chr3 62145578 6214625OINT STITCHED 1539 5028 4631416L12Rik . 4631416L12Rik chr3 63098862 630995.42 INT STITCHED 1541 2296 Mme Mme chr3 65339039 65.339696 INT STITCHED 1548 7074 493.144OP22Rik . 493144OP22Rik chr3 67307480 673 18756 INT STITCHED 1552 1308 Rarres1 Rarres1 chr3 683.684.82 68369877 INT STITCHED 1558 1231 Schip1 Schipl chr3 79375842 79376337 INT STITCHED 1573 6825 Fnip2 Fnip2 chr3 80838336 808394.01 INT STITCHED 1575 1948 Pdgfc Pdgfic chr3 83824381 83838289 INT STITCHED 1586 957 D93OO15EO6Rik . D93OO15E06Rik chr3 83985389 84010994 INT STITCHED 1587 439 Gm6525, GinóS25 Trim2 chr3 84624646 84.625298. INT STITCHED 1589 7481 Fbxw7 Fbxwf chr3 85952913 85953.587 INT STITCHED 1596 4519 Rps3a Rps3a chr3 86.940621 86941199 INT STITCHED 1600 3865 . Kirrel Kirrel

US 2017/O 130247 A1 May 11, 2017 32

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr4 82047159 82047945 INT STITCHED 2033 1456 . Nfb Nfb chr4 82153147 82158518 INT STITCHED 2036 558 Nfb Nfb Nfb chr4 82180557 821898.32 INT STITCHED 2037 7204 . Nfb Nfb chr4 82973.975 82976812 INT STITCHED 2041 343 Ttc39b Ttc39b chr4 83059982 83060620 INT STITCHED 2042 2352 Snapc3 Snapc3 chr4 83164867 8316562OINT STITCHED 2044 2906 4930473A06Rik . 4930473A06Rik chr4 84854.207 84.855065 INT STITCHED 2051 2922 Sh3g|2 Sh3g2 chr4 91065881 91066800 INT STITCHED 2065 4904 Elavl2 Elaw2 chr4 98394.367 98395329 INT STITCHED 2073 336 L1td1 L1td1 chr4 9976978.0 9977O611 INT STITCHED 2084 6051 Ror1 Ror1 chr4 OO451625 100452191 INT STITCHED 2085 5943 Cachd1 Cachd1 chr4 O1223833 1.01224306 INT STITCHED 2087 6107 Dnajcó Dinajcó chr4 O4831497 104842867 INT STITCHED 2096 535 Ppap2b Ppap2b Ppap2b chr4 O4868425 104869271 INT STITCHED 2097 947 . Ppap2b Ppap2b chr4 O4896507 104898349 INT STITCHED 2098 235 . Ppap2b Ppap2b chr4 O70353.27 107035897 INT STITCHED 2106 3847 Tmem48 Tmem48 chr4 07550428 107550830 INT STITCHED 2107 8621 Magoh Magoh chr4 O7974909 107975592 INT STITCHED 2112 7080 Zyg11b Zyg11b chr4 O8515672 108516321 INT STITCHED 2117 5600 Ktil 2 Kti12 chr4 O069602 110070245 INT STITCHED 2124 5744. Agbl4 Agbl4 chr4 5743638 115744295 INT STITCHED 2133 4916 Uq.crh, Lrrca.1 Uqcrh chr4 6397804 1163984.74 INT STITCHED 2134 5503 Tesk2 Tesk2 chr4 68251.41 116825878 INT STITCHED 2138 599 Snord38a, Snord38a. Rps8, SnordS5 chr4 6924570 116925269 INT STITCHED 2140 7045 Gm1661, Gm1661 TmemS3 chr4 7085700 117088622 INT STITCHED 2143 815 . Rnf220 Rnf220 chr4 8212353 118213114 INT STITCHED 2147 6341 Tmem125 Timen125 chr4 8359572 11836O154 INT STITCHED 2148 2902 Olfr1342 Olfr1342 chr4 8910530 118911287 INT STITCHED 2154 4305 Lepre1, Lepre1 AUO222S2 chr4 8964,429 118965089 INT STITCHED 2155 7516 Ybx1 Ybx1 chr4 9076.007 119084885 INT STITCHED 2156 4562 Cedc30 Ccdc30 chr4 9170858 119171363 INT STITCHED 2157 4607 Rimkla Rimkla chr4 9487998 119488754 INT STITCHED 2160 1604 Hivep3 Hivep3 chr4 98.31385 119832O62INT STITCHED 2161 3892 Edin2 Edn2 chr4 22514588 122515039 INT STITCHED 2172 5193. Ppt1 Ppt1 chr4 22676422 122676991 INT STITCHED 2173 7715 Mycl1 Mycl1 chr4 24811451 124811907 INT STITCHED 2185 641 Zc3h12a Zc3h12a chr4 25779943 125780736 INT STITCHED 2196. 2275 Lsm10, Stk40 Stk40 chr4 26142985 126143622 INT STITCHED 2198 1019 Eif2c1 Eif2c1 chr4 26741328 126741791 INT STITCHED 2202 4312 Zmym1 Zmym1 chr4 268O1236 126802125 INT STITCHED 2203 2032 Gm12942 Gm12942 chr4 27021763 127022891 INT STITCHED 2208 690 . Gib3, Gib5 Gib5 chr4 28334780 1283353.98 INT STITCHED 2211 5180 Phc2 Phc 2 chr4 284.05346 1284O6067 INT STITCHED 2212 8615 Phc2 Phc 2 chr4 28556315 128556889 INT STITCHED 2213 2454 Trim62 Trim62 chr4 29050358 129052712INT STITCHED 2218 2SS Zbtb8:a Zbtb8a. chr4 29221212 129221953 INT STITCHED 2219 5426 Holac1 Holac1 chr4 29779132 129780024 INT STITCHED 2222 1812 Pef1 Tinagl1 Pef chr4 30213352 130214031 INT STITCHED 2225 2627 Pum1 Pum1 chr4 3206147O 132062305 INT STITCHED 2233 2028 Sesn2 Sesn2 chr4 33371377 133372545 INT STITCHED 224.5 436 . Arid1a, Pigv Arid1a, chr4 33567462 133568.053 INT STITCHED 2248 2013 . Lin28a. Lin28a. chr4 34406960 134407757 INT STITCHED 2251 7098 Tmem57 TimenS7 chr4 35137211 135138212 INT STITCHED 2256 13 OO . Grh3 Gr3 chr4 35702424. 135703370 INT STITCHED 2258 1210 Id3 d3 ch14 135847102 135847490 INT STITCHED 2260 5446 Zip46 ZfAJ46 chr4 35873252 135874283 INT STITCHED 2261 3874 Hinrnpr Hinrnpr chr4 36016956 136O19978 INT STITCHED 2262 655 Luzp1 Luzp1 chr4 37025750 137026764 INT STITCHED 2264 587 Hspg2 Hspg2 chr4 37879059 137879857 INT STITCHED 2271 4511 Pink1 Pink1 chr4 3952O165 13952O892INT STITCHED 2282 3096 KIhdc7a KIhdc7a chr4 40200327 140201030 INT STITCHED 2286 3387 Arhgefl01 Arhgef101 chr4 40249934 14025 1467 INT STITCHED 2287 1738 Ricc2 Ricc2 chr4 4051566O 140516266 INT STITCHED 2288 3301 Schb Sdhb chr4 41154043 141155.058 INT STITCHED 2293 SO9 Fbin1 Fblim1 chr4 41784.495 141791290 INT STITCHED 2298 117O 9030409G11Rik . 9030409G11Rik chr4 42891686 142896659 INT STITCHED 2306 555 Pelpn Popn chr4 474.07829 147408586 INT STITCHED 2316 2518 Clcné, Mthfr Clcnó US 2017/O 130247 A1 May 11, 2017 33

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr4 1496.04905 14960.5684 INT STITCHED 2326 1574. Eno1, GmS506 Eno1 chr4 150415590 150416311 INT STITCHED 2333 7184 Per3 Pers chr4 151373.388 151381390 INT STITCHED 2335 1548 KIhl21,Phfl3 Khl21 chr4 151417116 151417695 INT STITCHED 2336 4939 No.19, Tas1r1 No.9 chr4 151559144 15155.9677 INT STITCHED 2337 3681 Acot7 Acot7 Acot7 chr4 151664406 15166515OINT STITCHED 2338 3943 Icmt Icmt chr4 153469755 153470645 INT STITCHED 2346 2164 Trp73 Trp73 chr4 155043894 155044798 INT STITCHED 2356 991 Mb2 Mb2 chr4 155562905 155563924 INT STITCHED 2358 1143 Agrin Agrin chrS 5698769 56995.06 INT STITCHED 2369 3137 Steap2 Steap2 chrS 8423915 84.24874 INT STITCHED 2372 1629 Slc25a40, Slc25a40 Dbfa. chrS 20389584 20390410 INT STITCHED 2405 3630 Phtf2, TmemóO . Phtf2 chrS 21144388 21144924 INT STITCHED 2410 7716 Fbx113 Fbx13 chrS 21157440 21158426 INT STITCHED 2411 5853 Fbx|13, Armc10 . Armc10 chrS 21237845 21238483 INT STITCHED 2412 7016 Pmpcb Pmpcb chrS 241901.57 24190688 INT STITCHED 2425 4368 Nub 1 Nub1 chrS 24515415 24516073 INT STITCHED 2426 2110 . Prkag2 Prkag2 chrS 25038744 25042771 INT STITCHED 2428 863 1700096K18Rik MI13, 1700096K18Rik 4831440E17Rik chrS 25266891 2526757 OINT STITCHED 2431 3382 Actr3b Actrib chrS 27368835 27369369 INT STITCHED 2438 8605 Dpp6 Dpp6 chrS 295.25401 2.9526304 INT STITCHED 2446 2.338 Rinf32 Rnf32 chrS 30063886 30064743 INT STITCHED 2450 3241 Dnab6, Dinajbó GmS129 chrS 31791484 31795140 INT STITCHED 2453 1536 4930548H24Rik, . Gpn1 Gpn1 chrS 32918629 3291.9653 INT STITCHED 2458 8602 Yes1 Yes1 chrS 33444956 3345O141 INT STITCHED 2461. 2733 Slc5a1 Slc5a1 chrS 35447171 35447793 INT STITCHED 2473 2708 Lipap1 Lirpap1 chrS 37188271 37189788 INT STITCHED 2477 1412 Mrfap1 Mrfap1 chrS 4.4251617 4426.1048 INT STITCHED 2494 SO3 C38 C38 chrS 45886696 45887517 INT STITCHED 2497 2501 Lap3 Lap3 chrS 45906343 45906785 INT STITCHED 2498 5778 Med28 Med28 chrS 53.857769 538591.25 INT STITCHED 2508 1228 . Rbp Rbp chrS 5398.3387 53983835 INT STITCHED 2512 4655 Rbp Rbp chrS 64362131 64362846 INT STITCHED 2524 8593 Rel11 Rel1 chrS 64486701 64494.650 INT STITCHED 2526 3034 Pgm1 Pgm1 chrS 64891420 64892.157 INT STITCHED 2531 SO6 . Kf3 Kf3 chrS 64970809 64971754 INT STITCHED 2532 2035 . Kf3 Kf3 chrS 65038454 65039558 INT STITCHED 2533 747 . Kf3 Kf3 chrS 652O3O49 65203823 INT STITCHED 2534 60S . Kf3 Kf3 chrS 65521902 655226OOINT STITCHED 2537 3759 KIhl5 Khl5 chrS 65832295 65833237 INT STITCHED 2539 4448 Ugdh Ugdh chrS 66091258 66091720 INT STITCHED 2541 5666 Pols5a Pds5a chrS 66153811 66154536 INT STITCHED 2543 8591 N4bp2 N4bp2 chrS 66542614 66545335 INT STITCHED 2544 1354 Rbm47 Rbm47 chrS 68240570 68241089 INT STITCHED 2556 4241 Atp8.a1 Atp8.a1 chrS 75972506 75973498 INT STITCHED 2583 2620 Kit Kit chrS 76798260 76799006 INT STITCHED 2588 2195 Nmu Nmu chrS 7733.9613 7734.0383 INT STITCHED 2590 8032 Aasdh Alasdh ch15 77683194 77694467 INT STITCHED 2593 238 Rest Rest chrS 89108097 89114909 INT STITCHED 2614 1675 Grsf1 GrSfl. chrS 92.926941 92.93.3665 INT STITCHED 2621 904 Scarb2 Scarb2 chrS 93112043 93112714 INT STITCHED 2624 5586 Shroom3 Shroom3 chrS 93322548 93323285 INT STITCHED 2626 3484 Shroom3 Shroom3 chr5 100472593 100473735 INT STITCHED 2652 2432 Enoph1, Enoph1 Hinrpdl chr5 101101211 101101831 INT STITCHED 2655 2682 Coq2 Coq2 chr5 1038.47896 1038484.33 INT STITCHED 2669 5025 Ptpn13 Ptpn13 chr5 104862866 10487233OINT STITCHED 2676 320 Spp1 Spp1 chr5 109097366 109098.181 INT STITCHED 2691 2714 Idua, Slc26a.1 Idua chrS 110527885 110528617 INT STITCHED 2695 2800 Plcxd1 Plcxd1 chrS 110656201 110658296 INT STITCHED 2696 2455 Ankle2 Ankle2 chrS 110897904. 110899242INT STITCHED 2700 292 . Fbrsl1 Fbrsl1 chrS 111232926 111236161 INT STITCHED 2702 276 Uk1 Uk1 Uk1 chrS 1113.0646O 111307534 INT STITCHED 2703 4183 Ttc28 Ttc28 chrS 113.915656 113916217 INT STITCHED 2715 5417 Wscd2 Wisco2 chrS 114725389 114725938 INT STITCHED 2720 3071 Foxna. Foxna. chrS 115021009 115022171 INT STITCHED 2724 1268 BC057022 BCO57022 US 2017/O 130247 A1 May 11, 2017 34

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chrS 15721625 115722390 INT STITCHED 2727 1373 Rnf10 Rnf10 chrS 15750818 115751483 INT STITCHED 2728 1977 Dynll1 Dynll1 chrS 1588.6385 115894671 INT STITCHED 2732 368 . Msi1 Msi1 chrS 159293.83 115929949 INT STITCHED 2733 390 Sirta SirtA chrS 16739839 116740353 INT STITCHED 2735 4769 Cedco,0 Ccdc60 chrS 17764474 117765420 INT STITCHED 2741 2391 Vsig10 WSb2 Vsig10 chrS 21986673 121990552 INT STITCHED 2757 2098. Mapkapk5 Mapkapk5 chrS 236O7764. 123608595 INT STITCHED 2771 1902 . Rhof Rhof chrS 23840532 123840817 INT STITCHED 2773 7999 Mixip Mlxip chrS 281 03516 128103953 INT STITCHED 2789 8575 Sc15a4 Slc15a4 chrS 291.05766 12911.0009 INT STITCHED 2794 828 5930412G12Rik, . S93O412G12Rik FZ10 chrS 29459327 129459946 INT STITCHED 2795 4878 Rimbp2 Rimbp2 chrS 30216498 130217009 INT STITCHED 2801 4380 Mrps17 Mrps17 chrS 30234026 130234832INT STITCHED 2802 4152 Gbas Gbas chrS 30363609 1303.64562INT STITCHED 2803 8574 Chchd2 Chchd2 chrS 34654188 134654880 INT STITCHED 2825 1841 Wbscr16, Wbscr16 Gtf2ird2 chrS 35402290 135403027 INT STITCHED 2829 1370 Wbscr27 Wbscrf chrS 35476827 135477544 INT STITCHED 2832 1445 Wbscr25 Abhd11 WbscrS chrS 36105127 136105914 INT STITCHED 2834 3937 Rhbdd2 Rhbdd2 chrS 36410740 136412181 INT STITCHED 2838 736 Ywhag Ywhag chrS 377470.12 137747785 INT STITCHED 2845 2219 Srrt Srt chrS 37778171 137778873 INT STITCHED 2846 2182 Sc12a9 Slc12a9 chrS 37792787 137793553 INT STITCHED 2847 3627 Ephb4 Ephb4 chrS 38531655 138532586 INT STITCHED 2849 4057 Zkscan1 ZkScan1 chrS 39675717 139676582 INT STITCHED 2854. 1584 Sun1 Sun1 chrS 39734064 139734931 INT STITCHED 2855 4824 GetA GetA chrS 41087076 141088486 INT STITCHED 2864. 1220 Lifng Lifrig chrS 43O22054. 14303753OINT STITCHED 2874 494 Radi1 Radi1 chrS 43543892 143544993 INT STITCHED 2876 2397 . Thrc.18 Thrc.18 chrS 4356O794. 143579912 INT STITCHED 2877 946 Thrc.18 Thrc.18 Thrc.18 chrS 41718476 143719226 INT STITCHED 2878 3909 Fscn1 Fscn1 chrS 4935O152 149362682 INT STITCHED 2907 864 Ub3 Ub3 chrS 49556262 149556865 INT STITCHED 291O 3402 2210417AO2Rik . 2210417A02Rik chr6 7644080 7663321 INT STITCHED 2939 847 ASns ASns chr6 15664O27 15664459 INT STITCHED 2958 5388 Mdfic Mdfic chr6 3.0109355 3.01.100.97 INT STITCHED 2995 772 Mr182 Mir183, Mir182 Mir96 chr6 373842O2 37386490 INT STITCHED 3.028 3134 Creb312 Creb32 chr6 38242963 38243836 INT STITCHED 3037 5549 Zc3 hav1.1 Zc3 hav1 chr6 42205849 42206459 INT STITCHED 3.058 7001 Tmem139 Tmem139 chr6 50066.065 50074818 INT STITCHED 3.072 1637 Mpp6 Mpp6 chr6 50209997 50210610 INT STITCHED 3073 6221 Dfna5 Dfnas chr6 51483.276 51494780 INT STITCHED 3.078 826 Snx1O Snx10 chr6 51956055 51963851 INT STITCHED 3081 577 Skap2 Skap2 chr6 52103875 521.04482 INT STITCHED 3084 5045 Hoxa1 Hoxa1 chr6 54.914440 54915345INT STITCHED 3101 1628 Nod1 Nod1 chr6 6698O137 66981045 INT STITCHED 3129 2263 Gadd45a Gadd45a Gadd45a chr6 71575730 71576551 INT STITCHED 3144 3531 Kolm3a Kidm3a chr6 72502671 72503229 INT STITCHED 3147 2910 Capg Capg chr6 72703827 727O8966 INT STITCHED 3150 1561 . Tcf71 Tcf71 chr6 72736980 72745613 INT STITCHED 3151 634 Tcf71 Tcf71 Tcf71 chr6 72848O28 72848774 INT STITCHED 3153 5856 Kcmf1 Kcmf1 chr6 827 10614 82711291 INT STITCHED 3.177 2460 . Hk2 Hk2 chr6 84O12527 84015422 INT STITCHED 3186 1066 . Zfml, Dysf Dysf chr6 85O24817 85025619 INT STITCHED 3191 5735 Exocob Exocob chr6 87863162 87864161 INT STITCHED 3199 1880 843O410A17Rik . 8430410A17Rik chr6 92196.225 92198121 INT STITCHED 3219 1001 Trh Trh chr6 97051422 97052960 INT STITCHED 3248 492 Fam19a4 Fam19a4 chr6 98979333 9898.0035 INT STITCHED 3262 2081 Foxp1 Foxp1 chr6 100235278 10023657 INT STITCHED 3276 4629 Rybp Rybp chr6 100369906 10O372686 INT STITCHED 3279 988 . Rybp Rybp chr6 108020709 108021593 INT STITCHED 3290 2984 Setmar Setmar chr6 112899075 112900921 INT STITCHED 3305 1751 Srgap3 Srgap3 chr6 113452636 113453205 INT STITCHED 3310 4977 Prrt3 Pirrt3 chr6 113566995 113567675 INT STITCFIED 3311 7573 Vhl VI chr6 115633121 115635.841 INT STITCHED 3320 1845 Raf1 Raf1 chr6 116148172 116148826 INT STITCHED 3324 4749 Tmcc1 Tmcc1 chr6 119274532 119275309 INT STITCHED 3331 2012 Lrtm2 Lrtm2 US 2017/O 130247 A1 May 11, 2017 35

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr6 19364465 119369724 INT STITCHED 3332 869 Adipor2 Adipor2 chr6 20241456 120242188. INT STITCHED 3335 2421 B4galnt3 B4galnt3 chr6 22316022 122317010 INT STITCHED 3343 278 . Phc1 PC1 chr6 22558176 1225591.03 INT STITCHED 3.345 907 Gdf3, Apobec1 Gdf3 chr6 22574257 122575075 INT STITCHED 3346 1790 Dppa3 Dppa3 chr6 24754367 124755031 INT STITCHED 3353 6468 Spsb2 Spsb2 chr6 2478.1521 124782304 INT STITCHED 3354 3019 Cdcas, Usp5 Coca chr6 25146126 125147005 INT STITCHED 3.357 2986 Mrpl51, Ncapd2 . Mrpl51 chr6 27053459 127054097 INT STITCHED 3.369 3276 9630O33F2ORik . 9630O33F2ORk chr6 28092062 128092947 INT STITCHED 3375 2085 Tspan.9 Tspang chr6 28395390 128396444 INT STITCHED 3380 1491 . Gm10069, Gm10069 Fkbp4 chr6 34839572 13484.0045 INT STITCHED 3399 7135 Gpr19 Gpr19 chr6 35016609 135017490 INT STITCHED 3402 2909 Gprc5a Gprc5a chr6 351436OO 1351441.85 INT STITCHED 3403 7525 843O419L09Rik . 843.04.19L09Rik chr6 37118500 137119078 INT STITCHED 3414. 4375 Rerg Rerg chr6 429.10352 142911545 INT STITCHED 3433 1851 St&sia1 St8sia1 chr6 42969103 142969851 INT STITCHED 3435 1647 . St8sia1, St8sia1 573O41909Rik chr6 43119001 143119729 INT STITCHED 3439 5592 Etnk1 Etnk1 chr6 449.97218 145O10452 INT STITCHED 3446 738. Bcat1 Bcat1 chr6 46668166 146674748 INT STITCHED 3453 2469 Stk381 Stk381 chr6 48894789 148906259 INT STITCHED 3463 362 FalmöOa. Famó0a chr6 4926O435 14926 1383 INT STITCHED 3466 2613 2810474O19Rik . 2810474O19Rik chirf 4076531 4077316 INT STITCHED 3470 1222 94.30041.J12Rik, Leng.9 9430041.J12Rik Ttyh1 chirf 4487732 4488465 INT STITCHED 3473 7092 6030429GO1 Rik . 6030429GO1Rik chirf 4703001 4704500 INT STITCHED 3474 696 Cox6b2 Cox6b2 chirf 7085385 708.6148 INT STITCHED 3480 2995 Zip773 Z?p773 chirf 20045717 20046525 INT STITCHED 3486 3637 Mark4 Mark4 chirf 20093.826 20094324 INT STITCHED 3487 7263 Bloc1s3, Bloc1S3 Trappc6a chirf 2O157934 20161677 INT STITCHED 3488 357 Geminf Gemin7, Gemint Zfp296 chirf 20326833 20328288. INT STITCHED 3489 579 . Pyr2 Pwr2 chirf 20400342 20402067 INT STITCHED 3490 649. Bc3 Bc3 Bc3 chirf 74877697 94873996 INT STITCHFD 3492 4012 Zip 180 Zf180 chirf 2528977O 25302672INT STITCHED 3493 236 Zfp428 Zfp428 Zf428 chirf 25664201 25665192 INT STITCHED 3495 3298 Rps19 Rps 19 chirf 25998956 25.999482 INT STITCHED 3498 4323 Dedd2 Dedd2 chirf 2604876O 26049461 INT STITCHED 3499 S3O . Erf Er chirf 27965321 27966083 INT STITCHED 3504 2673 Rab4b, Mia1 Rab4b chirf 27983767 2798.4476 INT STITCHED 3505 3541 Snirpa Snirpa chirf 280433O2 28043716 INT STITCHED 3506 2757 Numb| Numb chirf 28518631 28521086 INT STITCHED 3510 1758 Zip60 Zf 60 chirf 28624916 28625830 INT STITCHED 3511 1744 Zip59 Zfs9 chirf 29363488 29364974 INT STITCHED 3513 845 . PakA. PakA chirf 2.942O248 29420834 INT STITCHED 3514 4991 C330005M16Rik C33OOOSM16Rik chirf 29477392 294782OOINT STITCHED 3515, 1927 Fbxo27 FbxO27 chirf 30078547 30079377 INT STITCHED 3517 1761 Ppp1r14a Ppp1r14a Pop1r14a chirf 3.0757305 30768472 INT STITCHED 3520 612 Zfp568 Z?p74 Zfs68 chirf 3O872494 30873179 INT STITCHED 3522 7178 Zip566 Zfs66 chirf 31372373 31372940 INT STITCHED 3526 4996 Zbtb32, Wbp7 Wbp7 chirf 31392634 31393353 INT STITCHED 3527 1582 Upkla Wbp7, Opkla Prodh2 chirf 3.1937 273 31938066 INT STITCHED 353O 4926 Gramd1a. Gramd1a chirf 34949720 34950749 INT STITCHED 3531 8526 Uba2 Uba chirf 35437649 35438290 INT STITCHED 3532 2994 Kctd15 Kctd.15 chirf 36107701 361.20798 INT STITCHED 3535 610 Gpatch1, Rhpn2 Rhpn2 chirf 383.386OO 3833.9282INT STITCHED 3547 2440 . Zip536 Zip536 chirf 50862978 50863932 INT STITCHED 3560 1804 Zip819 Z?p819 chirf 51630296 51631278 INT STITCHED 3562 1315 1700008OO3Rik Syt3 7OOOO8OO3Rik chirf 51954.357 51964.325 INT STITCHED 3565 582 Izumo.2 Zumo2 chirf 52878661 528792.59 INT STITCHED 3569 6738 Izumo.1, Zumo1 Rasip1, Fut1 chirf 54053865 54.054483 INT STITCHED 3571 5445 Gitf2h1, Gitf1 Hps5 chirf 57029746 57030493 INT STITCHED 3581 4420 Prmt3 Print US 2017/O 130247 A1 May 11, 2017 36

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chirf 70585 007 705.85674 INT STITCHED 3600 7567 Otud7a Otud7a chirf 723O2006 72302711 INT STITCHED 3608 4605 Fam189a1 Fam189a1 chirf 75418913 75421020 INT STITCHED 3619 2227 Fam169b Fam169b chirf 80286O18, 80286,585 INT STITCHED 3632 4127 . Rgma Rgma chirf 80531416 80545623 INT STITCHED 3636 1261 . Rgma Rgma chirf 8067,9726 80684471 INT STITCHED 3637 S12 C2 Chd2 chirf 86284.126 86.2848O2 INT STITCHED 3651 4132 . Hapin3 Hapin3 chirf 8.6423381 86424O1OINT STITCHED 3653 4382 Abhd2 Abhd2 chirf 867993.96 86806853 INT STITCHED 3657 741 S730590G19Rik . 5730590G19Rik chirf 87253990 87254677 INT STITCHED 3660 8111 Idh2 dh2 chirf 87954.066 87954752 INT STITCHED 3666 4847 Iqgap1 qgap1 chirf 88208035 88208759 INT STITCHED 3671 2017 A1pk3 Alpk3 chirf 955.78089 95.578712INT STITCHED 3691 5604 Rab38 Rab38 chirf 96490332 96490891 INT STITCHED 3692 4627 Tmem135 Tmem135 chirf 971.23764 9712461 OINT STITCHED 3695 2094 Eed Eed chirf 97410448 97410819 INT STITCHED 3698 6727 Ccdc83 Ccdc83 chirf O3353190 103354O78 INT STITCHED 3713 1669 Odz4 Odz4 chirf O7377250 107377769 INT STITCHED 3734 7819 Pgm211 Pgm211 chirf O7436366 107436962 INT STITCHED 3735 7651 P4ha3 P4ha3 chirf O784.8144 10784913OINT STITCHED 3738 1234 Fam168a. Fam168a. chirf O9248288 10924.9058 INT STITCHED 3741 1466 Artl1. Art5 Art1 chirf 11392319 111392670 INT STITCHED 3744 5692 Trim34 Trim34 chirf 12551O3O 112551509 INT STITCHED 3746 7631 Fam160a2 Fam160a2 chirf 14351139 114351754 INT STITCHED 3748 3262 Rbmx2 Rbmx2 chirf 21782144 121782733 INT STITCHED 3780 2434 Calca Calca chirf 21883989 121884609 INT STITCHED 3.781 2465 Tnsc SC chirf 25723349 1257241.14 INT STITCHED 3793 5085 Tmc7 Tmcf. chirf 27040059 127042484 INT STITCHED 3801 448 Dcun1d3, Dcun1d3 Lyrm1 chirf 30354891. 130355503 INT STITCHED 3820 2662 Slc5a11 Slc5a11 chirf 33520840 133521647 INT STITCHED 3830 1910 Spns1 Spns1 chirf 33535744 133536602 INT STITCHED 3831 4899 Nfatc2ip Nfatc2ip chirf 34367327 134369311 INT STITCHED 3833 282 Sept1 Mylpf, -Sep Sept1 chirf 34659607 13466O269 INT STITCHED 3834 5484 1700008JO7Rik . 70000807Rik chirf 34.912208 134912952 INT STITCHED 3835, 1928 Orai3 Orai3 chirf 37395511 137408698 INT STITCHED 3841 529 Fgfr2 Fgfr2 Fgfr2 chirf 38004134 138005OO2INT STITCHED 3845 2881 Plekha1 Plekha.1 chirf 398,07718 1398,08551 INT STITCHED 3853 1520 Lhpp Lhpp chirf 40000477 140001636 INT STITCHED 3854. 2558 Fams3b Fams3b chirf 4604.1339 146048877 INT STITCHED 3.884 388 Ppp2r2d, Ppp2r2d Mapklip1 chirf 48026799 148027562 INT STITCHED 3894 2582 1190003J15Rik . 1900O31SRik chirf 481 24539 14812546OINT STITCHED 3895 6228 Athl1 BCO24386 Athl1 chirf 48144326 14814935OINT STITCHED 3896 269 Ifitm1, BCO24386, fitm2 fitm2 Ifitm3 chirf 48210613 148.212675 INT STITCHED 3898 1559 . Ifitm3 fitm3 chirf 49767035 149767709 INT STITCHED 3903 8496 H19 H19 chr8 3394398 34O1126 INT STITCHED 3919 466 Arhgefl8 Arhgef18 chr8 4341931 4370480 INT STITCHED 3923 630 CC25 CC25 chr8 8655610 8656423 INT STITCHED 3932 8493 Efnb2 Efnb2 chr8 10924754 10925468 INT STITCHED 3939 2611 3930402G23Rik . 393O4O2G23Rik chr8 11640471 11650624 INT STITCHED 3941 1100 Ankrd10 AnkrdlO AnkrdlO chr8 1248.1728 12482441 INT STITCHED 3946 1518 . Sox1 Sox1 chr8 12759924 12760464 INT STITCHED 3951 3.049 Atp11a Atp11a. chr8 12928814 1292.9349 INT STITCHED 3953 2295 . Mcf2 Mcf2l chr8 235.26226 23527098 INT STITCHED 3973 2380 Mrps31 Mrps31 chr8 23913976 23914613 INT STITCHED 3975 4178 Ap3m2 Ap3m2 chr8 24360828 24361492 INT STITCHED 3979 1685 Golga7 Golga7 chr8 24470989 2447 1980 INT STITCHED 3980 3038 . Sfrp1 Sfr1 chr8 24509604 24521644 INT STITCHED 3981 472 Sfr1 Sfrp1 Sfr1 chr8 25555859 25556317 INT STITCHED 3986 6273 1810011O1ORik . 1810011O1 ORik chr8 26623789 26624247 INT STITCHED 3990 2962 Fgfr1 Fgfr1 chr8 26863847 26864429 INT STITCHED 3992 4336 Dohd2 Ddhd2 chr8 3228.0929 32281569 INT STITCHED 4004 4366 Mak16 Mak16 chr8 41720049 41720645 INT STITCIIED 4039 2047 Mtmr7 Mtmrf chr8 420481.48 42048839 INT STITCHED 4041 1897 . Pdgfrl, Pdgfrl Frg1, Mtus1 chr8 42217833 42218697 INT STITCHED 4043 2284 Mtus1 Mtus1 chr8 44.2654OO 44266109 INT STITCHED 4044 408 Triml2 Triml2

US 2017/O 130247 A1 May 11, 2017 38

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr) 34919369 34920459 INT STITCHED 44.41 1254 St3gal4, St3gal4, 493.0581F22Rik 493 OS81E22Rik 493.0581F22Rik chr) 366O1807 3661 1238 INT STITCHED 4443 929 E.24 Ei24 chr) 37056O42 37056855 INT STITCHED 4448 5381 Slc37a2 Slc37a2 chr) 37292583 37293531 INT STITCHED 4449 1633 BCO24479 BCO24479 chr) 43537646 43538191 INT STITCHED 4476 1993 . Pyr1 Pvr1 chr) 44119256 4420060 INT STITCHED 4481 4619 Hinfp Hinfp chr) 44141293 44142111 INT STITCHED 4482 302 Dpagt1, H2afx H2afx chr) 45649159 45649717 INT STITCHED 4488 8454 Bace1 Bace1 chr) 47341278 47342102 INT STITCHED 4500 5543 Cadm1 Cadm1 chr) 48647618 48.648.098 INT STITCHED 4507 6925 Zbtb16 Zbtb16 chr) 48799482 488.00362 INT STITCHED 4509 1285 Usp28 Usp28 chr) 48862028 48862859 INT STITCHED 4511 5295 Zw 10 Zw10 chr) 49559299 4956O181 INT STITCHED 4519 1413 . Ncam1 Ncam1 chr) 496O4426 49604976 INT STITCHED 4520 2399 Ncam1 Ncam1 chr) 49743610 49744745 INT STITCHED 4521 1765 . Ncam1 Ncam1 chr) 51854804 51855366 INT STITCHED 4527 8453 Rdx, Gimé981 Rdx chr) 53618496 53619776 INT STITCHED 4531 781 Sc3Sf2 Sc3Sf2 chr) 54703743 54705821 INT STITCHED 4539 1908 Ireb2 Ireb2 chr) 55063210 55073663 INT STITCHED 4542 28S FbxO22 FbxO22 chr) 570 10741 57014562INT STITCHED 4548 2444 Commd4 Commd4 chr) 57358103 57371993 INT STITCHED 4549 395 Coxsa Cox5a chr) 57677536 57678655 INT STITCHED 4550 2513 Arid3b Arid3b chr) 60635345 60636054 INT STITCHED 4568 2794 Uaca Uaca chr) 61208965 61209654 INT STITCHED 4575 1542 . Te3 Te3 chr) 61233897 61234814 INT STITCHED 4576 4412 . Te3 Te3 chr) 62185724 62186999 INT STITCHED 4582 2018 Anp32a Amp32a chr) 62383015 623834.32 INT STITCHED 4585 7274 Coro2b Coro2b chr) 64988516 64989016 INT STITCHED 4595 8450 Igdec3 Igdcc3 chr) 65186892 65195417 INT STITCHED 4598. 1055 Polcd7 PCd7 chr) 65441290 65442044 INT STITCHED 4599 1266 Pif1 Pfl. chr) 67056190 67062471 INT STITCHED 4607 611 . Tpm1 Tpm1 chr) 70067253 70071655 INT STITCHED 4620 3207 . Myole Myole chr) 72505560 72506130 INT STITCHED 4635 4790 Nedd4 Nedd4 chr) 72627390 72628209 INT STITCHED 4636 2084 . Prtg Prtg chr) 72660060 72660935 INT STITCHED 4637 1839 Prtg Prtg chr) 7291845O 72.9193O8 INT STITCHED 4639 23.73 . Rab27a Rab27a chr) 75412496 75413425 INT STITCHED 4648 2748 Tmod3 Tmod3 chr) 77606686 776O765OINT STITCHED 4652 2346 Gclc Gclc chr) 77951768 77952528 INT STITCHED 4654. 8446 Ick, Fbxo9 FbXO9 chr) 78038277 7803923OINT STITCHED 4656 1191 Gstaa. Gstaa. chr) 882271 78 88228053 INT STITCHED 4678 2664 Ntise Ntise chr) 88377528 88378586 INT STITCHED 4680 2087 Syncrip Syncrip chr) 97O13591 97O15267 INT STITCHED 4696 251 Sc2Sa36 Slc25a36 chr) 00540961 100541608 INT STITCHED 4706 5773 Stag1 Nick1 Stag1 chr) O0934315 100.942688 INT STITCHED 4708 795 Pccb Pccb chr) O2592120 102593562 INT STITCHED 4714 366 . Amot2 Amot2 chr) O6069546 106074893 INT STITCHED 4726 870 Wor Wr32 chr) O7477688 107478518. INT STITCHED 4729 1235 Hyal2, Nat6, Hyal1 Hyal3, Hyal1 chr) O7496491 107497240 INT STITCHED 4730 3614 Ifrcl2, BYO8O835 BYO8O835 chr) O75291.04 107533461 INT STITCHED 4731 483 Gnai2 BYO80835, Gnai2 Sema3b, Gnai2 chr) O7579734 107580327 INT STITCHED 4732 8441 . BYO80835, Slc38a3 Sema3b, Sc38a3 chr) O81986.31 1082O8673 INT STITCHED 4735 642 Nicn1, Tcta, Tcta Rhoa, Amt chr) O8424057 108425078. INT STITCHED 4736 6827 Qrich1 Qrich1 chr) O8469174 10847 0010 INT STITCHED 4737 746 Ndufaf3, Ndufaf3 Dalrd3. MirA25, Mir191, Impdh2 chrg 108556747 108557725 INT STITCHED 4738 910 Arih2 Arih2 chrg 109774888 109775365 INT STITCHED 4742 6978 Colc25a Cde25a US 2017/O 130247 A1 May 11, 2017 39

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chrs 1105625.02 110563156 INT STITCHED 4745 5313 Nbeal2,C Ccdc12 codc12 chrg 110647812 110648457 INT STITCHED 4746 4458 Pth1r Pt1r chrg 110757426 110757966 INT STITCHED 4747 4546 Prss50 PrSSSO chrg 114493073 1144941.32 INT STITCFIED 4767 342 Trim71, Trim71 Ccr4, Cnot10 chrg 1145491OO 114549959 INT STITCHED 4769 1830 Cnot10 Cnot10 chrg 121898518 1218991.06 INT STITCHED 4805 5244 C85492 C85492 chr1O 4521315 4524373 INT STITCHED 4811 2104 Mtrf1 Mtrf1 chr1O 6056O28 6057101 INT STITCHED 4818 761 . Akap12 Akap12 chr1O 608.0029 6080762INT STITCHED 4819 917 Akap12 Akap12 chr1O 6096866 6104516 INT STITCHED 4820 441 . Akap12 Akap12 chr1O 8605895 8609821 INT STITCHED 4831 1322 Sashl Sashl chr10 1273.3586 12734303 INT STITCHED 4846 54.42 Sf3b5 Sc3b5 chr1O 14423031 14423523 INT STITCHED 4852 7521 Vita Vita chr10 18188738 18189638 INT STITCHED 4868 3221 Nhs.11 Nhsl1 chr10 19428463 1942.9212 INT STITCHED 4875 6208 II2Ora I2Ora chr10 19658492 1965.8963 INT STITCHED 4878 7066 Map3k.5 Map3k5 chr10 21018761. 21019274 INT STITCHED 4887 8431 Hbs11 Hbsl1 chr1O 21421393 21422103 INT STITCHED 4889 1253 Sgk1 Sgk chr1O 21565286 21568461 INT STITCHED 4892 286 . Sgk1 Sgk chr10 24310265 24311057 INT STITCHED 4897 2376 Ctgf Ctg chr10 24.5458,11 245552OOINT STITCHED 4901 526 Empp3 Emp3 chr10 24582734 24583438 INT STITCHED 4902 5640 Med23 Med23 chr10 3395OO12 33957066 INT STITCHED 4929 283 . Dse, Bet31 Dse chr10 39338030 39338814 INT STITCHED 4948, 1945 Traf3ip2 Traf3ip2 chr1O 394.55931 39456518 INT STITCHED 4951 3274 Rev31 Rev3 chr10 41186875 41187821 INT STITCHED 4960 960 Mical1 Mical1 chr10 41346832 41347467 INT STITCHED 4962 5177 . Co164 Cd164 chr10 431968.81 431981.35 INT STITCHED 4973 840 Bend Bend3 chr10 44792055 44792644 INT STITCHED 4991 4429 Prep Prep chr1O 56093669 5609637OINT STITCHED 5009 954 Ga1 Gjal chr10 5721 1848 57212434 INT STITCHED 5011 4297 Hsf2 HSf2 chr10 60837711 60845392INT STITCHED 5034 833 X99384 Nodal X99384 chr10 60878292 60880002 INT STITCHED 5035 307 . X99384 X99384 chr10 61695639 61705502INT STITCHED 5041 1103 Tspan 15 Tspan15 chr10 61976766 61977458 INT STITCHED 5043 4198 Srgn Srgn chr10 62699664 6270.0521 INT STITCHED 5046 2504. Herc4 Herc4 chr10 664.65837 66467256 INT STITCHED 5056 264 . Reep3 Reep3 chr10 69388191 693892.59 INT STITCHED 5073 2588 Ank3 Ank3 chr10 69557563 69558392 INT STITCHED 5074 1760 Cedco Ccdc.6 chr10 69895948 6989663.OINT STITCHED 5079 5291 . Fam13c Fam13c chr10 70808586 70809333 INT STITCHED 5083 4254 Cisd1, Ipmk Cisd1 chr10 75491765 75492542 INT STITCHED 5094 1070 Zip280b Z?p28Ob chr10 75902165 75902804 INT STITCHED 5096 1995 Pent Pont chr10 77065603 77066355 INT STITCHED 5102 2639 Sumo.3 Sumo3 chr10 77500956 77501452 INT STITCHED 5103 1526 Aire, Dnmt31 Dnmt3 chr10 79135588 79136419 INT STITCHED 5110 S72 G16517 G16517 chr10 806925.40 80693324 INT STITCHED 5115 2022 Zfr2, Atcay Atcay chr10 82323.925 82324687 INT STITCHED 5120 3996 Txnroll Txnrd chr10 828,06815 828.09352 INT STITCHED 5121 1327 Slc41a2 Slc41a2 chr10 84218273 84221859 INT STITCHED 5131 1622 Rifx4 Rfx4 chr10 84373.188 84374377 INT STITCHED 5135 1465 Ric8b Rfx4 Ric8b chr1O 86247737 86248402INT STITCHED 5145 3820 NtSdc3 NtSc3 chr10 87323661 87324242 INT STITCHED 5151 8413 Igf1 Igfl chr1O 9162576O 91626428 INT STITCHED 5166 4875 Rimst Rims chr10 92628272 9262.9369 INT STITCHED 517O 6413 Cak17, Mr1931 Mr1931 chr10 94877813 94878427 INT STITCHED 5186 1768 Socs2 Socs2 chr10 99484471 99485.189 INT STITCHED 5204 4932 Kitl Kitl chr10 102831472 102832261 INT STITCHED 5212 2315 Slc6a15 Slc6a15 chr10 114819854. 114820700 INT STITCHED 5234 8036 Zfc3h1, Zfc3h1 Thap2 che1O 116589007 116589893 INT STITCHED 5241 2171 Firs2 FrS2 chr1O 117140641 117141390 INT STITCHED 5243 84.06 Mdm2 Mdm2 chr10 11728.5236 11728.6671 INT STITCHED 5245 5005 Rap1b Rap1b chr10 121179350 121180065 INT STITCHED 5263 3080 BC04.8403 BCO4.8403 chr10 125407173 125407844 INT STITCHED 5274. 4971 Lrig3 Lrig3 chr10 126707459 126707875 INT STITCHED 5278 5620 Dctm2 Dctn2 chr10 127027476 127027943 INT STITCHED 5279 1610 Lrp1 Lrp1 US 2017/O 130247 A1 May 11, 2017 40

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr10 127531580 127532717 INT STITCHED 5281 1409 Baz2a Baz2a chr10 127806805 127807509 INT STITCHED 5284 2080 Coq10a Coq10a chr10 128068373 128069108 INT STITCHED 5286 2913. Rps26 Rps26 Cl 3189708 3193686 INT STITCHED 5290 SO2 Patz1 Drg1 Patz Cl 4543384 4544389 INT STITCHED 5303 4817 Ascc.2 Ascc.2 Cl 4605024. 4605900 INT STITCHED 5305 5202 Zmat5, Zmats Uqcr10 Cl 56025.09 5603082 INT STITCHED 5311 4735 Mrps24 Mrps24 Cl 5640939 5641684 INT STITCHED 5312 1772 Urgep Urgep Cl 18925903 18926553 INT STITCHED 5359 8097 . Meis1 Meis1 Cl 19817749 19818637 INT STITCHED 5365 2246 Spred2 Spred2 Cl 20094772 20095341 INT STITCHED 5371 5514 Rab1 Rab1 Cl 29922145 29922784 INT STITCHED 5413 6619 Emlé Emlé Cl 34215057 34215897 INT STITCHED 5430 3638 Fam196b Fam196b Cl 40550744. 4055.1400 INT STITCHED 5447 4727 Nudcd2, Nudcd2 Hmmr Cl 43338.652 43339165 INT STITCHED 5452 6258 Cenil Ccn Cl 43399014. 43399833 INT STITCHED 5453 3868 . Ccnil Ccn Cl 45869858 45873818 INT STITCHED 5465 1130 Adam19 Adam19 Cl 51451223 51451993 INT STITCHED 5482 848 RmindSb D930048N14Rik, Rmind5b N4bp3 Cl 54672906 54673698 INT STITCHED 5498 3249 Hint1, Lyrm7 Lyrm7 Cl 59324891 59325529 INT STITCHED 5512 5165 Zkscan17, ZkScan17 4933439C10Rik Cl 5962766O 59635.535 INT STITCHED 5513 2071 Flcn Flcn Cl 60236311 60237216 INT STITCHED 5515 6543 Atpaf2, 4933439F18Rik 4933439F18Rik Cl 605.90721 60591349 INT STITCHED 5516 3272 Top3a, Top3a Sincr8 Cl 6065O172 60651011 INT STITCHED 5517 3776 Dhrs7b Dhrs7b Cl 6O736253 60737190 INT STITCHED 5519 1008 . Map2k3, Gm16516 Gm16516 Cl 60994880 60995724. INT STITCHED 5521 3196 Usp22 Usp22 Cl 61391409 61392089 INT STITCHED 5527 78.49 Epn2 Epn2 Cl 62351245 6235.2154 INT STITCHED 5534 2339 Cenpv Cenpv Cl 62686433 62687279 INT STITCHED 5535 1997 Fam18b Fam18b Cl 62950261 6295,0815 INT STITCHED 5538 7452 Pmp22 Pmp22 Cl 69377638 69378461 INT STITCHED 5563 3113 Efnb3 Efnb3 Cl 69394651 69396118 INT STITCHED 5564 602 Trp53, Dnahc2, Trp53 Wrab53 Atp1b2 Cl 74534022 74534794 INT STITCHED 5581 3606 Pafah1b1 Pafah1b1 Cl 77486774 77494239 INT STITCHED 5595 593 Nufip2 Nufip2 Cl 78060890 78061883 INT STITCHED 5601 2175 Suptóh, Suptóh Sf2 Cl 7906677O 79067524 INT STITCHED 5610 3716 Wsb1 Wsb1 Cl 79945260 79951952 INT STITCHED 5618 259 111OOO2N22Rik . 1110002N22Rik Cl 799.90875 79991582 INT STITCHED 5619 1000 Rinfl35 Rnf13S Cl 80942931 80947,956 INT STITCHED 5623 1654 . Accin1 Accin1 Cl 82648.830 82649486 INT STITCHED 5636 8383 Rf Rf Cl 85052297 85054262 INT STITCHED 5646 4478 Appbp2, D630O32NO6Rik D63OO32NO6Rik Cl 86809492 86810234 INT STITCHED 5657 8055 Ypel2 Ypel2 Cl 87983937 8798.4484 INT STITCHED 5660 5125 Cuedc1 Cuedc1 Cl 88.006524 88O16104 INT STITCHED 5661 1190 Mrps23 Wezf1 Mrps23 Cl 887 18915 88724318. INT STITCHED 567O 1418 Akap1 Akap1 Cl 88829276 88830722 INT STITCHED 5672 515 Coil, Coil, 2210409E12Rik 2210409E12Rik 2210409E12Rik Cl 891.03298 891O3997 INT STITCFIED 5678 2377 . Nog Nog Cl 8985,7107 898.57767 INT STITCHED 5681 6616 Petp Pctp Cl 93.74946O 93.75003OINT STITCHED 5687 8380 Mbtd1, Mbtd1 Ut18 Cl 95172573 95174382 INT STITCHED 5689 796 Myst2 Myst2 Cl 95841647 95842789 INT STITCHED 5696 698 Igf2bp1 Igf2bp1 Cl 95878892 95879756 INT STITCHED 5697 1471 . Igf2bp1 Igf2bp1 Cl 95.975203 95.975,967 INT STITCHED 5698 5002 Calcoco2 s Calcoco2 Cl 9664.1262 96642250 INT STITCHED 5703 4098 Snx11 Snx11 Cl 96712000 96712739 INT STITCHED 5705 2650 Copz2, Copz2 Mr152 Cl 97053223 97053840 INT STITCHED 5706 5798 Kpnb1 Kpnb1 US 2017/O 130247 A1 May 11, 2017 41

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes Cl 97428279 9744O933 INT STITCHED 5709 449 Sircin1 Srcin1 Srcin1 Cl 97486767 97487595 INT STITCHED 5710 8378 E130012A19Rik . E130012A19Rik Cl 97638286 97638942 INT STITCHED 5713 2574 Snora21, 7OOOO1PO1Rik Rp123, 7OOOO1 PO1Rik Cl 98062871 98.06357OINT STITCHED 5715 2004 Codk12 Cdk12 Cl 98653131 98.654.005 INT STITCHED 5717 1984 Msl1 Msl1 Cl 98.787666 9878.8235 INT STITCHED 5718 1642 . Rara Rara Cl 99105144. 991.05721 INT STITCHED 5723 5192 Kirt222 Krt222 Cl 00394982 100396447 INT STITCHED 5727 3839 Acly Acly Cl 00477440 100478074 INT STITCHED 5728 4810 Dnajc7, Dinajc7 Nkiras2 Cl O11687.96 101169594 INT STITCHED 5732 5395 Becn1 Becn1 Cl O1188000 101188651 INT STITCHED 5733 56O1 Aoc2 AOc2 Cl O1409700 101.410324 INT STITCHED 5735 3018 Nbr1, Brca1 Brca1 Cl O1442692 101443650 INT STITCHED 5736 1150 Tmem106a Tmem106a Cl O2464622 1024651.76 INT STITCHED 5744 3982 Fzd2 FZd2 Cl O2680.774 102688178 INT STITCHED 5745 573 Gjc1 Gjc1 Cl O2973458 102974493 INT STITCHED 5747 3140 Hexim1 Hexim1 Cl O4461680 104.462667 INT STITCHED 5754 3299 . Itgb3 tgb3 Cl O6130335 106130757. INT STITCHED 5756 4137 Smarcd2 Smarcd2 Cl O7372036 107379089 INT STITCHED 5769 1159 . Helz Helz Cl O828.5157 108286105 INT STITCHED 5772 3390 CCdc46 Ccdc46 Cl O9212771 109222498 INT STITCHED 5778 498 Gna.13 Gna13 Cl 09337038 109341696 INT STITCHED 5779 2659 Slc16a6, Slcléa.6 Arsg Cl 10904378 110905140 INT STITCHED 5784 4199 . Kcnj2 Kcnj2 Cl 13570971 113572300 INT STITCHED 5798 646 CpsfA1 CpsfA1 Cl 16272666 116273421 INT STITCHED 5813 3629 Rnf157 Rnf157 Cl 16286146 116291508 INT STITCHED 5814 242 Fam100b, Fam100b, Fam100b Gm7367 Gm7367 Cl 16447490 1164483.78 INT STITCHED 5815 1674 Ube2O Rhbdf2, Ube?o Aanat Cl 17180985 117182314 INT STITCHED 5821 683 . Sept9 9-Sep Cl 17736789 11773.7509 INT STITCHED 5830 985 Tha1 Gm125 Tha1 81, Socs3 Cl 18335167 118335821 INT STITCHED 5833 3791 Engase Engase Cl 19256568 1192573.32 INT STITCHED 5838 3429 Mir1932 Mr1932 Cl 19798926 119799586 INT STITCHED 5842 2594 Baiap2 Baiap2 Cl 20009574 120010144 INT STITCHED 5843 5089 Slc38a10 Slc38a10 Cl 200927.62 120093212 INT STITCHED 5845 1725 Bahcc1 Bahcc1 Cl 20638338 120641722 INT STITCHED 5848 8042 Rfng, Gps1 Dcxr, Rac3, Rfng Lrrca5, Stra13, Cd7 Cl 20687588 120688365 INT STITCHED 5850 2331 Fasn Fasn chr12 3889637 3890,588 INT STITCHED 5856 1043 Dnimt3a Dnmt3a chr12 850471O 8505271 INT STITCHED 5867 3636 Rhob Rhob chr12 8947875 894.9112 INT STITCHED 5868 1831 Matn3 Matn3 chr12 12826439 12835796 INT STITCHED 5877 291 . Mycn Mycn chr12 12882O69 12897292 INT STITCHED 5878 704 . Mycn Mycn chr12 12916267 1291717OINT STITCHED 5879 619 Mycn Mycn chr12 15821175 15834231 INT STITCHED 5886 3248 Trib2 Trib2 chr12 21424419 21425048 INT STITCHED 5899 1820 Ywhaq Ywhaq chr12 30624985 3.0625635 INT STITCHED 592O 7091 Pxdin Pxdn chr12 33509420 33509920 INT STITCHED 5930 7423 Nampt Nampt chr12 34111976 34112967 INT STITCHED 5937 1469 Twistmb Twistnb chr12 3573O495 35731263 INT STITCHED 5940 6653 Snx13 Snx13 chr12 39505938 39511410 INT STITCHED 5946 1065 Etv1 Ety1 chr12 4O7241.79 40725294 INT STITCHED 5951 685 . Arl4a Arl4a chr12 53192336 53192990 INT STITCHED 5981 4952 Nubpl Nubpl chr12 56574.184 56574829 INT STITCHED 5999 1512 . Nfkbia Nfkbia chr12 57621203 57622656 INT STITCHED 6005 3963 . Nkx2-9, NkX2-9 NkX2-1 chr12 70265691 70266182 INT STITCHED 6029 4819 Rps29, Ppils Ppils chr12 70831225 70831963 INT STITCHED 6030 6534 L2hgdh, Atp5s Atp5s chr12 73766180 73766775 INT STITCHED 6045 4739 Dhrs7 Dhs 7 chr12 741864.48 74187579 INT STITCHED 6050 440 . Six4 Stx4 chr12 76765083 76765824 INT STITCHED 6059 4579 Wor89 Wr39 chr12 78065948 78.066766 INT STITCHED 6068 1575 Max Max US 2017/O 130247 A1 May 11, 2017 42

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr12 80129129 8O136928 INT STITCHED 6073 1017 Plekhh 1 Plekhhl chr12 81212072 81218304 INT STITCHED 6080 510 Zip3611 2310015A10Rik Zfp3611 chr12 82022144 82022886 INT STITCHED 6087 1355 1700052122Rik 17OOOS2122Rik chr12 85349930 85350505 INT STITCHED 6099 3435 Acot1. Acot1 chr12 85395735 85396.391 INT STITCHED 6101 7236 Acot3 Acot3 chr12 85542980 85.552631 INT STITCHED 6102 547 . C130O3901.6Rik C130O3901.6Rik chr12 8595.9743 85966609 INT STITCHED 6104 2233 Abcd4 Abcd4 chr12 85981457 85981955 INT STITCHED 6105 3978 7420416PO9Rik . 742O416PO9Rik chr12 8696.1855 86962511 INT STITCHED 6108 1400 Jap2 Jolp2 chr12 88272118 88274195 INT STITCHED 6119 243 . 2310044G17Rik 2310044G17Rik chr12 92821598 92822554 INT STITCHED 6125 6873 Gitf2a1 Gtf2a1 chr12 101575664 101576434 INT STITCHED 6140 2502 . Calm 1 Calm 1 chr12 101958.877 101959439 INT STITCHED 6144 8355 Rpséka5 Rps6ka5 chr12 102234372 10223.5188 INT STITCHED 6147 1248 Ccdc88c, Ccdc88c Mir1190 chr12 10225.6843 102257522INT STITCHED 6148 4400 . Ccdc88c Ccdc88c chr12 103997082 103997697 INT STITCHED 6153 1726 LOC100233175, LOC10O2.33175 Ubrf chr12 106462467 106463587 INT STITCHED 6165. 1527 Tcl1 Tcl1 chr12 10725O243 107250762INT STITCHED 6169 6539 Vrk1 Wrk1 chr12 111940531. 111941453 INT STITCHED 6190 295 Hsp90aa1 Hsp90aa1 chr12 112075298. 11207624OINT STITCHED 6192 273 Stk30 Stk30 chr12 112908378 11291.4863 INT STITCHED 6195 705 Ckb, Ckb Trimtö1a. chr12 117645942 117646660 INT STITCHED 6200 8345 Ncapg2, Ncapg2 D43002OJO2Rik chr12 120476393 120476813 INT STITCHED 6215 7174 Itgb8 tgb8 chr13 17791823 17792717 INT STITCHED 6256 6230 5033411D12Rik, 2810O21BO7Rik 281 0021BO7Rik chr13 19720204 1972.1138 INT STITCHED 6261 3029 Sfrp4 Sfrp4 chr13 20132694 20133451 INT STITCHED 6264 3726 . Elmo1 Elmo1 chr13 20275154 20275936 INT STITCHED 6267 1192 Elmo 1 Txndc3 Elmo1 chr13 21819716 21820326 INT STITCHED 6274 3648 Hist1 habm Hist1.h.2bm chr13 21840858 21841391 INT STITCHED 6275 4915 Hist1.h3bn, Hist1 ha-k Hist1 h2ak, Hist1.h4k chr13 2 1877978 21883093INT STITCHED 6276 351 Hist1.h3bp, Hist1 h2kbp Hist1 h2an, Hist1h1b chr13 23583742 23.584526 INT STITCHED 6279 3060 Btn 2a2 Btn2a2 chr13 23678018 2368797OINT STITCHED 6280 344. Hist1.h4d Hist1 had chr13 2383,3718 23834547 INT STITCHED 6281 391. Hist1h1c, Hist1.h3c Hist1.h3c, Hist1.h.2bb chr13 24929573 24930214 INT STITCHED 6284 6023 Tap2, Tolp2 Acot13 chr13 25366681 2.5367257 INT STITCHED 6287 3396 Nrsn1 Nrsn1 chr13 34130311 34131047 INT STITCHED 6307 6483 Bph Bph 1 chr13 34201201 34202268. INT STITCHED 6308 434 . Tubb2b Tubb2b chr13 34222647 34226305 INT STITCHED 6309 622 Tubb2b Tubb2a Tubb2b chr13 358298.99 35831775 INT STITCHED 6317 1240 Cdy Colyl chr13 40959133 40964971 INT STITCHED 6339 3.19 GcInt2 GcInt2 chr13 41 012552 41 013182 INT STITCHED 6340 1683 Gcint2 GcInt2 chr13 43709828 43710545 INT STITCHED 6355 5257 Rinfl82 Rnf182 chr13 4480O270 448O1323 INT STITCHED 6366 461 Jarid2 arid2 chr13 44817274 44824484 INT STITCHED 6367 581 Jarid2 arid2 chr13 492768O1 49277722 INT STITCHED 6386 2201 Nin1 Ninj1 chr13 51940931 51942277 INT STITCHED 6396 523 Gadd45g Gadd45g chr13 52015693 52O16983 INT STITCHED 6398 674 Gadd45g Gadd45g chr13 52347463 52348.186 INT STITCHED 6401 2330 Gadd45g, Diras2 Diras2 chr13 533299.91 5333O847 INT STITCHED 6409 1992 . Ror Ror? chr13 54162877 54163531 INT STITCHED 6412 6319. Sfxn1 Sfxn1 chr13 548.84318 54885163 INT STITCHED 6417 3296 Tspan17 Tspan17 chr13 56244196 56245355 INT STITCHED 6422 854 . H2afy H2afy chr13 56713179 56714077 INT STITCHED 6427 2610 Tgfbi Tgfbi chr13 59690911 59691486 INT STITCHED 644O 7090 Naa35 Naa35 chr13 60272270 60272721 INT STITCHED 6442 5894 Gas1 Gas1 chr13 607 28970 60729626 INT STITCHED 6443 2400 Dapk1 Dapk1

US 2017/O 130247 A1 May 11, 2017 44

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr14 121905233 121906137 INT STITCHED 7018 716 Sc15a1 Slc15a1 chr15 7065153 7068694 INT STITCHED 7046 435 . Lifr Lifr chr15 9064661 9065183 INT STITCHED 7054 1757 Skp2, Tmbrd2 Skp2 chr15 10615155 10617848 INT STITCHED 7060 687 . Rai14 Rai14 chr15 10640824 10653875 INT STITCHED 7061 331 Rai14 Rai14 Rai14 chr15 10906708 10907428 INT STITCHED 7063 6029 Amacr Amacr chr15 12045435 12046215 INT STITCHED 7072 2830 Zfr Zfr chr15 12133202 12134066 INT STITCHED 7073 1153 Mtmr12 Mtmr12 chr15 122431.17 12252382 INT STITCHED 7075 4108 Golph3 Golph3 chr15 18748.130 18748.646 INT STITCHED 7092 3781 Coh10 Cdh10 chr15 28000620 28001311 INT STITCHED 7111 2701 . Trio Trio chr15 36941214 36941629 INT STITCHED 7133 3228 Zip706 Z?p706 chr15 37989429 380O8159 INT STITCHED 7140 718 Ubris Ubris Ubris chr15 53173777 53175470 INT STITCHED 7170 2943 Ext Ext1 chr15 54746572 54747189 INT STITCHED 7173 7556 Empp2 Enpp2 chr15 55095920 55100307 INT STITCHED 7175 571 . Depdc6 Depdc6 chr15 56573900 56575358 INT STITCHED 7180 1521 . Has2as, Has2 Has2 chr15 58052823 58053270 INT STITCHED 7184 4922 Fbxo32 FbXO32 chr15 58244484 58245397 INT STITCHED 7186 2649 D15Erto621e D15Ertd621e chr15 58901790 5891.5123 INT STITCHED 7187 323 MtSS1 MtSS1 chr15 61810188 61811058 INT STITCHED 7200 3145 Myc Myc chr15 61841217 61844889 INT STITCHED 7201 648 Myc Myc chr15 68752O23 68752528 INT STITCHED 7219 7222 Khdrbs3 Khdrbs3 chr15. 73013516 73O14184INT STITCHED 7224 1136 Eif2c2 Kcnk9 Eif2c2 chr15 75708006 75732455 INT STITCHED 7240 354 Naprt1 Ze3.h3, Pycrl Naprt1 chr15 76909283 7690.9985 INT STITCHED 7245 7260 . Rbm9 Rbm9 chr15 77678011 77678785 INT STITCHED 7249 2418 Myh9 Myh9 chr15 78726841 78727700 INT STITCHED 7252 6124 Sh3bp1 Sh3bp1 chr15 79278426 7927.9059 INT STITCHED 7254 5952 Csink1e CSnkle chr15 81530381 81531523 INT STITCHED 7267 1732 Chad Chad chr15 81560234 81560989 INT STITCHED 7268 1988 Rangap1 Rangap1 chr15 8205O108 82060103 INT STITCHED 7269 965 Tnfrsfl3c Tnfrsfl3c chr15 84O16374 84020249 INT STITCHED 7275 763 SammSO SammSO chr15 856683O8 85668848 INT STITCHED 7280 8064 Ttc38 Ttc38 chr15 8.5696253 8.5696964 INT STITCHED 7281 1767 Gitsel GtSe1 chr15 893O8072 893O8194 INT STITCHED 7289 3434 Arsa. Arsa chr15 908863O2 9088.7037 INT STITCHED 7294 2964 Kif21aa Kif21a chr15 93227987 93228712 INT STITCHED 7300 3529 Pphln1, Zcrb1 Pphln1 chr15 95618532 95618978 INT STITCHED 7305 8246 AnoG Anoo chr15 96009647 96010368 INT STITCHED 7308 3414 Gm4371 Gm4371 chr15 96532896 96533663 INT STITCHED 7314 2240 Slc38a2 Sc38a2 chr15 97.659779 9766O356 INT STITCHED 7319 1887 Holac7 Hdac7 chr15 997.02760 997O5639 INT STITCHED 7327 425 Lima.1 Lima1 chr15 998.71865 998.72478. INT STITCHED 7329 5280 Dip2b Dip2b chr15 100304382 1003051.76 INT STITCHED 7332 1241 Letmd1 Letmd1 chr15 101107344 101108241 INT STITCHED 7336 865 943 0023L20Rik 943 0023L2ORik 943 0023L2ORik chr15 102044350 102045008 INT STITCHED 7337 1330 Zip740 Z?p740 chr15 102073324 102074272 INT STITCHED 7338 421 Rarg Itgb7, Rarg Mfsd5, Rarg chr16 3849599 3850539 INT STITCHED 7344. 4390 Zip174 Z?p174 chr16 5252339 5252888 INT STITCHED 7349 4879 Fam86 Fam86 chr16 11260700 11261.095 INT STITCHED 7374 5533 Gspt1, Mr1945 Mr1945 chr16 13785.839 13786611 INT STITCHED 7389 2293 Rrn3 Rrn3 chr16 155876OO 1558,8338 INT STITCHED 7394 5959 Ube2v2 Ube?v2 chr16 16302093 16302827 INT STITCHED 7399 3684 Yars2 YarS2 chr16 16825494. 16826238 INT STITCHED 7401 4532 Spagó Spagó chr16 17928.738 1792.9621 INT STITCHED 7404 2773 Slc25a1 Dgcr14, Sle25a1 Gsc2 chr16 18242932 18243570 INT STITCHED 7405 4927 Ranbp1, Ranbp1 Trimt2a chr16 21995.128 21996.129 INT ST1TCHED 7422 1389 Liph Liph chr16 220 15572 22016286 INT STITCHED 7423 3870 Senp2 Senp2 chr16 221397 12 22161378 INT STITCHED 7425 750 Igf2bp2 Igf2bp2 chr16 22441980 22442886 INT STITCHED 7428 4560 Etvs EtyS chr16 24443296 24444.024 INT STITCHED 7439 4206 Lipp Lipp chr16 3.0062920 30.063517 INT STITCHED 7457 379 Hes1 4632428C04Rik Hes1 chr16 30280722 30286306 INT STITCHED 7459 799 Lirc15 Lirrc15 chr16 33178393 33179116 INT STITCHED 7474. 2437 Osbpl11 Osbpl11 US 2017/O 130247 A1 May 11, 2017 45

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr16 33246666 332471.68 INT STITCHED 7475 3771 Snx4 Snx4 chr16 33379458 3338.0156 INT STITCHED 7476 2939 Zip148 Zfp148 chr16 34692.943 34693565 INT STITCHED 7487 7029 CCdc14 Ccdc14 chr16 35966441 35966967 INT STITCHED 7498 4502 . Parp9, Dtx31 chr16 360694O6 36070168 INT STITCHED 7499 4226 Fam162a, Fam162a Ccdc58 chr16 36821606 36822164 INT STITCHED 7504 5866 Iqcb1, Eaf2 Eaf2 chr16 38558387 38558980 INT STITCHED 7510 553 Timen39a Tmem39a chr16 45726028 45726467 INT STITCHED 7533 5076 Taglin3 Taglin3 chr16 465OO728 465O1425 INT STITCHED 7535 2532 Pvr13 Pwr3 chr16 48283498 4828425 INT STITCHED 7537 4304 Dppa4 Dppa4 chr16 483O3474 48311051 INT STITCHED 7538 424 Dppa2 Dppa2 chr16 49861220 49862107 INT STITCHED 7545 61.87 Co47 Cd47 chr16 59643496 5964.4072 INT STITCHED 7559 6185 Arlé Arlé chr16 64766788 64767304 INT STITCHED 7563 3850 4930453N24Rik . 4930453N24Rik chr16 78573987 78574492 INT STITCHED 7592 6130 D16Erto.472e D16Ertd472e chr16 84.825OOO 84.825796 INT STITCHED 7598 6321 LOC654.426 LOC654426 chr16 90283267 90283824 INT STITCHED 7605 4518 Srsf15 SrsfS chr16 91379084 91379709 INT STITCHED 7614 5871. Ifnar2 fnar2 chr16 91480485 91.481228 INT STITCHED 7616 3701 Ifna.1 fnar1 chr16 93.826367 93.827291 INT STITCHED 7625 1443 Morc3 Morc3 chr16 96304699 96305588 INT STITCFIED 7631 500 Brwd1 Brwd1 chr17 3551086 3551760 INT STITCHED 7642 6038 Tfb1m Tfb1m chr17 5407856 5417145 INT STITCHED 7655 1717 Lohal6b Ldhal6b chr17 54903.01 5490994 INT STITCHED 7656 6367 Zdhhc14 Zdhhc14 chr17 8568963 8569332 INT STITCHED 7669 7541 T2 T2 chr17 10527703 10528580 INT STITCHED 7679 1195 . Qk Qk chr17 10693553 10694165 INT STITCIIED 7683 2446 . Qk Qk chr17 14336522 14337258 INT STITCHED 7693 3352 Dact2 Dact2 chr17 1482995O 14831243 INT STITCHED 7696 2466 Thbs2 ThbS2 chr17 15326610 15327365 INT STITCHED 7697 2593 . D1 D1 chr17 153.84508 15385475 INT STITCHED 7699 733 . D1 D1 chr17 15513799 15519743 INT STITCHED 7700 994 DI1 D1 D1 chr17 17749211 17749858 INT STITCHED 7709 5200 . Lnpep Lnpep chr17 23739864. 23740605 INT STITCHED 7716 2010 Zip13, Zscan10 Zscan10 chr17 23780427 23780792 INT STITCHED 7718 6220 Mmp25 Mmp25 chr17 23864226 238.64239 INT STITCHED 7720 8089 Pkmyt1 Pkmyt1 chr17 23926672 23927280 INT STITCHED 7721 3.044 Flywch2 Flywch2 chr17 24270829 24277012 INT STITCHED 7722 3300 Papk1 Pdpk1 chr17 24339885 24340488 INT STITCHED 7723 2003 Ntn3 Ntn3 chr17 25891931 25892657 INT STITCHED 7727 8210 . Narf Narf chr17 27638391 276393.20 INT STITCHED 7735 2037 Grm4 Grima. chr17 28317168. 28317281 INT STITCHED 774O 8206 Zfb523 Zip523 chr17 2.9144582 29145786 INT STITCHED 7744 843 Stk38 Stk38 chr17 29631024 29631722 INT STITCHED 7754 2521 Pim1 Pim1 chr17 31433655 31445349 INT STITCHED 7763 393 Sc37a.1 Sc37 a1 Slc37a1 chr17 32330938 32331519 INT STITCHED 7769 6586 Ephr3 Ephx3 chr17 33493.774 33494196 INT STITCHED 7774 7483 Z?p81 Z?p81 chr17 34029161 34029875 INT STITCHED 7777 2571 Kifc1 Kifc1 chr17 35029917 35030731 INT STITCHED 7781 4836 Ehmt2, Zbtb12 Zbtb12 chr17 35087242 35087957 INT STITCHED 7782 3133 Snord52, SnordS2 1110038B12Rik chr17 35201745 35214088 INT STITCHED 7783 489 Ly6g6c, G6b, Dolah2, Ly6.g6c G6b, Dolah2, AUO23871, Ly6g6d Ly6g6d chr17 36969445 36981968 INT STITCHED 7790 811 Trim26 Trim26 chr17 4301.0751 43011486 INT STITCHED 7803 7133 Cd2ap Cd2ap chr17 457.43747 45744742 INT STITCHED 7814 249 Gm732S Gm7325, Gm732S chr17 457.43747 45767400 INT STITCHED 7815 Gm7325 Gm732S chr17 46134937 46135825 INT STITCHED 7817 2549 . Vegfa Vegfa chr17 46929748 46930535 INT STITCHED 7819 1600 . Rp1711 Rp1711 chr17 47702974 47709143 INT STITCHED 7823 233 . Ccnd3 Ccnd3 chr17 4787.6080 47877026 INT STITCHED 7825 668 Firs3 Firs3 chr17 479251.79 47925764 INT STITCHED 7826 3923 Tcfeb Taal US 2017/O 130247 A1 May 11, 2017 46

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr17 48013535 48035.748 INT STITCHED 7828 415 . Foxp4 Foxp4 chr17 48112639 48113467 INT STITCHED 783O 1618 . Foxp4 Foxp4 chr17 53690666 53701713 INT STITCHED 7849 516 Kat?b Kat?b chr17 56437658 56438221 INT STITCHED 7852 8198 Arrdc5, Arrdc5 Urf1 chr17 56972466 56973551 INT STITCHED 7853 1996 Rfx2 Rfx2 chr17 57008637 5700.9589 INT STITCHED 7854. 1483 1700061 G19Rik 17OOO61G19Rik chr17 66305463 66305913 INT STITCHED 7870 4767 Twsg1 Twsg1 chr17 66453333 66453853 INT STITCHED 7871 4415 Nclufv2 Ndufv2 chr17 68041107 68061143 INT STITCHED 7877 468 Lama1 Lama1 chr17 68353194 68354191 INT STITCHED 7878 2835 Arhgap28 Arhgap28 chr17 74927O87 74927097 INT STITCHED 7897 8063 Birc6 Birc6 chr17 75.956321 75.956983 INT STITCHED 7905 5048 Fam98a. Fam8a. chr17 79313190 793 13799 INT STITCHED 7912 1560 1110001A16Rik . 1110001A16Rik chr17 81128692 81129358 INT STITCHED 7931 6152 Map4k3 Map4k3 chr17 84582744 84583323 INT STITCHED 7948 1781 Zip3612 Zfp3612 Zfp3612 chr17 865357OO 865484.95 INT STITCHED 7964. 2602 Srbd1 Srbd1 chr17 8793.0591 87931316 INT STITCHED 7969 1832 . Calm2 Calm2 chr17 88.076955 88077637 INT STITCHED 7971 5992 Msh2 Msh? chr17 88372701 88373685 INT STITCHED 7972 900 Mshé Mshes chr18 33854O2 3385870 INT STITCHED 7995 5889 CuI2 Cu2 chr18 5163169 5163726 INT STITCHED 8001 1733 . Z?p438 Z?p438 chr18 6516515 6517140 INT STITCHED 8006 5074 Epc1 Epc1 chr18 9214813 92.16976 INT STITCHED 8012 81.86 Fzd8 FZ8 chr18 12002313 12002823 INT STITCHED 8021 6352 Cables1, Mir1901 Mir1901 chr18 123973.12 12398585 INT STITCHED 8024. 2342 Npc1 Npc1 chr18 12797614 12797946 INT STITCHED 8029 7046 Ttc39c Ttc39c. chr18 1483.2586 1483.5577 INT STITCHED 8038 2705 SS18 SS18 chr18 20899101 20900111 INT STITCHED 8051 1742 B4galté B4galté chr18 21101897 21102870 INT STITCHED 8053 899 Rifl2S Rifl2S chr18 24151920 24152681 INT STITCHED 8065 3525 Zip35 Zip35 chr18 33621690 33622128 INT STITCHED 8099 7581 DOH4S114 DOH4S114 chr18 34109932 34110743 INT STITCHED 8104 1197 . Epb4.114a Epb4.114a chr18 34.157592 341.58493 INT STITCHED 8106 1702 . Epb4.114a Epb4.114a chr18 3470.1440 34715289 INT STITCHED 8109 1027 Wint8a. Wnt3a Wnt3a chr18 34884236 34884615 INT STITCHED 81.11 4424 Gfra3 Gfra chr18 3493.7293 34938906 INT STITCHED 8112 1343 Kidm3b Kdm3b chr18 35982138 3598.3371 INT STITCHED 8122 717 CXXCS Cxxc5 Cxxc5 chr18 36924103 36924650 INT STITCHED 8127 4027 WarS5, Dnd 1 Dnd1 chr18 37892.904 37893442 INT STITCHED 8.128 5966 Pedhgb5, Pcdhgb5 Pcdhga9 chr18 37911821 37912317 INT STITCHED 8129 7509 Pedhga11 Pcdhga11 chr18 37931149 3793.1649 INT STITCHED 8130 2988 Pedhga12 Pcdhgc3 Pcdhga12 chr18 38000290 38000857 INT STITCHED 8.131 3762 . Pcdhgc3, Pcdhgc5 Pcdhgc5, Pcodhga12, Pcdhgb8 chr18 38092039 38092926 INT STITCHED 8132 1852 Diap1 Diap1 chr18 42432273 42432907 INT STITCHED 8152 8174 Rbm27, Rbm27 Gm4013 chr18 44538369 44547262 INT STITCHED 816O 8173 Dep2 Dcp2 chr18 47518O17 47518632 INT STITCHED 8170 3236 . Sema6a Sema6a chr18 47652330 47656137 INT STITCHED 8172 1172 . Sema6a Sema6a chr18 49992787 49.993.420 INT STITCHED 8179 8172 Dmix11 Dmxl1 chr18 53570940 53571612 INT STITCHED 8184. 4918 Ppic Ppic chr18 58016521 58042474 INT STITCHED 8201 1921 Slc12q2 Slc12q2 chr18 61194.178 611964.05 INT STITCHED 8207 1611 . Sc6a7 Slc6a7 chr18 65575946 65587598 INT STITCHED 8222 1816 Malt1. Malt1 chr18 65737262 65738119 INT STITCHED 8223 2710 Zip532 Zip532 chr18 66620651 66.621456 INT STITCHED 8233 2103 Pmaip1 Pmaip1 chr18 74223279 74224086 INT STITCHED 8255 2698 Mapk4 Mapk4 chr18 777.99062 77800652 INT STITCHED 8272 446 Rnf165 Rnf16S chr18 77947287 77947954 INT STITCHED 8273 4453 8030462N17Rik, . 8O3O462N17Rik 4930465K10Rik chr18 79302745 79303389 INT STITCHED 8281 2887 Setbp1 Setbp1 chr18 80345132 80345668 INT STITCHED 8285 4859 Adnp2 Adnp2 chr18 83085081 83O85678 INT STITCHED 8304 5214 Zip516, Zfp516 4930592IO3Rik US 2017/O 130247 A1 May 11, 2017 47

TABLE S2C-continued Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest Cl Start end TEID TE rank 4 kb of TSS) interactions) genes chr18 851O1655 85104293 INT STITCHED 8309 369 FbXO15 FbXO15 chr19 3578494 3579148 INT STITCHED 8314 5749 Ppp6r3 Ppp6r3 chr19 4809961 4810938 INT STITCHED 8317 1591 Rbm14 Rbm4b Rbm14 chr19 4849752 4850334 INT STITCHED 8318 2176 Ctsf CtSf chr19 5362176 5362769 INT STITCHED 8321 5559 Eiflad, Banf1 Banfil chr19 5806301 5807102INT STITCHED 8323 567 Malat1 Neat1 Malat1 chr19 6008924 6009669 INT STITCHED 8325 2365 Capn1 Capn1 chr19 7556157 7556974 INT STITCHED 8330 7656 Rtn3 Rtn3 chr19 10750948. 10751863 INT STITCHED 8339 5044 Pga5 Pga5 chr19 10909944 10910853 INT STITCHED 8342 3107 Slc15a3 Slc15a3 chr19 1189765O 11898585 INT STITCHED 8348 4958 Stx3 Stx3 chr19 23226190 23228627 INT STITCHED 8387 340 Mir1.192 KIf Mir1.192 chr19 24625504 24626811 INT STITCHED 8395 5338 Pip5k1b Pip5klb chr19 300 19855 3.0020504 INT STITCHED 8420 4142 II33 133 chr19 30103.198. 30104145 INT STITCHED 8422 8079 Uhrf2 Uhrf2 chr19 34270171 34270617 INT STITCHED 8437 4678 Stambpl1 Stambpl1 chr19 3811.0109 3.8124425 INT STITCHED 8451 1141 Myof, Myof Cep55 chr19 40335693 40347774 INT STITCHED 8457 575 Pin1 Pdlim1 chr19 41340652 41341549 INT STITCHED 8462 2243 Tmssf3 Tm9sf3 chr19 41924.877 41928,541 INT STITCHED 8467 4990 Frat2 Frat chr19 4221016O 42211146 INT STITCHED 8468 381 . Avpil Avpil chr19 44336251 44340919 INT STITCHED 8482 397 . Scd2 Sco2 chr19 45431191 45431778 INT STITCHED 8492 4657 Btrc Btrc chr19 46111427 46112378 INT STITCHED 8495 1345 Lob1 Ldb1, Ldb1 Nolc1 chr19 46147545 46148245 INT STITCHED 8496 2738 Nolc1 Nolc1 chr19 46571486 46579213 INT STITCHED 8497 398 Trim8 Trim8 chr19 47062698 4706342OINT STITCHED 8499 2624 . Cnnm2, NtSc2 NtSc2, Ina chr19 47524932 47525874 INT STITCHED 8502 923 . Sh3pxd2a Sh3pxd2a chr19 55.810476 55811196 INT STITCHED 8528 5265 Tcf712 Tcf712 Tcf712 chr19 57440147 574.41699 INT STITCHED 8534 1163 Fam16Ob1 Fam16Ob1 chrX 7341901 7342556 INT STITCHED 8551 1800 Tcfe3 Tcfe3 chrX 7504022 7504775 INT STITCHED 8553 2401 Eras Eras chrX 7783981 7797322 INT STITCHED 8558 902 Porcin Porcin chrX 1158506O 11585603 INT STITCHED 8566 6900 . Bcor Bcor chrX 11650689 11656529 INT STITCHED 8568 2596 Bcor Bcor chrX 12336063 12336542 INT STITCHED 8571 8138 Med14 Med14. chrX 12623919 12624670 INT STITCHED 8574 1251 . Usp9x Usp9x chrX 17148568 17149028 INT STITCHED 8579 2968 Fundc1 Fundc1 chrX 20450222 20452884 INT STITCHED 8589 1414 Timp1 Timp1 chrX 2063.9136 2064O111 INT STITCHED 8591 4023 Zifo182 Zfp182 chrX 34345348 34357788 INT STITCHED 85.99 417 Sc2SaS Sc2SaS chrX 45948994 45949828 INT STITCHED 8615 5413 Z?p280c Z?p280c chrX 49037620 49038464 INT STITCHED 8621 4650 Hs6st2 HS6St2 chrX 49516745. 4952.1331 INT STITCHED 8623 1198 Gpc4 Gpc4 chrX 49542203 4954.2936 INT STITCHED 8624 1909 . Gpc4 Gpc4 chrX 684.72013 68472518 INT STITCHED 8648 607.6 Mtm1 Mtm1 chrX 68812416 68812867 INT STITCHED 8650 6998 Hmgb3 Hmgb3 chrX 71062069 71066856 INT STITCHED 8654 1325 Pozd4 Pdzd4 chrX 74758128 74771144 INT STITCHED 8657 2215 Tbl1x Tb1x chrX 834.32509 83439416 INT STITCHED 8673 699 Nrob1 NOb1 chrX 90909066 90918.687 INT STITCHED 8678 2200 Peyt1b Pcyt1b chrX 98279269 98.28O110 INT STITCHED 8695 2327 Sc7a3 Sc7a3 chrX 98832291 98.835492 INT STITCFIED 8698 888 Ogt Ogt chrX 10.0521875 100522311 INT STITCHED 8704 81.32 Cox4 Cdx4 chrX 100633203 100633921 INT STITCHED 8705 545 Tsix Tsix chrX 10.1176973 101183244 INT STITCHED 8708 520 Rim Rlim chrX 10.1391779 101.392412 INT STITCHED 8709 4316 C77370 C77370 chrX 1031199.17 103120479 INT STITCHED 8713 7677 Atrx Atrix chrX 103207250 103.207835 INT STITCHED 8714 3325 Cox7b, Magt1 Magt1 chrX 106210257 106210674 INT STITCHED 8718 6403 Hmgn5 Hmgn5 chrX 131251894 131252554 INT STITCHED 8736 2782 Armcx1 Armcx1 chrX 133571771. 133572514 INT STITCHED 8739 4136 Mcartó Mcarté chrX 136317111 136317651 INT STITCHED 8742 2358 Ripply1 Ripply1 chrX 136990746 136995388 INT STITCHED 8745 2167 Prps1 Prps1 US 2017/O 130247 A1 May 11, 2017 48

TABLE S2C-continued

Typical enhancer to gene assignment proximal interacting active genes active genes (TE within (SMC1 PET nearest chr Start end TEID TE rank 4 kb of TSS) interactions) genes chrX 146912148 14691.2772 INT STITCHED 8754 11.99 Tmem29 Tmem29 chrX 147484225 147485001 INT STITCHED 8756 2685 Fgd1 Fgd1 chrX 15O159561 15O160090 INT STITCHED 8766 2210013O21Rik chrX 151776277 151776643 INT STITCHED 8767 7880 Prox4 Prdx4 chrX 15393O097 15393.0709 INT STITCHED 8769 4541 Sms Sms chrX 154034237 154039879 INT STITCHED 8771 1270 Mbtps2 Mbtps2 chrX 1542S3329 154253706 INT STITCHED 8773 7680 Kh134 Kh134 chrX 159193849 159194432 INT STITCHED 8781 2630 Rbbp7 chrX 159269864 1592.70795 INT STITCHED 8782 1741 Txing Txing

TABLE S3A Overlap with previously defined domain structures or interactions Overlap (at Overlap (at Sets of SMC1 least 80% least 80% high-confidence reciprocal) Out of ii reciprocal) (8. ions (X) Testing Query (Y) Metric (counts) interactions (percentage) al SMC1 CLA-PET Reciproca 1251 23739 59% intrachromosomal DeMare et al. OW(8) interac ions (limb bud) CTCF CTCF CTCF CIA-PET Reciproca 6770 2008O 34% Interac ions PETS Handoko et OW(8) al. SD boundary CTCF CIA-PET Reciproca 65 197 33% interactions PETS Handoko et OW(8) al. SD boundary TDTAD Dixon et Reciproca 197 3% interacti OS al. OW(8) SD boundary TDTAD Filippova Reciproca 16 197 8% interacti OS et al. OW(8) SD boundary SMC1 CLA-PET Reciproca 16 197 8% interacti OS DeMare et al., OW(8) (limb bud) PD boundary CTCF CIA-PET Reciproca 115 349 33% interactions PETS Handoko et OW(8) al. PD boundary TDTAD Dixon et Reciproca 13 349 4% interacti OS al. OW(8) PD boundary TDTAD Filippova Reciproca 30 349 9% interacti OS (8. OW(8) PD boundary SMC1 CLA-PET Reciproca 29 349 8% interac ions DeMare et al., OW(8) (limb bud) PoII CLA-PET Reciproca 873 2921 30% interactions OW(8) Kieffer-Kwon et al. PoII CLA-PET Reciproca 2921 82% PETs Kieffer-Kwon overlap et al. PoII CLA-PET Reciproca 511 2700 19% interactions OW(8) Kieffer-Kwon et al. PoII CLA-PET Reciproca 1969 2700 7396 PETs Kieffer-Kwon overlap et al. PoII CLA-PET Reciproca 264 818 32% interactions OW(8) Kieffer-Kwon et al. PoII CLA-PET Reciproca 649 818 79% PETs Kieffer-Kwon overlap et al. US 2017/O 130247 A1 May 11, 2017 49

TABLE S3A-continued Overlap with previously defined domain structures or interactions Sets of SMC1 Interactions Interactions high-confidence Crossing Out of ii Crossing Y interactions (X) Testing Query (Y) Metric Y (counts) interactions (percentage) all TDFTAD % X 1,354 23,739 6% intrachromosomal boundaries Dixon spanning interactions et al. Y all LAD boundaries % X 1,180 23,739 59 intrachromosomal Meuleman et al. spanning interactions Y all LOCK boundaries 96 X 1,053 23,739 4% intrachromosomal Wen et al. spanning interactions Y

TABLE S3B NPC SC interactions that are overlapped with SMC1 interactions Thick Thick Item Block Block Block Chr Start End ID Score Strand Start End Rgb Count Sizes Starts chr17 35876754 36OO4126 chr17: 999 -- 35876754 36004126 O, O, O 2 1213, 0. 35876754- 732O 1200S2 36004126 chr16 91081692 91.315585 chr16: 999 -- 91081692 91315585 0, 0, 0 2 8121, 0. 91081692- 6123 227770 91.315585 chr6 122605776 1226.704O1 chr6: 999 -- 122605776 122670401 O, O, O 2 8322, 0. 1226OS776- 3902 6O723 1226704O1 chr16 91079632 91353O41 chr16: 999 -- 91079632 91353,041 O, O, O 2 2059, 0. 91079632- 2468 270941 913S3O41 chr17 353.09.283 35369792 chr17: 999 -- 35309283 35369792 O, O, O 2 12375, 0. 35309283- 17869 42640 35369792 chr17 3S300042 35369792 chr17: 999 -- 353OOO42 35369792 O, O, O 2 7232, 0. 3S3OOO42- 17869 S1881 35369792 chr17 35712435 35863083 chr17 999 -- 35712435 35863083 O, O, O 2 3047, 0. 35712435- 2345 1483O3 35863O83 chr17 3577O148 35873306 chr17: 999 -- 3577O148 35873306 0, 0, 0 2 3903, 0. 3577O148- 971.6 93442 35873.306 chr16 91079632 9136SO59 chr16: 999 -- 91079632 91365,059 O, O, O 2 2059, 0. 91079632- 7174 278253 9136SO59 chr17 35782.938 35876753 chr17: 999 -- 35782.938 35876753 O, O, O 2 5265, 0. 35782938- 2748 91067 35876753 chr16 911594O7 91.315585 chr16: 999 -- 911594O7 91315585 0, 0, 0 2 2410, 0. 911594O7- 6123 150055 91.315585

TABLE S4A TABLE S4A-continued Super-enhancer Domains Super-enhancer Domains

Chr Start End Chr Start End

chr4 137282987 137358228 chr1 137O67918 137266493 chr4 141 062149 141133641 chr4 154502843 154625262 chr9 121.156481 121284.570 chr1 3696O319 37190152 chr3 34536922 34661110 chr13 98OO4422 98.29.1871 chr12 11169.1772 11183 0585 chr14 64117842 64155310 chr19 5817540 S852134 chr6 14242SSS4 1424684.51 chr12 11169.1772 111729264 chr1 72810841 72902355 chrf 3096.5803 30984969 chr19 5792497 S886724 chr1 1691SO382 16923736S chirf 873OO923 87362292

US 2017/O 130247 A1 May 11, 2017 51

TABLE S4A-continued TABLE S4A-continued Super-enhancer Domains Super-enhancer Domains

Chr Start End Chr Start End

chrf 31224743 312S8292 chr11 97505763 976O1227 chr1 1828O3710 1828.32479 chr) S804.8791 S81301.91 chr3 12922.2980 129325895 chr2 162873366 16291 6341 chr6 1226.18790 122670811 chr11 98813654 988.28524 chr1 120667669 12098.1204 chr1O 44047591 44482O37 chr16 8685,741 8795721 chr1 13041583 13125339 chr11 69467645 69529968 chr12 8813436S 883.16531 chrf 1197871.92 119837366 chr8 87162269 87234127 chr4 1268324O2 126893.727 chr1 12098.9689 121341353 chr11 11693.2982 11703964.8 chirf 14022374.1 140326676 chr14 652443SO 65372358 chr4 1408.12787 14O920O32 chr11 69494.914 69529968 chr12 88211553 88248387 chr18 61707865 61826914 chr17 37130666 3722246S chr17 100861.76 10938877 chr11 8887662 903,7343 chrs 12002112S 12O139928 chr1 138669873 13890 1249 chr4 1474S1592 147468949 chrs 11888.2848 119045654 chr3 9S436247 9S482321

TABLE S4B Super-enhancers and their associated Super-enhancer Domains Chr Start End SE Rank in SDS chirf 3193004 3218183 NT D 3467 1 TRUE chr3 34633687 3466O7OS NT ST ED 1482 2 TRUE chr12 87839385 87846.192 NT STITCHED 6113 3 TRUE chr) 11084.9422 110863371 NT STITCHED 4748 4 TRUE chr3 37602064 37613850 INT STITCHED 4033 6 TRUE chr4 140826O72 14084O922 NT STITCHED 2291 8 TRUE chr3 9S4SSO34 95468269 NT STITCHED 1626 9 TRUE chr2 154242651 154254374 INT STITCHED 1210 1O TRUE chr17 371102O2 37134996 NT 7792 12 TRUE chr6 122290093 122293O17 NT STITCHED 3342 13 TRUE chr) 782O7143 782234.42 NT 4657 14 TRUE chr8 91514.813 91S4O176 NT 4179 15 TRUE chr19 S835881 S847014 INT D 8324 16 TRUE chr17 66818723 66836409 NT 7876 17 TRUE chr2 162877048 162893.236 NT D 1257 18 TRUE chr) S8119837 S8128504 INT 4555 19 TRUE chr6 64.961359 649851.61 NT 3120 2O TRUE chirf 1S2O36872 152050716 NT D 3914 21 TRUE chr12 87807046 8782O319 NT 6112 22 TRUE chr4 118743867 118745786 NT 2152 23 TRUE chr11 97517673 975.24159 NT D 5711 24 TRUE chr2 205746O2 205.91747 NT 746 25 TRUE chr17 711773O2 71179956 NT D. 7886 26 TRUE chr14 76894682 76915946 NT 6904 27 TRUE chr17 31939569 31956756 NT 7767 29 TRUE chr1 1304.9615 13094765 NT 45 3O TRUE chr2 320O8891 32O3O736 NT 812 31 TRUE chr6 12264O118 226,57871 NT D 3348 32 TRUE chr3 34544904 3455.3511 NT 1480 34 TRUE chrS 118884660 18896412 NT 2745 35 TRUE chr8 37642S21 37671979 NT D 4034 36 TRUE chr12 1279.0432 1279.5881 NT 5875 38 TRUE chr4 14112O768 41126477 NT 2292 39 TRUE chr6 12271.4316 2272O862 NT 3349 40 TRUE chrS 123584659 23590728 NT 2770 41 TRUE chr17 71.213804 71222433 NT D. 7887 42 TRUE chr19 23139991 231701.89 NT D 8385 43 TRUE chr3 1352O8956 352.10744 INT D 1749 44 TRUE chr12 12933,791 12950936 NT 5880 45 TRUE chr17 71096.763 711 OO905 NT D. 7884 46 TRUE chr11 117833.701 17838253 NT D 5831 47 TRUE chr11 33427175 33451476 NT D 5427 48 TRUE Cl 3703.9139 370454.11 NT 108 49 TRUE chirf 1403O4156 40307245 NT ED 3856 51 TRUE chr4 141616653 41627603 NT 2295 52 TRUE chr16 84769.173 84780686 NT ED 7597 53 TRUE Cl 1828S4S21 828.64307 NT ED 611 S4 TRUE US 2017/O 130247 A1 May 11, 2017 52

TABLE S4B-continued Super-enhancers and their associated Super-enhancer Domains Chr Start End SE Rank in SDS chr1O 795.08474 79515168 INT STITCHED 5111 55 TRUE chr2 33282O29 3330O860 INT STITCHED 817 56 TRUE chr11 116943O2S 11695.3583 INT STITCHED 5819 57 TRUE chr11 77697704 77718786 INT ST 5597 58 TRUE Cl 1828.18684 1828 19554 INT ST D 610 59 TRUE chr11 8466451 8486876 INT ST 5325 61 TRUE chr14 64118817 64131901 INT ST 6859 62 TRUE chr14 77O15215 77O3O315 INT ST D 6906 63 TRUE chr14 71022659 71035930 INT ST 6887 66 TRUE chr11 12357626 12370205 INT ST D 5340 67 TRUE chr1O 21700576 21708946 INT ST 4893 68 TRUE chr3 9638O383 96382115 INT ST 1629 69 TRUE chr11 69517O60 695228O3 INT ST D 5565 7O TRUE chrS 53933177 53947327 INT ST 2510 71 TRUE chirf 147131117 147136231 INT ST D 3890 72 TRUE Cl 120971968 120973737 INT ST D 372 73 TRUE chr11 66824791 66838230 INT ST 5555 74 TRUE chr3 133181431 133197648 INT ST D 1744 75 TRUE chr19 25553498 25564092 INT ST 8399 77 TRUE chr18 61787544 617884.00 INT ST 8209 78 TRUE chr8 87174O72 87174643 INT ST D 4163 79 TRUE chr17 37209046 37217726 INT ST 7794 8O TRUE chr13 96295094 96306119 INT ST D 6544 81 TRUE chr19 232O7455 23208806 INT ST 8386 82 TRUE chr18 4O467587 40468.140 INT ST 8148 83 TRUE chrS 65.255735 65256794 INT ST D 2535 84 TRUE chr15 77168852 77187251 INT ST 7248 85 TRUE chr11 8848.1360 884.91812 INT ST D 5666 86 TRUE chirf 3.1248315 31250619 INT ST 3525 87 TRUE chr12 SS4O7498 554.15046 INT ST 5995 89 TRUE chr12 88239069 88245155 INT ST D 6118 90 TRUE chr18 7552O332 75527 277 INT ST 8261 91 TRUE chr11 9015537 9017663 INT ST D 5331 92 TRUE Cl 137071028 137096.284 INT ST D 449 93 TRUE chr6 393955.71 393.96779 INT ST 3045 94 TRUE chr2 168589688 16861717O INT ST D 1300 95 TRUE Cl 120538712 120545414 INT ST 368 96 TRUE Cl 168054897 168073079 INT ST 556 97 TRUE Cl 72839563 72858.199 INT ST D 237 99 TRUE chr4 985.07649 98514709 INT ST 2076 OO TRUE chr16 230993.73 23103471 INT ST D 7434 O1 TRUE chirf 7109.2246 71102481 INT ST 3601 O2 TRUE chr19 21858770 21866770 INT ST 8378 O3 TRUE chr2 S2002668 52003777 INT ST D 1196 O4 TRUE chr4 373294.36 37357766 INT ST 2268 O8 TRUE chrS 20O29649 20037063 INT ST D 2752 09 TRUE chr14 O62964.86 O6304433 INT ST D 6982 10 TRUE chr4 41721916 41726166 INT ST 2297 11 TRUE chr3 29.247.012 29261362 INT ST D 1732 12 TRUE chr) 14458126 14474355 INT ST 4766 13 TRUE chr17 10549089 10570838 INT ST 7680 14 TRUE chr2 71488O13 71494.617 INT ST D 928 15 TRUE chr3 96479158 96484.864 INT ST 1630 16 TRUE chr12 1172592O 11743677 INT ST D 6188 17 TRUE chr14 22293688 223O8989 INT ST 6709 18 TRUE chr6 25383,335 25398O24 INT ST 3360 19 TRUE chr2 6598.1373 65983444 INT ST D 1279 21 TRUE Cl 83948.212 83961841 INT ST 615 23 TRUE chr6 9164O161 91661247 INT ST D 3217 24 TRUE chr8 124994.68 12504771 INT ST 3947 25 TRUE chr11 O2190649 O21936.92 INT ST 5741 26 TRUE chr6 67O61148 67O64202 INT ST 3130 27 TRUE chr1O 85.002060 85006553 INT ST 5140 29 TRUE chr8 87996475 87997654 INT ST 4167 3O TRUE chr8 933S1924 933.55292 INT ST D 4190 31 TRUE chr1O 7S33S464 75345568 INT ST 5091 32 TRUE chr12 111709296 111710794. INT ST D 61.87 33 TRUE chr1O 66546199 665.64235 INT ST 5059 34 TRUE chr1O 21546SO2 215496.91 INT ST 4891 35 TRUE chirf 38812914 3881.6123 INT ST D 3550 36 TRUE Cl 121295,085 121296.031 INT ST 376 37 TRUE chirf 863SS826 86368339 INT STITCHED 3652 38 TRUE chr17 26631721 26648689 INT STITCHED 7728 39 TRUE US 2017/O 130247 A1 May 11, 2017 53

TABLE S4B-continued Super-enhancers and their associated Super-enhancer Domains Chr Start End SE Rank in SDS chr14 1062SO319 10626O753 INT STITCHED 6981 40 TRUE chr17 47640414 47649043 INT STITCHED 7822 41 TRUE chrS 1354.17523 135421698 INT STITCHED 2830 42 TRUE chr4 12S211671 125223450 INT ST 2192 43 TRUE chr11 107296669 107310982 INT ST D 5768 44 TRUE chr14 SS704349 557.05463 INT ST 6815 45 TRUE chr2 162856904 1628.60933 INT ST 1256 46 TRUE chr8 74.834685 7484.0663 INT ST D 4116 47 TRUE chr15 971986OS 97227633 INT ST 7317 48 TRUE chr18 36412873 36414154 INT ST D 8124 49 TRUE chr) S638.2386 56395769 INT ST 4546 51. TRUE Cl 36111164 36118698 INT ST 101 52 TRUE chrS 116845,764 116860853 INT ST D 2736 53 TRUE chirf 30982397 30983.339 INT ST 3523 S4 TRUE Cl 138841643 138850970 INT ST D 466 56 TRUE chr17 29587776 29588.942 INT ST D 7752 57 TRUE Cl 138586629 138593131 INT ST 464 58 TRUE chr19 S3523440 53535319 INT ST D 8519 59 TRUE chr8 444.05736 44406755 INT ST 4046 61 TRUE chirf 4772296 4777612 INT ST 3475 62 TRUE chr6 838.39914 83844315 INT ST D 3184 63 TRUE chr1O 6638O3S1 66383761 INT ST 5054 64 TRUE chr12 565873.47 566O7146 INT ST D 6000 65 TRUE chirf 11983.1735 1198.35688 INT ST 3765 66 TRUE chr18 352O2713 352O3454 INT ST 8114 69 TRUE Cl 1212O1424 1212O2481 INT ST D 374 7O TRUE chrS 118.951444 11896O269 INT ST 2746 71 TRUE chirf 135993.34 1360O325 INT ST D 3481 72 TRUE chr8 3SO23426 35O27483 INT ST 4014 73 TRUE chr3 9641461 9655131 INT ST 1392 74 TRUE chr11 66733372 667.46990 INT ST D 5553 77 TRUE chr2 152552277 152563676 INT ST 1198 78 TRUE Cl 341301.07 34134640 INT ST D 88 79 TRUE chr1O S942O365 59437537 INT ST D 5021 81 TRUE chr13 982O2400 98225162 INT ST 6559 82 TRUE chr1O 754.00370 754O1358 INT ST D 5092 83 TRUE chr11 S4767341 547858.32 INT ST 5499 84 TRUE chr12 128101.77 12811020 INT ST 5876 85 TRUE chr17 3563921.1 35642435 INT ST D 7784 86 TRUE chr1O 76655655 766.62360 INT ST 5100 87 TRUE chr1O 4411 O139 44112766 INT ST D 4981 88 TRUE Cl 7226OS28 72261272 INT ST 230 89 TRUE Cl 1692O1106 1692.20423 INT ST 559 90 TRUE chr14 49273113 49283200 INT ST D 6789 91 TRUE chr4 126875757 126879027 INT ST 2205 92 TRUE chr3 8837S442 88380O83 INT ST D 1607 93 TRUE chr11 98823S11 98826466 INT ST D 5719 94 TRUE chr4 147459254 147463850 INT ST 2317 96 TRUE chr12 10394O487 103953 004 INT ST D 6151 97 TRUE chr18 75504155 75505202 INT ST 8260 98 TRUE chr17 292.09618 29218426 INT ST 7747 2OO TRUE chirf S2806853 52814768 INT ST D 3568 201 TRUE chr17 71.241991 71250610 INT ST 7888 2O3 TRUE chr16 8758173 8779472 INT ST D 7359 204 TRUE chr13 110418702 110442750 INT ST 6615 206 TRUE chr1O 208021.31 20830236 INT ST 4885 207 TRUE chr18 38538325 38551037 INT ST D 8136 208 TRUE chirf 8733342O 87345334 INT ST 3662 209 TRUE chr4 15456,3584 154564383 INT ST D 2355 21 O TRUE chr) 1212445O1 121254102 INT ST 4802 21.3 TRUE Cl 36O701.90 36074608 INT ST 100 215 TRUE chr2 30913257 30925299 INT ST 803 216 TRUE chr13 98OS2S62 98062842 INT ST 6557 217 TRUE chr11 52173182 52184686 INT ST 5484 218 TRUE chr14 65251303 6526.9514 INT ST D 6864 219 TRUE chr4 154537213 154538.078 INT ST 2354 22O TRUE chr12 111655417 111656,705 INT ST D 6186 221 TRUE chr6 142458.188 142461905 INT ST 3429 222 TRUE chr2 71575856 71583914 INT ST 931 223 TRUE chr6 39370384 39371286 INT ST D 3044 225 TRUE chr12 57385208 57400114 INT ST 6004 226 TRUE chr15 103349226 103353500 INT STITCHED 7343 227 TRUE chr4 138OOOSS4 1380.06368 INT STITCHED 2273 228 TRUE