bioRxiv preprint doi: https://doi.org/10.1101/802173; this version posted October 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

L1 and B1 repeats blueprint the spatial organization of chromatin

J. Yuyang Lu1#, Lei Chang2#, Tong Li1#, Ting Wang1,3, Yafei Yin1,3, Ge Zhan1,3, Ke Zhang1,3, Michelle Percharde4, Liang Wang1, Qi Peng1, Pixi Yan1, Hui Zhang1, Xue Han1, Xianju Bi1, Wen Shao1, Yantao Hong1, Zhongyang Wu1, Peizhe Wang1, Wenzhi Li1, Jing Zhang1, Zai Chang1, Yingping Hou2, Pilong Li1, Miguel Ramalho-Santos5, Jie Na1, Wei Xie1, Yujie Sun2*, Xiaohua Shen1*

1 Tsinghua Center for Life Sciences, School of Medicine and School of Life Sciences, Tsinghua University, Beijing 100084, China

2 State Key Laboratory of Membrane Biology, School of Life Sciences, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China

3 These authors contributed equally to this work.

4 MRC LMS & Imperial College London, London W12 0NN, UK

5 Lunenfeld-Tanenbaum Research Institute, University of Toronto, Ontario M5T 3H7, Canada

# These authors share the co-first authorship.

* Leading correspondence should be addressed to X.S. ([email protected]).

Correspondence may also be addressed to Y.S. ([email protected]).

Keywords: L1 repeats, B1/Alu repeats, higher-order chromatin structure

SUMMARY

Despite extensive mapping of three-dimensional (3D) chromatin structures, the basic principles underlying genome folding remain unknown. Here, we report a fundamental role for L1 and B1 retrotransposons in shaping the macroscopic 3D genome structure. Homotypic clustering of B1 and L1 repeats in the nuclear interior or at the nuclear and nucleolar peripheries, respectively, segregates the genome into mutually exclusive nuclear compartments. This spatial segregation of L1 and B1 is conserved in mouse and human cells, and occurs dynamically during establishment of the 3D chromatin structure in early embryogenesis and the cell cycle. Depletion of L1 transcripts drastically disrupts the spatial distributions of L1- and B1-rich compartments. L1 transcripts are strongly associated with L1 DNA sequences and induce phase separation of the heterochromatin protein HP1α. Our results suggest that genomic repeats act as the blueprint of chromatin macrostructure, thus explaining the conserved higher-order structure of chromatin across mammalian cells.

INTRODUCTION

The mammalian genome is folded nearly one million-fold within the nucleus of a cell (Annunziato, 2008). Hierarchical organization of genome folding at different scales is central to gene expression, cellular function and development, and alterations in genome organization often lead to disease (Dekker et al., 2017; Zane et al., 2014). Microscopic and 3C-based approaches reveal that chromatin forms a "fractal globule" with compartments, domains, and loop interactions (Boettiger et al., 2016; Dekker et al., 2002; Mirny, 2011; Wang et al., 2016). At the megabase scale, chromatin is subdivided into two spatially segregated compartments, arbitrarily labeled as A and B, with distinct transcriptional activity (Lieberman-Aiden et al., 2009). The A and B compartments also differ in other features, including gene density, CpG frequency, histone modifications, and DNA replication timing (Bonev and Cavalli, 2016; Dekker et al., 2013; Gibcus and Dekker, 2013; Rivera-Mulia and Gilbert, 2016). Simulation of 3D genome structures predicted that the euchromatic A compartment adopts a central position, whereas the heterochromatic B compartment moves towards the nuclear periphery and perinucleolar regions (Buchwalter et al., 2019; Stevens et al., 2017). Within compartments at the kilobase-to-megabase scale, chromatin is organized in topologically associated domains (TADs), which serve as functional platforms for physical interactions between co-regulated genes and regulatory elements (Dixon et al., 2012). At a finer scale, TADs are divided into smaller loop domains, in which distal regulatory elements such as enhancers come into direct contact with their target genes via chromatin loops (Kadauke and Blobel, 2009).

The organization of the nuclear genome into euchromatin and heterochromatin, which closely match the A and B compartments, respectively, appears to be conserved from ciliates to humans and has been maintained in eukaryotes over 500 million years of evolution (Solovei et al., 2016). Intriguingly, most A/B compartments and TADs seem to be invariant in different mouse and human cell types or during cell-fate transition, whereas sub-TAD loops are more variable to facilitate differential gene expression (Dixon et al., 2015; Stadhouders et al., 2018). The insulator protein CTCF (CCCTC-binding factor) and the ring-shaped cohesin complex, considered the weavers of genome organization, co-localize at the boundaries and anchor regions of contact domains and loops (Rao et al., 2014; Splinter et al., 2006). However, depletion of CTCF or cohesin disrupts some sub-TAD loop contacts, but fails to affect the formation and maintenance of A/B compartments in cells (Bintu et al., 2018; Gassler et al., 2017; Haarhuis et al., 2017; Nora et al., 2017; Nuebler et al., 2018; Rao et al., 2017; Schwarzer et al., 2017; Wutz et al., 2017). The direct molecular drivers, other than CTCF and cohesin, that govern compartmental domains are yet to be defined. A few mechanisms have been proposed for the formation of nuclear compartments, such as anchoring heterochromatin to the nuclear lamina (Falk et al., 2019; Guelen et al., 2008; van Steensel and Belmont, 2017; Wijchers et al., 2015; Zullo et al., 2012), preferential attraction of chromatin harboring similar histone modifications and regulators (Jost et al., 2014; Strom et al., 2017; Wang et al., 2016), and hypothetical models involving pairing of homologous sequences mediated by active transcription or phase separation (Cook, 2002; Cook and Marenduzzo, 2018; Rowley et al., 2017; Tang, 2017; van de Werken et al., 2017). A recent study based on imaging and polymer simulations proposes that interactions between heterochromatic regions are crucial for compartmentalization of the genome in inverted and conventional nuclei (Falk et al., 2019). However, the molecular basis that drives heterochromatin compartmentalization and segregation from active domains remains unclear (Bouwman and de Laat, 2015; Pueschel et al., 2016).

Repetitive elements comprise about half of the human and mouse genomes (Biemont, 2010). Once regarded as genomic parasites (Orgel and Crick, 1980), retrotransposons have recently been implicated in playing active roles in re-wiring the genome and gene expression programs in diverse biological processes (Bourque et al., 2008; Chuong et al., 2016; Durruthy-Durruthy et al., 2016; Grow et al., 2015; Liu et al., 2018; Lynch et al., 2011; Rebollo et al., 2012; Wang et al., 2014). Repetitive DNA is speculated to have a role in nuclear organization (Falk et al., 2019; Solovei et al., 2016; Tang, 2017). Considering the diverse nature and widespread distribution of repeats in mammalian genomes (Sundaram and Wang, 2018), it is imperative to determine whether and how repetitive sequences are involved in this process. Long and short interspersed nuclear elements (LINEs and SINEs, respectively) are the two predominant subfamilies of retrotransposons in most mammals (Mandal and Kazazian, 2008). L1 (also named LINE1 or LINE-1) is the most abundant subclass of LINEs, and also constitutes the most abundant type of all repeat subclasses, making up 18.8% and 16.9% (0.9–1.0 million copies) of the genome in mouse and human, respectively (Taylor et al., 2013). B1 in mouse and its closely related, primate-specific Alu elements in human are the most abundant subclass of SINEs, constituting 2.7%–10.6% (0.6–1.3 million copies) of the mouse and human genomes (Lander et al., 2001; Mouse Genome Sequencing et al., 2002). L1 and B1/Alu have distinct nucleotide compositions and sequence lengths. While Alu elements are ~300 bp long and rich in G and C nucleotides, L1 elements are 6–7 kb long and AT-rich (Jurka et al., 2004). A previous study of metaphase chromosome banding showed roughly inverse distributions of L1 and Alu elements in chromosomal regions with distinct biochemical properties (Korenberg and Rykowski, 1988). Further studies reported that Alu/B1 elements are preferentially enriched in regions that are generally gene-rich, whereas L1 elements are enriched in gene-poor regions, which tend to be located in lamina-associated heterochromatin (Deininger, 2011; Meuleman et al., 2013; Wijchers et al., 2015). However, systematic mapping of L1 and B1 elements in the genome and the nuclear space is still lacking.

Here, we report that L1 and B1/Alu repeats harbor the basic structural information in their DNA sequences to instruct the 3D organization of mammalian genomes. Co-segregation of B1/Alu and L1 repeats with the A and B compartments, respectively, is highly conserved in different mouse and human cell types. B1/Alu-rich regions cluster in the interior of the nucleus, while L1-rich regions cluster at the nuclear and nucleolar periphery. This pattern of homotypic clustering and segregation is dynamically established in the cell cycle and in early embryogenesis. L1 RNA is critically required for the formation and maintenance of compartment segregation and the higher-order chromatin structure. Most likely, L1 RNA transacts the structural information in L1 DNA repeats by acting upon its own DNA sequences to promote phase separation of HP1α, thereby inducing heterochromatin formation. Our findings uncover a fundamental principle of 3D genome organization that is rooted in genomic DNA sequences, and also provide an answer to the perplexing phenomenon of the conservation and robustness of chromatin compartmentalization across mammalian species.

RESULTS

Homotypic clustering of B1 and L1 repeats segregates the A/B compartments

We analyzed the genomic positions of the major repeat subfamilies and observed positive correlations within the L1 and SINE B1 subfamilies, but strong inverse correlations between them (Figure S1A). This observation suggests that L1 and SINE B1 elements tend to be positioned away from each other in the genome, while repeats from the same subfamily tend to be clustered. The non-random positioning of repeat sequences in the genome prompted us to examine their relative distributions in the A/B compartments. Analysis of Hi-C data in mouse embryonic stem cells (mESCs) (Fraser et al., 2015) showed that dense L1 and B1 repeats appear to reside in distinct and mutually exclusive compartments across the mouse genome, whereas other types of retrotransposons such as ERV1, ERVK, and L2 tend to be randomly distributed (Figures 1A and S1B). The compartments marked by B1 repeats show enrichment of active histone marks (H3K4me3, H3K9ac, H3K27ac, H3K36me3), strong binding of RNA polymerase II (Pol II), and high levels of chromatin accessibility and transcription activity. In contrast, the compartments marked by L1 repeats show signatures of heterochromatin, including enrichment of the repressive H3K9me2 and H3K9me3 marks, and strong binding of heterochromatin proteins such as HP1α and the nuclear corepressor KRAB-associated protein-1 (KAP1 or TRIM28) (Figures 1A and S1B). Quantitative sequence analysis of annotated A/B compartments revealed high levels of SINE repeats (including B1, B2 and B4) in the A compartments and L1 repeats in the B compartments, whereas repeats in other subclasses showed no specific enrichment (Figures S1C and S1D). This opposing distribution pattern of B1/Alu and L1 in the A and B compartments was also detected in five other cell types (Figure 1B): human embryonic stem cells (hESCs), lymphoblastoid cells (GM12878) and erythroleukemic cells (K562), and mouse neural progenitor cells (NPCs) and neurons (Fraser et al., 2015; Rao et al., 2014). Furthermore, unsupervised clustering revealed that the genomic positions of A/B compartments in six distinct mouse and human cell types are highly similar to each other, with an average Spearman correlation coefficient >0.73 within species and >0.52 between species (Figure S1E). Compared to repeats of other subclasses, L1 and B1 are most strongly related to the higher-order chromatin structures, and their distributions appear to be conserved in homologous regions of the mouse and human genomes (Figure S1E). For example, a region in mouse chromosome 2 (chr2: 120-140 Mb) and its syntenic region in human chromosome 20 (chr20: 6-50 Mb) show similar patterns of Hi-C contact probabilities, and gene and repeat compositions and distributions in the corresponding A and B compartments along the DNA sequences (Figure 1C). Thus, co-segregation of B1/Alu and L1 repeats with the A and B compartments appears to be invariable in different cell types in mouse and human.
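
The positional correlation between repeat subfamilies reported at the start of this section (Figure S1A) can be illustrated with a minimal sketch. The text does not state how the correlation was computed, so the fixed-size binning, the bin size, and the toy coordinates below are all assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bin_counts(starts, chrom_length, bin_size):
    """Count repeat copies per fixed-size bin, using each copy's start coordinate."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for start in starts:
        counts[start // bin_size] += 1
    return counts


# Hypothetical toy annotation (start coordinates in bp on a 1-Mb chromosome);
# these are not real L1 or B1 positions.
CHROM_LENGTH, BIN_SIZE = 1_000_000, 200_000
l1_starts = [10_000, 20_000, 30_000, 410_000, 420_000, 430_000, 440_000, 810_000, 820_000]
b1_starts = [210_000, 220_000, 230_000, 240_000, 250_000, 610_000, 620_000, 630_000]

l1_density = bin_counts(l1_starts, CHROM_LENGTH, BIN_SIZE)
b1_density = bin_counts(b1_starts, CHROM_LENGTH, BIN_SIZE)

# A negative value means the two families tend to occupy different bins.
r_l1_b1 = pearson(l1_density, b1_density)
```

In this toy layout the two families occupy alternating bins, so the coefficient is strongly negative, which mirrors the inverse relationship described for L1 and B1.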

We then took mouse chromosome 17 (chr17, 95 Mb in length) as an example to overlay L1 and B1 repeat features on the Hi-C interaction matrix of mESCs (Figures 1D-F). Interestingly, the plaid pattern of enriched and depleted interaction blocks in the Hi-C map is largely correlated with the compositions and distributions of B1 and L1 along the whole chr17 (Figure 1D). In a 42-Mb region of chr17, four L1-enriched B compartments (denoted by c, e, g and i) and three B1-enriched A compartments (denoted by D, F and H) are alternately positioned along the linear DNA sequence (Figure 1E). Strong interactions were observed between L1-enriched compartments (represented by ce, cg, eg, ei and gi) or B1-enriched compartments (represented by DF, DH and FH), but not between these two compartments (Figure 1E, dotted boxes). The interaction frequencies between D and F (DF) or between c and e (ce) are much higher than those of D or F with c or e (cD, De, eF), despite the fact that the latter regions are closer in the linear sequence.

Conversion of Hi-C contact frequencies into Pearson correlation coefficients sharpened our view of the long-range chromatin interactions (Figure 1F). Strikingly, the plaid pattern of the Hi-C correlation map appears to perfectly match the distribution and interaction status of L1 and B1. L1-rich or B1-rich regions show strong enrichment of contacts with regions containing the same repeat type (red blocks in Figure 1F). We refer to these as homotypic contacts. Contacts between regions containing different repeat types (heterotypic contacts) are strongly depleted (blue blocks in Figure 1F). For example, in one region of chr17 (35 to 95 Mb), L1-enriched segments (from e to u) and B1-enriched segments (from F to T) exhibit high frequencies of homotypic contacts (Figure 1F, highlighted by arrows), but strong depletion of heterotypic contacts. Similarly, homotypic contacts between L1-rich regions or B1-rich regions were also observed between chromosomes, as illustrated by chromosomes 17 and 19 (Figure 1G). These results indicate that genomic regions containing B1 or L1 repeats tend to interact with genomic regions containing repeat sequences from similar subfamilies, but not from different subfamilies, regardless of linear proximity. Thus, homotypic clustering of regions rich in B1 or L1 repeats partitions the genome into distinct A and B compartments, respectively.
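
The conversion of a contact map into a Pearson correlation map can be sketched as follows. The text does not describe the procedure, so this sketch assumes the common two-step approach of dividing each contact by the mean of its diagonal (observed over expected) and then correlating every pair of rows. The function names and the toy matrix are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def observed_over_expected(contacts):
    """Divide each contact by the mean of its diagonal (same genomic distance)."""
    n = len(contacts)
    oe = [[0.0] * n for _ in range(n)]
    for d in range(n):
        diagonal = [contacts[i][i + d] for i in range(n - d)]
        expected = sum(diagonal) / len(diagonal)
        for i in range(n - d):
            value = contacts[i][i + d] / expected if expected > 0 else 0.0
            oe[i][i + d] = oe[i + d][i] = value
    return oe


def correlation_map(contacts):
    """Pearson correlation between every pair of rows of the O/E matrix."""
    oe = observed_over_expected(contacts)
    n = len(oe)
    return [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]


# Hypothetical toy map: 8 bins with made-up compartment labels. Contacts decay
# with distance and are boosted between bins that share a label.
labels = "AABBAABB"
toy = [[(100.0 / (1 + abs(i - j))) * (2.0 if labels[i] == labels[j] else 0.5)
        for j in range(8)] for i in range(8)]
corr = correlation_map(toy)
```

The result is a symmetric matrix with ones on the diagonal. A toy of this size is too small to show a clean plaid pattern, so the sketch illustrates the arithmetic rather than the biological result.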

DNA FISH reveals the spatial segregation of L1 and B1 compartments

To visualize the nuclear localization of B1 and L1 repeats in cells, we performed dual-color DNA fluorescence in situ hybridization (FISH) using fluorescence-tagged oligonucleotide probes that specifically target the consensus sequences of B1 and L1 elements. Consistent with the Hi-C analysis, L1 and B1 exhibit distinct and complementary nuclear localizations in mESCs (Figures 2A and 2B). While B1 DNA shows punctate signals in the nuclear interior, L1 DNA exhibits highly organized and concentrated signals that line the periphery of the nucleus and nucleolus. Weak L1 signals were also detected in a few subregions of the nuclear interior where B1 signals were absent. Both B1 and L1 signals are absent from DAPI-dense regions, which likely represent satellite repeat-enriched chromocenters (Mateos-Langerak et al., 2007). To confirm the L1 localization at the nucleolar periphery in mESCs, we performed immuno-FISH, which combines DNA FISH with immunofluorescence staining using an antibody against the nucleolar marker Nucleolin (NCL). Indeed, L1 signals surround and partially overlap with the ring-shaped signals of NCL at the nucleolar periphery (Figure 2C).

To ask whether L1 and B1 localizations might vary with cell type, we performed immuno-FISH in four other representative mouse and human cell lines: neural stem cells (NSC), fibroblasts (NIH3T3), myoblasts (C2C12) and HeLa cells (Figure 2C). Similar to mESCs, all these cells show non-overlapping localizations of B1/Alu repeats, which are present in the nuclear interior, and L1 repeats, which are mainly located at the nuclear and nucleolar peripheries. The distributions of B1/Alu and L1 are reminiscent of the nuclear localization of euchromatin and heterochromatin, respectively. Together, the results from Hi-C and imaging analysis demonstrate that homotypic B1/Alu or L1 DNA sequences associate together to form larger clusters, which divide the nucleus into distinct territories of A/B compartments, and that nuclear co-segregation of B1/Alu and L1 repeats with the A and B compartments is conserved across different cell types in mouse and human.

Dynamic establishment of L1 and B1 segregation during the cell cycle and embryonic development

As chromatin organization undergoes dynamic changes during mitosis and after fertilization, we asked whether L1 and B1 compartments are established and re-constructed in these two biological processes. DNA FISH analysis of synchronous mESCs showed that L1 and B1 localizations change dramatically at different cell-cycle stages (Figures 3A and S2A). S-phase cells show non-overlapping and complementary localizations of L1 and B1 repeats (Figures 2 and 3A). This is similar to the pattern we observed previously in asynchronous mESCs, most of which are in the S phase of the cell cycle (Figure S2A). However, L1 and B1 DNA signals are mixed on mitotic chromosomes in metaphase (M phase, including prophase and anaphase), when the nuclear membrane and nucleoli are disassembled. As the cell cycle progresses into the G1 phase, L1 and B1 DNA start to segregate again. To quantify the degree of segregation, we defined a FISH-based segregation index as the negative value of Pearson's correlation coefficient of L1 and B1 DNA signals in the nucleus. The FISH segregation index is lowest in M-phase cells, but increases significantly in the G1 phase and peaks in the S phase (Figure 3B).
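
The FISH-based segregation index defined above (the negative of Pearson's correlation coefficient between the L1 and B1 signals in the nucleus) can be sketched as follows. The pixel-wise comparison, the nuclear mask, and the toy intensity grids are assumptions for illustration, since the text does not specify how the images were sampled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fish_segregation_index(l1_image, b1_image, nuclear_mask):
    """Negative Pearson correlation of L1 and B1 intensities over nuclear pixels."""
    l1_pixels, b1_pixels = [], []
    for l1_row, b1_row, mask_row in zip(l1_image, b1_image, nuclear_mask):
        for l1_value, b1_value, inside in zip(l1_row, b1_row, mask_row):
            if inside:  # ignore pixels outside the nucleus
                l1_pixels.append(l1_value)
                b1_pixels.append(b1_value)
    return -pearson(l1_pixels, b1_pixels)


# Hypothetical 3x3 intensity grids (arbitrary units), not real FISH data.
mask = [[False, True, True], [True, True, True], [True, True, True]]
l1_peripheral = [[9, 8, 9], [8, 1, 8], [9, 8, 9]]   # L1 bright at the edge
b1_interior = [[1, 2, 1], [2, 9, 2], [1, 2, 1]]     # B1 bright in the center

segregated = fish_segregation_index(l1_peripheral, b1_interior, mask)
mixed = fish_segregation_index(l1_peripheral, l1_peripheral, mask)
```

With this definition the index ranges from -1 (signals fully overlapping) to 1 (signals fully complementary), so higher values indicate stronger segregation.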

To provide further molecular evidence for segregation of repeats during the cell cycle, we analyzed the published Hi-C data from cell-cycle synchronized mESCs and HeLa cells (Nagano et al., 2017; Naumova et al., 2013). In both cell types, G1-phase cells exhibit a classic plaid pattern of hierarchical interactions, with enriched and depleted interaction blocks outside of the diagonal region of the Hi-C interaction heatmap (Figures 3C, 3D, S2B, and S2C). In contrast, M-phase cells exhibit stronger signals along the diagonal, which represent the linearly organized, longitudinally compressed array of consecutive chromatin loops. To quantify this difference, we defined a Hi-C-based segregation index by calculating the ratio of homotypic versus heterotypic interaction frequencies between L1 and B1/Alu subfamilies. Indeed, the Hi-C segregation index is significantly higher in G1-phase cells than in M-phase cells (Figures 3C and S2B). These results indicate that segregation of B1/Alu and L1 repeats is lost during mitosis, and is re-established when the higher-order chromatin structure forms during each cell cycle. This finding agrees with previous reports that in metaphase, chromosome folding becomes homogeneous and large megabase-scale A and B compartments are lost, whilst in interphase, chromosomes return to a highly compartmentalized state (Nagano et al., 2017; Naumova et al., 2013).
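
The Hi-C-based segregation index (the ratio of homotypic to heterotypic interaction frequencies) can be sketched as follows. The text gives only the ratio itself, so the use of mean contact frequencies, the per-bin labels, and the toy matrices are assumptions for illustration.

```python
def hic_segregation_index(contacts, labels):
    """Mean homotypic contact frequency divided by mean heterotypic frequency.

    labels[i] is "L1", "B1", or None for bins assigned to neither class.
    """
    homotypic, heterotypic = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered bin pair once, no diagonal
            if labels[i] is None or labels[j] is None:
                continue
            if labels[i] == labels[j]:
                homotypic.append(contacts[i][j])
            else:
                heterotypic.append(contacts[i][j])
    return (sum(homotypic) / len(homotypic)) / (sum(heterotypic) / len(heterotypic))


# Hypothetical 4-bin example with alternating L1-rich and B1-rich bins.
labels = ["L1", "B1", "L1", "B1"]

# Compartmentalized map: same-type bins contact each other more often.
interphase_like = [[0, 2, 8, 2],
                   [2, 0, 2, 8],
                   [8, 2, 0, 2],
                   [2, 8, 2, 0]]

# Homogeneous map: contacts do not depend on repeat type.
mitotic_like = [[0, 5, 5, 5],
                [5, 0, 5, 5],
                [5, 5, 0, 5],
                [5, 5, 5, 0]]

index_interphase = hic_segregation_index(interphase_like, labels)  # 4.0
index_mitotic = hic_segregation_index(mitotic_like, labels)        # 1.0
```

An index of 1 means contacts are indifferent to repeat type, and larger values mean stronger homotypic preference. A real analysis would probably also need to correct for genomic distance, which this sketch omits.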

After fertilization, the chromatin undergoes extensive reprogramming from a markedly relaxed state in zygotes to fully organized structures in blastocysts (Du et al., 2017; Flyamer et al., 2017; Ke et al., 2017). We performed time-course DNA FISH analysis of L1 and B1 in early mouse embryos (Figure 3E). L1 and B1 signals are largely overlapping in zygotes and early 2-cell embryos, and become progressively more segregated during embryonic divisions (Figure 3E). A plot of the FISH segregation indexes along the course of blastocyst development showed that the greatest change (steepest trend-line) occurred between the zygote and the late 2-cell stage (Figure 3F). This observation implies that the initiation of B1 and L1 compartmentalization may coincide with the zygotic genome activation (ZGA), during which massive transcription switches on (Qiu et al., 2003). We then analyzed the published Hi-C data from early mouse embryos (Du et al., 2017). Consistent with our results, we detected a gradual increase of the segregation index (Figure 3G). Early 2-cell embryos exhibit prevalent cis-chromosomal contacts along the diagonal of the Hi-C interaction map, whereas the plaid patterns of Hi-C interactions become readily detectable in late 2-cell embryos and are fully established in the inner cell mass (ICM) cells of blastocysts (Figure 3H). Thus, in early embryos, compartmentalization of B1- and L1-rich regions appears to be established in a stepwise manner, coincident with de novo establishment of higher-order chromatin structures.

Repeat RNA and transcription promote L1 and B1 compartmentalization

Transcription and nuclear architecture are closely interdependent and influence each other (Dowen et al., 2014; Rowley and Corces, 2018; Rowley et al., 2017). We sought to test whether RNA transcribed from L1 and B1 repeats mediates the function of their DNA sequences. We first analyzed the effects of inhibition of L1 and B1 expression in early mouse embryos, as both repeats are activated and highly expressed in two-cell embryos (Fadloun et al., 2013; Jachowicz et al., 2017; Rothstein et al., 1992). We recently showed that L1 RNA can be efficiently knocked down with antisense morpholinos (AMO) in mouse embryos and mESCs (Percharde et al., 2018). We successfully depleted L1 and B1 transcripts by injecting antisense morpholinos or oligonucleotides (ASO) into mouse zygotes (Figures S3A-B). Embryos depleted of L1 or B1 RNA were able to pass the first embryonic division and their development was indistinguishable from control embryos at the 2-cell stage; however, they failed to divide further and became arrested at the 2-cell stage (Figure 4A). We collected these embryos for DNA FISH analysis when the control group injected with scramble AMO or ASO had grown to the late 2-cell stage. In embryos depleted of L1 or B1 RNA, L1/B1 segregation is impaired, as indicated by significantly lower L1/B1 segregation indexes compared to scramble control and noninjected late 2-cell embryos (Figures 4B and S3C-E). This result suggests that formation of L1 and B1 compartments is delayed upon depletion of L1 or B1 transcripts, which is consistent with the observed embryonic arrest.

It was reported that inhibition of Pol II transcription by α-amanitin causes embryonic arrest at the late 2-cell stage (Du et al., 2017; Qiu et al., 2003). We analyzed the Hi-C data obtained by Du et al. and found that α-amanitin-treated embryos exhibit predominantly cis-chromosomal contacts along the diagonal of the Hi-C map, in a pattern similar to that of early 2-cell embryos, while the control group shows the plaid-like pattern of late 2-cell embryos (Figure S4A). Consistent with this, treating mESCs with the transcription inhibitors Actinomycin D (ActD) or DRB (5,6-dichloro-1-β-D-ribofuranosylbenzimidazole) also altered L1/B1 compartmentalization, as shown by loss of L1 perinucleolar localization and gain of mixed L1 and B1 signals (Figures 4C, 4D and S4B). Notably, inhibition of both Pol I and II by ActD caused a more severe effect than inhibition of Pol II alone by DRB. These results indicate that transcription facilitates nuclear organization.

Next, we sought to deplete repeat transcripts in mESCs in order to dissect the effects on chromatin organization independent of embryonic progression. B1/Alu repeats are known to be broadly involved in diverse cellular processes, including transcription and RNA processing and nuclear export (Chen et al., 2009; Dominissini et al., 2011; Lubelsky and Ulitsky, 2018; Polak and Domany, 2006). Treatment of mESCs with B1 ASO led to severe cell death within hours of transfection (data not shown), which precluded direct assessment of the function of B1 transcripts in organizing the nuclear structure. L1 repeats, the most abundant of all repeat subclasses, are predominantly enriched in heterochromatic B compartments. It was suggested that heterochromatin interactions play a more important role in nuclear compartmentalization than euchromatin interactions (Falk et al., 2019; Houda Belaghzal, 2019). To reveal a direct role of repeat RNA in the nuclear organization, we focused on L1 in subsequent investigations.

Treatment of mESCs with L1 AMOs drastically altered the formation of L1/B1 compartments, as shown by the appearance of overlapping, yellowish FISH signals of B1 and L1 DNA (Figures 4E, 4F, and S5A). In particular, when L1 RNA was depleted by AMO treatment for 12 or 36 hours, perinucleolar L1 signals were diffuse or absent and nucleoplasmic signals were dramatically increased at both time-points. In contrast to the punctate pattern of B1 DNA in the control mESCs, B1 signals became more uniformly dispersed in the nucleoplasm, which is indicative of decreased clustering. In comparison, treatment with the drug azidothymidine (AZT), which blocks L1 retrotransposition activity (Xie et al., 2010), failed to affect L1 RNA levels and the nuclear localization of L1 and B1 (Figures 4E, 4F, and S5B). In addition, statistical analysis indicated that the FISH segregation indexes were significantly lower in L1-depleted mESCs than in mESCs treated with scramble AMO or AZT. These results suggest a role of L1 transcripts in regulating the nuclear localization and segregation of L1 and B1 DNA repeats, independently of L1 retrotransposition activity, in mESCs.

L1 RNA regulates the higher-order chromatin structure in mESCs

For in-depth characterization of the molecular defects caused by L1 RNA depletion, we performed in situ Hi-C analysis at 36 hours after transfecting L1 AMO into mESCs. Direct visualization of Hi-C interaction maps revealed obvious differences in the plaid pattern of L1-depleted and control mESCs, as illustrated by chr17 (Figure 5A). In L1-depleted mESCs, Hi-C contact signals were abnormally increased along the diagonal line, whereas the plaid signals outside of the diagonal regions became fuzzy or even lost (Figure 5A, panel i). Accordingly, comparison of Hi-C contact frequencies of the control versus L1-depleted cells showed decreased ratios (in blue) across the diagonal and increased ratios (in red) in the periphery (Figure 5A, panel ii). These changes indicate enhanced local chromosomal contacts but decreased long-range interactions in L1-depleted mESCs.

A zoomed-in view of a 42-Mb region of chr17 in L1-depleted mESCs further shows decreased homotypic chromatin contacts between L1-rich or B1-rich compartments, but abnormally increased heterotypic contacts (Figure 5B, right; and Figure 5C). For example, B1-B1 interactions (represented by DF, DH and FH) and L1-L1 interactions (represented by ce, cg, ci, eg, ei and gi) were reduced in L1-depleted cells, whereas aberrant B1-L1 contacts (represented by cD, De, Dg, Di, eF, Fg, and Hi) were increased. Globally, L1-depleted cells show significantly lower Hi-C segregation indexes than control mESCs (Figure 5A). Consistent with the overlapping signals of L1 and B1 DNA FISH (Figure 4E), these results reveal defective nuclear segregation of L1 and B1 compartments after depletion of L1 RNA (Figure 5B).
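
The exact definition of the Hi-C segregation index is not given in this section. Purely as a sketch of the idea, the Python snippet below scores segregation as the mean homotypic contact (L1-L1 or B1-B1) divided by the mean heterotypic contact (L1-B1). This ratio, the function name, and the toy matrices are assumptions for illustration and should not be read as the index used in the study.

```python
def segregation_ratio(matrix, labels):
    """Mean homotypic contact divided by mean heterotypic contact.

    labels[i] names the repeat class of bin i ("L1" or "B1"). Only
    off-diagonal pairs are counted. Illustrative assumption, not the
    paper's definition of the segregation index.
    """
    homotypic, heterotypic = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                homotypic.append(matrix[i][j])
            else:
                heterotypic.append(matrix[i][j])
    if not homotypic or not heterotypic:
        raise ValueError("need at least one homotypic and one heterotypic pair")
    mean_hetero = sum(heterotypic) / len(heterotypic)
    if mean_hetero == 0:
        raise ValueError("mean heterotypic contact is zero")
    return (sum(homotypic) / len(homotypic)) / mean_hetero

# Four alternating compartments, loosely modeled on c, D, e, F (hypothetical counts).
labels = ["L1", "B1", "L1", "B1"]
control = [[0, 2, 8, 1],
           [2, 0, 2, 8],
           [8, 2, 0, 2],
           [1, 8, 2, 0]]
depleted = [[0, 5, 4, 3],
            [5, 0, 5, 4],
            [4, 5, 0, 5],
            [3, 4, 5, 0]]

control_score = segregation_ratio(control, labels)
depleted_score = segregation_ratio(depleted, labels)
print(round(control_score, 3), round(depleted_score, 3))
```

In this toy setting the score drops when heterotypic contacts gain at the expense of homotypic ones, which is the direction of change reported for L1-depleted cells.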

Next, to directly visualize changes in spatial contacts at specific loci, we performed oligopaint dual-color DNA FISH (Figure 6A). We labeled four representative genomic regions on chr17, a B1-enriched region (F) and three L1-enriched regions (e, g and q), with a set of 500 to 4,500 DNA probes for each region at a density of 200 bp per probe, and measured the spatial distances between the probe signals within mESC nuclei. In control mESCs, the two linearly adjacent L1- and B1-enriched regions e and F are positioned away from each other in the nuclear space with a median distance of 1.72 ± 0.55 µm. Depletion of L1 RNA significantly shortened the distance between them to 1.27 ± 0.47 µm (Figures 6B and 6C, left). In comparison, the two L1-enriched compartments g and q reside in close spatial proximity (1.08 ± 0.45 µm) in the nuclei of control cells even though they are separated by multiple L1 and B1 compartments in the linear genomic sequence. L1 depletion significantly increased the nuclear distance between g and q to 1.81 ± 0.82 µm (Figure 6C, right). Together, DNA FISH and Hi-C results demonstrate that L1 RNA is critical in maintaining the higher-order chromatin structure of segregated L1- and B1-enriched compartments in mESCs.
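
One way to compare two sets of FISH distances, sketched here in Python, is a permutation test on the difference in medians. The distances below are hypothetical values in µm chosen for illustration, and the permutation test is an assumed stand-in; the statistical test actually used for Figure 6C is not stated in this section.

```python
import random
import statistics

def permutation_test_median(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in medians of a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.median(a) - statistics.median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.median(pooled[:len(a)])
                   - statistics.median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical e-F distances in µm (not measurements from the study).
control_eF = [1.5, 1.6, 1.65, 1.7, 1.72, 1.75, 1.8, 1.9, 2.0, 2.1]
depleted_eF = [0.9, 1.0, 1.1, 1.2, 1.25, 1.3, 1.35, 1.4, 1.45, 1.48]

median_control = statistics.median(control_eF)
median_depleted = statistics.median(depleted_eF)
p_value = permutation_test_median(control_eF, depleted_eF)
print(median_control, median_depleted, p_value)
```

A permutation test makes no assumption about the shape of the distance distribution, which is convenient for distances that are bounded below by zero and often skewed.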

L1 repeats promote phase separation of HP1α

Recent evidence suggests that the formation of heterochromatin domains entails a phase-separation mechanism (Larson et al., 2017; Li et al., 2012; Strom et al., 2017; Zwicker et al., 2014). HP1α, a known component of heterochromatin, forms phase-separated droplets in the presence of DNA or nucleosomes in vitro (Larson et al., 2017; Strom et al., 2017). We showed that HP1α binds strongly to L1-enriched but not B1-enriched genomic regions (Figure 7A). In addition, L1 RNA specifically binds L1 DNA sequences throughout the genome, and is significantly enriched in L1 repeat-associated compartments, but is depleted in B1-associated compartments (Figures 1A and 7A), as shown by chromatin isolation by RNA purification followed by sequencing of L1 RNA (ChIRP-seq) (Lu et al., paper submitted). Given the abundance and co-residence of L1 repeats with HP1α in similar heterochromatin contexts within the nucleus, we hypothesized that L1 RNA may facilitate heterochromatin formation by promoting HP1α phase separation.

We first tested the direct protein-RNA interaction between the recombinant human HP1α protein and L1 RNA fragments that were transcribed in vitro. Interestingly, the L1 RNA mix captures HP1α protein even under extremely stringent washes (up to 1 M of salt and urea), indicating that HP1α has a strong RNA-binding activity towards L1 (Figure 7B). Droplet formation assays showed that HP1α alone fails to phase separate (Figure 7C), consistent with a previous report (Larson et al., 2017). Intriguingly, upon addition of L1 RNA, HP1α forms spherical droplets with the properties of liquid-like condensates, such as fusion of droplets and rapid recovery after photobleaching (Figures 7C and S6A, B). HP1α-L1 RNA droplets increase in size when increasing concentrations of components are added into the assay (Figure 7C). L1 DNA also promotes the phase separation of HP1α (Figure S6C). These results suggest that L1 repeats play an active role in heterochromatin formation.

Discussion

Although tremendous efforts have been dedicated to studies of structural chromatin proteins and cataloging chromatin maps, the role of DNA sequences in 3D genome organization has been largely ignored. It has been speculated that genome organization may occur through polymer phase separation mediated by homologous sequence pairing (Cook, 2002; Cook and Marenduzzo, 2018; Falk et al., 2019; Rowley et al., 2017; Tang, 2017). However, it was unclear which types of repeats are involved in this process, and direct evidence was still lacking. In addition, the molecular determinants and mechanisms that drive formation of the A/B compartments remain unclear. In this study, we provide experimental evidence to support a fundamental role of L1 and B1/Alu repeats in organizing chromatin macrostructure.

Complementary results based on genomics, Hi-C and imaging analysis revealed that L1 and B1/Alu repeats tend to cluster with sequences from their own repeat subfamily and form mutually exclusive domains in the nuclear space that are highly correlated with A/B compartments (Figures 1 and 2). The nuclear segregation of B1/Alu-enriched A compartments in the nuclear interior and L1-enriched B compartments in the nuclear and nucleolar peripheries is relatively conserved and stable across a variety of mouse and human cells, and recurs during the cell cycle (Figure 2). In addition, de novo establishment of the segregated pattern of L1- and B1-rich compartments is coincident with the formation of higher-order chromatin structures during early embryogenesis, and appears to be critically regulated by RNA transcripts produced from L1 and B1 DNA sequences. Depletion of L1 RNA by AMOs in mESCs drastically disrupts the nuclear localization of L1 repeats, and leads to impaired segregation of L1 and B1 compartments and altered higher-order chromatin structures. Moreover, L1 RNA binds to its own DNA sequences in cells, and promotes HP1α phase separation in vitro. Together, our findings suggest that repeat transcripts may transact the structural information embedded in their DNA sequences and facilitate formation of the nuclear architecture with assistance from protein partners, such as HP1α (Figures 4-7).

Based on our observations together with previous reports in the literature, we propose a hypothetical model in which repetitive elements organize the macroscopic structure of the genome at three hierarchical levels (Figure 7D). First, L1 and B1/Alu repeats serve as the genetic basis for A and B compartments. The abundance and scattering of these repeat elements in the genome provide numerous nucleation points or "structural codes" to seed the formation of nuclear subdomains. Homotypic clustering of L1-rich regions or B1-rich regions initiates genome folding. Second, the structural information embedded in linear genomic DNA repeats is in part transacted by their transcripts, particularly L1 RNA as demonstrated in this study, into spatially ordered chromatin in the nucleus. Molecular crowding and specific interactions among similar molecules, including DNA sequences and RNAs containing multiple copies of the same repeat type, and their associated proteins, promote clustering into larger subdomains (Bergeron-Sandoval et al., 2016; Cook, 2002; Misteli, 2007). Phase separation of individual subdomains may also lead to their segregation in the nuclear space and the formation of distinct L1-enriched heterochromatin and B1/Alu-enriched euchromatin domains. Third, chromatin compartmentalization may be further reinforced and stabilized through attachment of repeat DNA sequences to subnuclear structures such as the nucleolus and nuclear speckles (Quinodoz et al., 2018). Evidence to support this notion includes our observation that L1 repeats are preferentially localized at the nuclear and nucleolar peripheries, and that depletion of L1 RNA disrupts the localization of L1 DNA at these sites.

In the absence of zygotic transcription, chromatin folding continues to occur in a significantly delayed manner (Figures 4 and S4), and the higher-order chromatin structure still partially forms (Du et al., 2017). This observation implies that the 3D organization of chromatin is an autonomous process that is greatly facilitated by transcription. We surmise that homotypic clustering of L1-rich or B1-rich regions is autonomous but dynamically regulated. L1/B1-driven nuclear compartmentalization is reminiscent of the self-assembly of sequences such as ribosomal DNA (rDNA) and satellite repeats, which promotes higher-order assemblages of the nucleolus and pericentromeric domains, respectively (Falahati et al., 2016; Tjong et al., 2016). Consistent with our model, in mouse ESCs carrying a human artificial chromosome, human L1 sequences spatially interact with mouse L1-enriched genomic regions but avoid the SINE-enriched regions, and vice versa for human SINEs (van de Werken et al., 2017). Specific DNA-RNA-protein interactions at homologous L1 or B1 chromosomal segments not only enhance the formation of subdomains through phase separation, but may also promote their segregation in the nucleus. A recent study reported that there does not appear to be cross-talk between different transcription factors; instead, they interact among themselves via sequence-specific protein-protein interactions to form concentrated hubs on synthetic lacO DNA arrays (Chong et al., 2018). L1 or B1/Alu repeats tend to associate with distinct sets of transcription and chromatin regulators (Lu et al., paper submitted). For example, heterochromatin proteins such as HP1α, KAP1, and SETDB1 are specifically enriched on L1 elements, whereas general transcription factors such as GTF3C2 and CEBPB and RNA Pol II subunits show enriched binding on B1/Alu. In addition, different physicochemical properties of phase-separated L1 or B1 subdomains, for example chromatin compactness, may further promote the segregation of L1 and B1 compartments.

By focusing on L1 transcripts, we demonstrate an essential role for L1 repeat RNA in promoting the formation of segregated higher-order chromatin structure in mESCs (Figures 4-6). Heterochromatin domains appear to be more stable than euchromatin domains (Houda Belaghzal, 2019). In support of a heterochromatin-dominated model (Falk et al., 2019), our data further suggest that L1-mediated heterochromatin formation may drive nuclear compartmentalization. This function of L1 RNA might be achieved by stabilizing the binding of chromatin proteins such as HP1α to L1 DNA sequences, and by recruiting additional RNA-binding proteins to increase local protein concentrations and multivalent interactions for efficient phase separation. It was reported that the RNA-binding activity of mammalian HP1α is required for its localization to pericentromeric chromatin (Muchardt et al., 2002; Stunnenberg et al., 2015). On the other hand, heterochromatic RNA reduces the mobility and spreading of the fission yeast HP1 protein in heterochromatic domains (Keller et al., 2012; Muchardt et al., 2002; Stunnenberg et al., 2015). From an evolutionary perspective, the properties of co-localization, abundance, and sequence preference of L1 and HP1α might have co-evolved to ensure regulated formation of L1-enriched heterochromatin in organizing the nucleus. L1 RNA binds NCL, and its depletion causes relocation of DUX genes from the perinucleolar heterochromatin to the nucleoplasm (Percharde et al., 2018). Abnormal activation or silencing of L1 in mouse zygotes led to altered chromatin accessibility and compaction, which failed to be rescued by synthetic L1 RNA (Jachowicz et al., 2017). Together with previous reports, our results suggest a model where L1 RNA acts in cis to anchor L1 repeat-enriched chromosomal segments to the nucleolar and nuclear peripheries. This scaffold function is reminiscent of reported roles for several chromatin-bound noncoding RNAs in organizing subnuclear domains. For example, Xist RNA binds the lamin B receptor in the inner nuclear membrane to initiate and maintain the silencing of the inactive X chromosome (McHugh et al., 2015). Transcription at major satellite repeats precedes chromocenter formation, and satellite transcripts help to recruit the H3K9me3 methyltransferase SUV39H to centromeric DNA sequences (Probst et al., 2010; Velazquez Camacho et al., 2017). However, the exact mechanism underlying the function of L1 RNA remains to be fully elucidated by future studies.

In summary, our study starts to unravel the fundamental principle of 3D genome organization, in which the distribution of L1 and B1 transposable elements provides the blueprint for chromatin folding. As discovered by Anfinsen in the late 1950s, the central pillar of protein science tells us that the amino acid sequence of a protein determines its structure and function (Anfinsen, 1972; Anfinsen et al., 1954). Analogously, we propose that the primary DNA sequences determine how the genome folds and functions. We believe that genome folding occurs autonomously, through a process that is driven by homotypic clustering of regions containing L1 or B1 repeat sequences, and is further facilitated by transcription and transcripts produced at these repeat elements. Structural information embedded in L1 and SINE repeats may be universally recognizable, thus contributing to the high degree of stability and conservation in compartmental and TAD organization that is observed across mammalian species and cell types. We note that L1 and B1/Alu compartmental domains represent a structural and functional ground state of chromatin organization, on which subsequent regulatory features, such as dynamic enhancer-promoter interactions, are overlaid. Nevertheless, the same principle of homotypic clustering, phase separation and spatial segregation of chromosomal segments may be reiterated at different genomic scales, consequently folding the genome.

ACKNOWLEDGMENTS

We thank the Shen Laboratory members for insightful discussions, and we thank Dr. Hongxia Lv at the core imaging facility of the School of Life Sciences, Peking University, for imaging support. This work was supported in part by the National Basic Research Program of China (2018YFA0107604, 2017YFA0504204 to X.S.); the National Natural Science Foundation of China (31630095, 31471219 to X.S.; 21573013, 21825401 to Y.S.); the National Key R&D Program of China (2017YFA0505300 to Y.S.); and the Center for Life Sciences (CLS) at Tsinghua University (to X.S.).

AUTHOR CONTRIBUTIONS

X.S. conceived of and supervised the study. X.S., Y.S., J.Y.L. and L.C. designed the experiments. J.Y.L. performed bioinformatics analysis. L.C. conducted all imaging experiments with help from Y.Y., G.Z., T.W., X.H., and J.Z. T.L. performed phase separation and RNA pull-down experiments. K.Z., Q.P. and P.Y. prepared the Hi-C library. T.W., W.L. and P.W. performed embryo injections. P.L. and L.W. provided HP1α proteins. M.P. and M.R.-S. contributed to early experiments of L1 RNA knockdown. W.X., J.N., H.Z., X.B., W.S., Z.W. and Y.H. contributed technical assistance/suggestions. X.S., J.Y.L. and L.C. wrote the manuscript with input from all authors.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

Anfinsen, C.B. (1972). The formation and stabilization of protein structure. Biochem J 128, 737-749.
Anfinsen, C.B., Redfield, R.R., Choate, W.L., Page, J., and Carroll, W.R. (1954). Studies on the gross structure, cross-linkages, and terminal sequences in ribonuclease. J Biol Chem 207, 201-210.
Annunziato, A.T. (2008). DNA Packaging: Nucleosomes and Chromatin. Nature Education 1(1):26.
Bergeron-Sandoval, L.P., Safaee, N., and Michnick, S.W. (2016). Mechanisms and Consequences of Macromolecular Phase Separation. Cell 165, 1067-1079.
Biemont, C. (2010). A Brief History of the Status of Transposable Elements: From Junk DNA to Major Players in Evolution. Genetics 186, 1085-1093.
Bintu, B., Mateo, L.J., Su, J.H., Sinnott-Armstrong, N.A., Parker, M., Kinrot, S., Yamaya, K., Boettiger, A.N., and Zhuang, X. (2018). Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science 362.
Boettiger, A.N., Bintu, B., Moffitt, J.R., Wang, S., Beliveau, B.J., Fudenberg, G., Imakaev, M., Mirny, L.A., Wu, C.T., and Zhuang, X. (2016). Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529, 418-422.
Bonev, B., and Cavalli, G. (2016). Organization and function of the 3D genome. Nat Rev Genet 17, 661-678.
Bourque, G., Leong, B., Vega, V.B., Chen, X., Lee, Y.L., Srinivasan, K.G., Chew, J.L., Ruan, Y., Wei, C.L., Ng, H.H., et al. (2008). Evolution of the mammalian transcription factor binding repertoire via transposable elements. Genome Res 18, 1752-1762.
Bouwman, B.A., and de Laat, W. (2015). Getting the genome in shape: the formation of loops, domains and compartments. Genome Biol 16, 154.
Buchwalter, A., Kaneshiro, J.M., and Hetzer, M.W. (2019). Coaching from the sidelines: the nuclear periphery in genome regulation. Nature Reviews Genetics 20, 39-50.
Chen, C., Ara, T., and Gautheret, D. (2009). Using Alu elements as polyadenylation sites: A case of exaptation. Mol Biol Evol 26, 327-334.
Chong, S., Dugast-Darzacq, C., Liu, Z., Dong, P., Dailey, G.M., Cattoglio, C., Heckert, A., Banala, S., Lavis, L., Darzacq, X., et al. (2018). Imaging dynamic and selective low-complexity domain interactions that control gene transcription. Science 361.
Chuong, E.B., Elde, N.C., and Feschotte, C. (2016). Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083-1087.
Cook, P.R. (2002). Predicting three-dimensional genome structure from transcriptional activity. Nat Genet 32, 347-352.
Cook, P.R., and Marenduzzo, D. (2018). Transcription-driven genome organization: a model for chromosome structure and the regulation of gene expression tested through simulations. Nucleic Acids Res 46, 9895-9906.
Deininger, P. (2011). Alu elements: know the SINEs. Genome Biol 12, 236.
Dekker, J., Belmont, A.S., Guttman, M., Leshyk, V.O., Lis, J.T., Lomvardas, S., Mirny, L.A., O'Shea, C.C., Park, P.J., Ren, B., et al. (2017). The 4D nucleome project. Nature 549, 219-226.
Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nat Rev Genet 14, 390-403.
Dekker, J., Rippe, K., Dekker, M., and Kleckner, N. (2002). Capturing chromosome conformation. Science 295, 1306-1311.
Dixon, J.R., Jung, I., Selvaraj, S., Shen, Y., Antosiewicz-Bourget, J.E., Lee, A.Y., Ye, Z., Kim, A., Rajagopal, N., Xie, W., et al. (2015). Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331-336.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380.
Dominissini, D., Moshitch-Moshkovitz, S., Amariglio, N., and Rechavi, G. (2011). Adenosine-to-inosine RNA editing meets cancer. Carcinogenesis 32, 1569-1577.
Dowen, J.M., Fan, Z.P., Hnisz, D., Ren, G., Abraham, B.J., Zhang, L.N., Weintraub, A.S., Schujiers, J., Lee, T.I., Zhao, K., et al. (2014). Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374-387.
Du, Z., Zheng, H., Huang, B., Ma, R., Wu, J., Zhang, X., He, J., Xiang, Y., Wang, Q., Li, Y., et al. (2017). Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature 547, 232-235.
Durruthy-Durruthy, J., Sebastiano, V., Wossidlo, M., Cepeda, D., Cui, J., Grow, E.J., Davila, J., Mall, M., Wong, W.H., Wysocka, J., et al. (2016). The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming. Nat Genet 48, 44-52.
Fadloun, A., Le Gras, S., Jost, B., Ziegler-Birling, C., Takahashi, H., Gorab, E., Carninci, P., and Torres-Padilla, M.E. (2013). Chromatin signatures and retrotransposon profiling in mouse embryos reveal regulation of LINE-1 by RNA. Nat Struct Mol Biol 20, 332-338.
Falahati, H., Pelham-Webb, B., Blythe, S., and Wieschaus, E. (2016). Nucleation by rRNA Dictates the Precision of Nucleolus Assembly. Curr Biol 26, 277-285.
Falk, M., Feodorova, Y., Naumova, N., Imakaev, M., Lajoie, B.R., Leonhardt, H., Joffe, B., Dekker, J., Fudenberg, G., Solovei, I., et al. (2019). Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature 570, 395-399.
Flyamer, I.M., Gassler, J., Imakaev, M., Brandao, H.B., Ulianov, S.V., Abdennur, N., Razin, S.V., Mirny, L.A., and Tachibana-Konwalski, K. (2017). Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature 544, 110-114.
Fraser, J., Ferrai, C., Chiariello, A.M., Schueler, M., Rito, T., Laudanno, G., Barbieri, M., Moore, B.L., Kraemer, D.C., Aitken, S., et al. (2015). Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Mol Syst Biol 11, 852.
Gassler, J., Brandao, H.B., Imakaev, M., Flyamer, I.M., Ladstatter, S., Bickmore, W.A., Peters, J.M., Mirny, L.A., and Tachibana, K. (2017). A mechanism of cohesin-dependent loop extrusion organizes zygotic genome architecture. Embo J 36, 3600-3618.
Gibcus, J.H., and Dekker, J. (2013). The hierarchy of the 3D genome. Mol Cell 49, 773-782.
Grow, E.J., Flynn, R.A., Chavez, S.L., Bayless, N.L., Wossidlo, M., Wesche, D.J., Martin, L., Ware, C.B., Blish, C.A., Chang, H.Y., et al. (2015). Intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. Nature 522, 221-225.
Guelen, L., Pagie, L., Brasset, E., Meuleman, W., Faza, M.B., Talhout, W., Eussen, B.H., de Klein, A., Wessels, L., de Laat, W., et al. (2008). Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948-951.
Haarhuis, J.H.I., van der Weide, R.H., Blomen, V.A., Yanez-Cuna, J.O., Amendola, M., van Ruiten, M.S., Krijger, P.H.L., Teunissen, H., Medema, R.H., van Steensel, B., et al. (2017). The Cohesin Release Factor WAPL Restricts Chromatin Loop Extension. Cell 169, 693-707 e614.
Houda Belaghzal, T.B., Andrew D. Stephens, Denis L. Lafontaine, Sergey V. Venev, Zhiping Weng, John F. Marko, Job Dekker (2019). Compartment-dependent chromatin interaction dynamics revealed by liquid chromatin Hi-C. bioRxiv.
Jachowicz, J.W., Bing, X.Y., Pontabry, J., Boskovic, A., Rando, O.J., and Torres-Padilla, M.E. (2017). LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nature Genetics 49, 1502-1510.
Jost, D., Carrivain, P., Cavalli, G., and Vaillant, C. (2014). Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res 42, 9553-9561.
Jurka, J., Kohany, O., Pavlicek, A., Kapitonov, V.V., and Jurka, M.V. (2004). Duplication, coclustering, and selection of human Alu retrotransposons. Proc Natl Acad Sci U S A 101, 1268-1272.
Kadauke, S., and Blobel, G.A. (2009). Chromatin loops in gene regulation. Biochim Biophys Acta 1789, 17-25.
Ke, Y., Xu, Y., Chen, X., Feng, S., Liu, Z., Sun, Y., Yao, X., Li, F., Zhu, W., Gao, L., et al. (2017). 3D Chromatin Structures of Mature Gametes and Structural Reprogramming during Mammalian Embryogenesis. Cell 170, 367-381 e320.
Keller, C., Adaixo, R., Stunnenberg, R., Woolcock, K.J., Hiller, S., and Buhler, M. (2012). HP1(Swi6) mediates the recognition and destruction of heterochromatic RNA transcripts. Mol Cell 47, 215-227.
Korenberg, J.R., and Rykowski, M.C. (1988). Human genome organization: Alu, lines, and the molecular structure of metaphase chromosome bands. Cell 53, 391-400.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.
Larson, A.G., Elnatan, D., Keenen, M.M., Trnka, M.J., Johnston, J.B., Burlingame, A.L., Agard, D.A., Redding, S., and Narlikar, G.J. (2017). Liquid droplet formation by HP1alpha suggests a role for phase separation in heterochromatin. Nature 547, 236-240.
Li, P., Banjade, S., Cheng, H.C., Kim, S., Chen, B., Guo, L., Llaguno, M., Hollingsworth, J.V., King, D.S., Banani, S.F., et al. (2012). Phase transitions in the assembly of multivalent signalling proteins. Nature 483, 336-340.
Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., et al. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-293.
Liu, N., Lee, C.H., Swigut, T., Grow, E., Gu, B., Bassik, M.C., and Wysocka, J. (2018). Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature 553, 228-232.
Lubelsky, Y., and Ulitsky, I. (2018). Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107-111.
Lynch, V.J., Leclerc, R.D., May, G., and Wagner, G.P. (2011). Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat Genet 43, 1154-1159.
Mandal, P.K., and Kazazian, H.H., Jr. (2008). SnapShot: Vertebrate transposons. Cell 135, 192-192 e191.
Mateos-Langerak, J., Brink, M.C., Luijsterburg, M.S., van der Kraan, I., van Driel, R., and Verschure, P.J. (2007). Pericentromeric heterochromatin domains are maintained without accumulation of HP1. Molecular Biology of the Cell 18, 1464-1471.
McHugh, C.A., Chen, C.K., Chow, A., Surka, C.F., Tran, C., McDonel, P., Pandya-Jones, A., Blanco, M., Burghard, C., Moradian, A., et al. (2015). The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature 521, 232-236.
Meuleman, W., Peric-Hupkes, D., Kind, J., Beaudry, J.B., Pagie, L., Kellis, M., Reinders, M., Wessels, L., and van Steensel, B. (2013). Constitutive nuclear lamina-genome interactions are highly conserved and associated with A/T-rich sequence. Genome Res 23, 270-280.
Mirny, L.A. (2011). The fractal globule as a model of chromatin architecture in the cell. Chromosome Res 19, 37-51.
Misteli, T. (2007). Beyond the sequence: cellular organization of genome function. Cell 128, 787-800.
Mouse Genome Sequencing, C., Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., et al. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562.
Muchardt, C., Guilleme, M., Seeler, J.S., Trouche, D., Dejean, A., and Yaniv, M. (2002). Coordinated methyl and RNA binding is required for heterochromatin localization of mammalian HP1 alpha. Embo Reports 3, 975-981.

Nagano, T., Lubling, Y., Vaarnai, C., Dudley, C., Leung, W., Baran, Y., Cohen, N.M., Wingett, S., Fraser, P., and Tanay, A. (2017). Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61-67.
Naumova, N., Imakaev, M., Fudenberg, G., Zhan, Y., Lajoie, B.R., Mirny, L.A., and Dekker, J. (2013). Organization of the mitotic chromosome. Science 342, 948-953.
Nora, E.P., Goloborodko, A., Valton, A.L., Gibcus, J.H., Uebersohn, A., Abdennur, N., Dekker, J., Mirny, L.A., and Bruneau, B.G. (2017). Targeted Degradation of CTCF Decouples Local Insulation of Chromosome Domains from Genomic Compartmentalization. Cell 169, 930-944.
Nuebler, J., Fudenberg, G., Imakaev, M., Abdennur, N., and Mirny, L.A. (2018). Chromatin organization by an interplay of loop extrusion and compartmental segregation. Proc Natl Acad Sci U S A 115, E6697-E6706.
Orgel, L.E., and Crick, F.H. (1980). Selfish DNA: the ultimate parasite. Nature 284, 604-607.
Percharde, M., Lin, C.J., Yin, Y., Guan, J., Peixoto, G.A., Bulut-Karslioglu, A., Biechele, S., Huang, B., Shen, X., and Ramalho-Santos, M. (2018). A LINE1-Nucleolin Partnership Regulates Early Development and ESC Identity. Cell 174, 391-405 e319.
Polak, P., and Domany, E. (2006). Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. Bmc Genomics 7, 133.
Probst, A.V., Okamoto, I., Casanova, M., El Marjou, F., Le Baccon, P., and Almouzni, G. (2010). A strand-specific burst in transcription of pericentric satellites is required for chromocenter formation and early mouse development. Dev Cell 19, 625-638.
Pueschel, R., Coraggio, F., and Meister, P. (2016). From single genes to entire genomes: the search for a function of nuclear organization. Development 143, 910-923.
Qiu, J.J., Zhang, W.W., Wu, Z.L., Wang, Y.H., Qian, M., and Li, Y.P. (2003). Delay of ZGA initiation occurred in 2-cell blocked mouse embryos. Cell Res 13, 179-185.
Quinodoz, S.A., Ollikainen, N., Tabak, B., Palla, A., Schmidt, J.M., Detmar, E., Lai, M.M., Shishkin, A.A., Bhat, P., Takei, Y., et al. (2018). Higher-Order Inter-chromosomal Hubs Shape 3D Genome Organization in the Nucleus. Cell 174, 744-757 e724.
Rao, S.S., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., et al. (2014). A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665-1680.
Rao, S.S.P., Huang, S.C., St Hilaire, B.G., Engreitz, J.M., Perez, E.M., Kieffer-Kwon, K.R., Sanborn, A.L., Johnstone, S.E., Bascom, G.D., Bochkov, I.D., et al. (2017). Cohesin Loss Eliminates All Loop Domains. Cell 171, 305-320.
Rebollo, R., Romanish, M.T., and Mager, D.L. (2012). Transposable Elements: An Abundant and Natural Source of Regulatory Sequences for Host Genes. Annual Review of Genetics, Vol 46 46, 21-42.
Rivera-Mulia, J.C., and Gilbert, D.M. (2016). Replication timing and transcriptional control: beyond cause and effect-part III. Curr Opin Cell Biol 40, 168-178.
Rothstein, J.L., Johnson, D., Deloia, J.A., Skowronski, J., Solter, D., and Knowles, B. (1992). Gene-Expression during Preimplantation Mouse Development. Gene Dev 6, 1190-1201.
Rowley, M.J., and Corces, V.G. (2018). Organizational principles of 3D genome architecture. Nat Rev Genet 19, 789-800.
Rowley, M.J., Nichols, M.H., Lyu, X.W., Ando-Kuri, M., Rivera, I.S.M., Hermetz, K., Wang, P., Ruan, Y.J., and Corces, V.G. (2017). Evolutionarily Conserved Principles Predict 3D Chromatin Organization. Mol Cell 67, 837-852.
Schwarzer, W., Abdennur, N., Goloborodko, A., Pekowska, A., Fudenberg, G., Loe-Mie, Y., Fonseca, N.A., Huber, W., C, H.H., Mirny, L., et al. (2017). Two independent modes of chromatin organization revealed by cohesin removal. Nature 551, 51-56.
Solovei, I., Thanisch, K., and Feodorova, Y. (2016). How to rule the nucleus: divide et impera. Current Opinion in

bioRxiv preprint doi: https://doi.org/10.1101/802173; this version posted October 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Cell Biology 40, 47-59. 2 Splinter, E., Heath, H., Kooren, J., Palstra, R.J., Klous, P., Grosveld, F., Galjart, N., and de Laat, W. (2006). CTCF 3 mediates long-range chromatin looping and local histone modification in the beta-globin . Genes Dev 20, 4 2349-2354. 5 Stadhouders, R., Vidal, E., Serra, F., Di Stefano, B., Le Dily, F., Quilez, J., Gomez, A., Collombet, S., Berenguer, C., 6 Cuartero, Y., et al. (2018). Transcription factors orchestrate dynamic interplay between genome topology and 7 gene regulation during cell reprogramming. Nature Genetics 50, 238-249. 8 Stevens, T.J., Lando, D., Basu, S., Atkinson, L.P., Cao, Y., Lee, S.F., Leeb, M., Wohlfahrt, K.J., Boucher, W., 9 O'Shaughnessy-Kirwan, A., et al. (2017). 3D structures of individual mammalian genomes studied by single-cell 10 Hi-C. Nature 544, 59-64. 11 Strom, A.R., Emelyanov, A.V., Mir, M., Fyodorov, D.V., Darzacq, X., and Karpen, G.H. (2017). Phase separation 12 drives heterochromatin domain formation. Nature 547, 241-245. 13 Stunnenberg, R., Kulasegaran-Shylini, R., Keller, C., Kirschmann, M.A., Gelman, L., and Buhler, M. (2015). H3K9 14 methylation extends across natural boundaries of heterochromatin in the absence of an HP1 protein. Embo J 15 34, 2789-2803. 16 Sundaram, V., and Wang, T. (2018). Mediated Innovation in Gene Regulatory Landscapes 17 of Cells: Re-Visiting the "Gene-Battery" Model. Bioessays 40. 18 Tang, S.J. (2017). Potential Role of Phase Separation of Repetitive DNA in Chromosomal Organization. Genes 19 (Basel) 8. 20 Taylor, M.S., LaCava, J., Mita, P., Molloy, K.R., Huang, C.R., Li, D., Adney, E.M., Jiang, H., Burns, K.H., Chait, B.T., 21 et al. (2013). Affinity proteomics reveals human host factors implicated in discrete stages of LINE-1 22 retrotransposition. Cell 155, 1034-1048. 23 Tjong, H., Li, W.Y., Kalhor, R., Dai, C., Hao, S.L., Gong, K., Zhou, Y.G., Li, H.C., Zhou, X.J., Le Gros, M.A., et al. 24 (2016). 
Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. 25 P Natl Acad Sci USA 113, E1663-E1672. 26 van de Werken, H.J.G., Haan, J.C., Feodorova, Y., Bijos, D., Weuts, A., Theunis, K., Holwerda, S.J.B., Meuleman, 27 W., Pagie, L., Thanisch, K., et al. (2017). Small chromosomal regions position themselves autonomously 28 according to their chromatin class. Genome Res 27, 922-933. 29 van Steensel, B., and Belmont, A.S. (2017). Lamina-Associated Domains: Links with Chromosome Architecture, 30 Heterochromatin, and Gene Repression. Cell 169, 780-791. 31 Velazquez Camacho, O., Galan, C., Swist-Rosowska, K., Ching, R., Gamalinda, M., Karabiber, F., De La Rosa- 32 Velazquez, I., Engist, B., Koschorz, B., Shukeir, N., et al. (2017). Major satellite repeat RNA stabilize 33 heterochromatin retention of Suv39h enzymes by RNA-nucleosome association and RNA:DNA hybrid 34 formation. Elife 6. 35 Wang, J., Xie, G., Singh, M., Ghanbarian, A.T., Rasko, T., Szvetnik, A., Cai, H., Besser, D., Prigione, A., Fuchs, N.V., 36 et al. (2014). Primate-specific -driven transcription defines naive-like stem cells. Nature 37 516, 405-409. 38 Wang, S., Su, J.H., Beliveau, B.J., Bintu, B., Moffitt, J.R., Wu, C.T., and Zhuang, X. (2016). Spatial organization of 39 chromatin domains and compartments in single chromosomes. Science 353, 598-602. 40 Wijchers, P.J., Geeven, G., Eyres, M., Bergsma, A.J., Janssen, M., Verstegen, M., Zhu, Y., Schell, Y., Vermeulen, C., 41 de Wit, E., et al. (2015). Characterization and dynamics of pericentromere-associated domains in mice. Genome 42 Res 25, 958-969. 43 Wutz, G., Várnai, C., Nagasaka, K., Cisneros, D.A., Stocsits, R.R., Tang, W., Schoenfelder, S., Jessberger, G., Muhar, 44 M., Hossain, M.J., et al. (2017). Topologically associating domains and chromatin loops depend on cohesin and 45 are regulated by CTCF, WAPL, and PDS5 proteins. The EMBO journal 36, 3573-3599. 25

bioRxiv preprint doi: https://doi.org/10.1101/802173; this version posted October 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Xie, Y., Rosser, J.M., Thompson, T.L., Boeke, J.D., and An, W. (2010). Characterization of L1 retrotransposition 2 with high-throughput dual-luciferase assays. Nucleic acids research 39, e16-e16. 3 Zane, L., Sharma, V., and Misteli, T. (2014). Common features of chromatin in aging and cancer: cause or 4 coincidence? Trends Cell Biol 24, 686-694. 5 Zullo, J.M., Demarco, I.A., Pique-Regi, R., Gaffney, D.J., Epstein, C.B., Spooner, C.J., Luperchio, T.R., Bernstein, 6 B.E., Pritchard, J.K., Reddy, K.L., et al. (2012). DNA sequence-dependent compartmentalization and silencing of 7 chromatin at the nuclear lamina. Cell 149, 1474-1487. 8 Zwicker, D., Decker, M., Jaensch, S., Hyman, A.A., and Julicher, F. (2014). Centrosomes are autocatalytic droplets 9 of pericentriolar material organized by centrioles. P Natl Acad Sci USA 111, E2636-E2645. 10 11

26

FIGURE LEGENDS

Figure 1. B1- and L1-rich genomic regions homotypically interact and correlate with A and B compartments.
(A) Heatmaps of the distribution densities of B1, L1, L2, and ERV1 repeats and random genomic regions (panel i); DNase I hypersensitive sites (DHS) and ChIP-seq signals of Pol II, H3K4me3, HP1α, and H3K9me3 (panel ii); ChIRP-seq signals of Malat1 and L1 RNA (panel iii); and RNA-seq (panel iv) in mESCs across two adjacent compartments (Cn, Cn+1). All signals in the 696 compartments annotated in mESCs were sorted according to the B1 distributions shown in panel (i).
(B) Relative contents of various repeats across the A and B compartments annotated in various cell types in mouse (top) and human (bottom). Three repeat subclasses are shown (B1 or Alu, L1, and ERV1), together with random genomic regions as the negative control.
(C) Genome browser shots showing conserved domain structures as indicated by heatmaps of the Hi-C contact matrix over a syntenic region in mouse (top) and human (bottom) ESCs. The B1 and L1 repeat densities, the A/B compartments shown by eigenvalues of the Hi-C contact matrix, and Refseq gene annotations are shown underneath each heatmap.
(D) Heatmap of normalized interaction frequencies at 100-kb resolution on chromosome 17 in mESCs. Genomic densities of B1 and L1 repeats are shown at the left and bottom.
(E) A zoomed-in view of the interaction matrix of the genomic region from 18 Mb to 60 Mb on mouse chr17 (40-kb resolution). Genomic densities of B1 and L1 repeats, eigenvalues of the Hi-C matrix representing A/B compartments from mESCs, neural progenitor cells (NPCs) and neurons, and Pol II ChIP-seq signals in mESCs are also shown underneath. B1-enriched regions are arbitrarily labeled as D, F, and H in uppercase. L1-enriched regions are labeled as c, e, g, and i in lowercase. Some strong homotypic interactions between compartments rich in the same repeat subfamily (for example, between the B1-rich regions DF, DH and FH, and between the L1-rich regions ce and cg) are highlighted by dotted boxes.
(F) Correlation heatmap showing Pearson correlation coefficients of the interaction frequencies of any two paired regions in a sub-region on chr17 (500-kb resolution). B1-enriched regions are labeled in uppercase as F, H … R, T. L1-enriched regions are labeled in lowercase as e, g … q, u. Dotted boxes and arrows highlight positive correlations (in red) of the anchor region F (indicated by *) with other B1-enriched genomic regions (horizontal), and of the anchor region g (indicated by *) with other L1-enriched genomic regions (vertical).
(G) Correlation heatmap showing Pearson correlation coefficients of inter-chromosomal interaction frequencies between chromosomes 17 and 19 at 500-kb resolution. B1-enriched regions on chr19 are arbitrarily labeled in uppercase as V, W, X, Y, and Z. L1-enriched regions on chr17 are labeled as shown in panels (E-F) from c, e … q, to u. The horizontal dotted box and arrows highlight positive correlations of an L1-enriched region on chr19 with other L1-enriched regions on chr17. The vertical dotted box and arrows show positive correlations of a B1-enriched region on chr17 with other B1-enriched regions on chr19.
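As a rough illustration of the correlation heatmaps described for Figure 1 (F) and (G), the sketch below computes Pearson correlation coefficients between the interaction profiles (rows) of a contact matrix. The function names and the 4-bin toy matrix are hypothetical and chosen only for illustration; this is not the authors' pipeline, and the input is assumed to be an already normalized matrix.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length interaction profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(contacts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Correlate every pair of rows (bins) of a contact matrix."""
    return [[pearson(row_i, row_j) for row_j in contacts] for row_i in contacts]


# Hypothetical toy matrix: bins 0 and 2 stand in for B1-rich regions,
# bins 1 and 3 for L1-rich regions (values are made up).
toy_contacts = [
    [10.0, 2.0, 8.0, 1.0],
    [2.0, 10.0, 1.0, 8.0],
    [8.0, 1.0, 10.0, 2.0],
    [1.0, 8.0, 2.0, 10.0],
]
corr = correlation_matrix(toy_contacts)
```

In this toy case, bins of the same assumed repeat class have positively correlated profiles and bins of different classes have negatively correlated profiles, which is the pattern the dotted boxes in the legend point to.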

Figure 2. DNA FISH reveals the spatial segregation of L1 and B1 compartments.
(A) Representative images of L1 (green) and B1 (red) repeats revealed by DNA FISH in mESCs. DNA is labeled by DAPI (blue). All scale bars, 5 μm.
(B) Normalized fluorescence intensity profiles along the white line in the merged B1/L1 image shown in panel (A). The L1 channel shows four intensity peaks that surround the two intensity peaks in the B1 channel. These two peaks in the B1 channel correspond to valleys in the L1 channel.
(C) Representative images of immuno-DNA FISH analysis of L1 (green) and B1 (red) DNA repeats, and NCL protein (purple) in mESCs (n=37), NSC (n=23), NIH3T3 (n=18), C2C12 (n=11) and HeLa cells (n=15).
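The line-profile comparison in Figure 2 (B) can be illustrated with a minimal sketch. The legend does not say how the intensities were normalized, so min-max scaling to the range 0 to 1 is an assumption here, as are the function names and the made-up intensity values.

```python
from typing import List, Sequence


def normalize_profile(values: Sequence[float]) -> List[float]:
    """Min-max scale an intensity profile to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a flat profile cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values]


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices of interior points strictly higher than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Hypothetical raw intensities sampled along one line through a nucleus.
b1_raw = [1, 2, 8, 2, 1, 2, 9, 2, 1]
l1_raw = [2, 7, 1, 6, 2, 7, 1, 8, 2]

b1_norm = normalize_profile(b1_raw)
l1_norm = normalize_profile(l1_raw)
b1_peaks = local_maxima(b1_norm)  # positions of B1 peaks
l1_peaks = local_maxima(l1_norm)  # positions of L1 peaks
```

With these toy values, the two B1 peaks fall at positions where the L1 profile is at its minimum, mirroring the peak-and-valley relationship described in the legend.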

Figure 3. Dynamic segregation of L1 and B1 compartments during the cell cycle and embryonic development.
(A) and (B) DNA FISH analysis of synchronized mESCs at different cell cycle stages. Representative images and the scatterplot of the segregation index of L1 and B1 DNA are shown in panels (A) and (B), respectively. Data are presented as mean ± standard deviation (SD). N, number of nuclei analyzed. M phase and G1 phase data are compared to S phase using the two-tailed Student’s t-test. p values are shown at the top.
(C) Boxplot analysis of the ratio of homotypic interactions versus heterotypic interactions for B1 and L1 based on single-cell Hi-C data from mESCs. The y-axis shows the segregation index (Hi-C) of L1 and B1, which represents the ratio of average interaction frequency between compartments containing similar repeats (B1.B1 and L1.L1) to that between compartments containing different repeats (B1.L1) for all chromosomes (except X and Y). Larger values indicate a higher degree of homotypic interaction between B1- or L1-enriched compartments. p values are calculated with the two-tailed Student’s t-test.
(D) Heatmaps of normalized interaction frequencies at 500-kb resolution on chromosome 17 in mESCs at M (left) and G1 (middle) phase. A comparison of contact frequencies between M and G1 phase [log2(G1/M)] for the whole of chromosome 17 is shown on the right. Genomic densities of B1 and L1 repeats are shown at the bottom.
(E) and (F) DNA FISH analysis of early embryos. Representative FISH images and the scatterplot of the segregation index of L1 and B1 DNA are shown in panels (E) and (F), respectively. Data are presented as mean ± SD (embryos were collected and processed in two independent experiments). N, number of nuclei analyzed. Each sample is compared to the blastocyst stage and p values are calculated with the two-tailed Student’s t-test. Dotted lines show the trend-line.
(G) Boxplot showing a gradual increase of homotypic versus heterotypic interactions between L1-rich and B1-rich regions based on Hi-C analysis of early embryos from zygotes (including PN3 and PN5 stages) to 8-cell embryos and pluripotent mESCs (in vitro equivalent of the inner cell mass cells of blastocysts). Each sample is compared to mESCs and p values are calculated with the two-tailed Student’s t-test.
(H) The upper panels show heatmaps of normalized interaction frequencies at 500-kb resolution on chromosome 17 in mouse embryos at the early 2-cell, late 2-cell and inner cell mass (ICM) stages. The lower panels show comparisons of the contact frequencies between Late 2-cell and Early 2-cell (left), and ICM and Early 2-cell (right). Genomic densities of B1 and L1 repeats are also shown.
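The Hi-C segregation index defined in the legend of Figure 3 (C) can be sketched as a ratio of mean contact frequencies. This is a minimal illustration under stated assumptions: B1.B1 and L1.L1 pairs are pooled into a single homotypic average, each compartment carries exactly one label, and the function name, labels, and toy matrix are hypothetical rather than taken from the authors' code.

```python
from itertools import combinations
from typing import List, Sequence


def segregation_index(
    contacts: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """Mean homotypic contact frequency divided by mean heterotypic frequency.

    contacts[i][j] is the interaction frequency between compartments i and j;
    labels[i] is the repeat class of compartment i (for example "B1" or "L1").
    """
    homotypic: List[float] = []
    heterotypic: List[float] = []
    for i, j in combinations(range(len(labels)), 2):
        target = homotypic if labels[i] == labels[j] else heterotypic
        target.append(contacts[i][j])
    if not homotypic or not heterotypic:
        raise ValueError("need at least one homotypic and one heterotypic pair")
    mean_hetero = sum(heterotypic) / len(heterotypic)
    if mean_hetero == 0:
        raise ValueError("heterotypic mean is zero; ratio undefined")
    return (sum(homotypic) / len(homotypic)) / mean_hetero


# Hypothetical compartments on one chromosome (values are made up).
toy_labels = ["B1", "L1", "B1", "L1"]
toy_matrix = [
    [10.0, 2.0, 8.0, 1.0],
    [2.0, 10.0, 1.0, 8.0],
    [8.0, 1.0, 10.0, 2.0],
    [1.0, 8.0, 2.0, 10.0],
]
toy_index = segregation_index(toy_matrix, toy_labels)
```

A value of 1 would mean that homotypic and heterotypic pairs contact each other equally often; values above 1 indicate preferential contacts between compartments of the same repeat class, as the legend describes.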

Figure 4. Repeat RNA and transcription promote the spatial segregation of L1 and B1 compartments.
(A) Developmental analysis of embryos microinjected with scramble or L1 AMO (left) or with scramble or two different B1 ASOs (ASO-1 and ASO-2) (right) at the zygote stage. N, number of embryos analyzed.
(B) Embryos depleted of L1 RNA by AMO show poorer L1/B1 segregation, as indicated by significantly lower L1/B1 segregation indexes compared to the scramble AMO control and noninjected late 2-cell embryos. Each dot represents an embryo analyzed. Black and grey dots represent injected and noninjected embryos, respectively. Data were collected from three independent experiments. p values are calculated with the two-tailed Student’s t-test.
(C) and (D) DNA FISH analysis of L1 (green) and B1 (red) repeats in mESCs treated with transcription inhibitors for 3 hours. Representative images and the scatterplot of the segregation index of L1 and B1 DNA are shown in (C) and (D), respectively. DMSO: mock control; DRB (100 μM): inhibits the release and elongation of Pol II; ActD (1 μg/ml): inhibits both Pol I and II. Both drug treatments disrupt the perinucleolar staining of L1 DNA and induce mixing of the L1 and B1 DNA signals. Treatment with ActD elicits a stronger mixing effect than DRB.
(E) and (F) DNA FISH analysis of L1 (green) and B1 (red) repeats in mESCs transfected with scramble AMO for 36 hours, or L1 AMO for 12 hours and 36 hours, or treated with the drug AZT for 12 hours. Representative images and the scatterplot of the segregation index of L1 and B1 DNA are shown in (E) and (F), respectively.
All scale bars, 5 μm. N, number of nuclei analyzed. Data are presented as mean ± SD (two independent experiments), and p values are calculated with the two-tailed Student’s t-test.

Figure 5. L1 RNA is required for the formation and maintenance of higher-order chromatin structure.
(A-B) Hi-C analysis of mESCs treated with scramble control (SCR) and L1 AMOs for 36 hours. Two independent experiments of AMO transfection followed by Hi-C were performed. Panel A shows Hi-C heatmaps of contact frequencies for SCR and L1 AMO (panel i) and the comparison of contact frequencies (panel ii) between scramble and L1 AMOs [log2(SCR/L1)]. The whole of chromosome 17 (chr17) is shown on the left (at 500-kb resolution) and the boxed sub-region (18-60 Mb) of chr17 is shown enlarged on the right. To help orient the comparison of these three sub-region heatmaps, representative homotypic interactions (ce, DF, and eg) are labeled with black boxes in the right panels. Relevant B1- and L1-enriched compartments are labeled as in Figure 1. Panel B shows a boxplot comparing the Hi-C-based segregation indexes. L1 AMO led to decreased ratios of homotypic interaction versus heterotypic interaction between L1-rich and B1-rich regions compared to SCR AMO. The p value was calculated by the two-tailed Student’s t-test.
(C) Schematic representation of compartmental interactions before and after depletion of L1 RNA based on the Hi-C data shown in panel (A). The control mESCs show strong homotypic interactions (indicated by blue solid lines), whereas L1-depleted cells show weakened homotypic interactions (dotted blue lines) and abnormal increases of heterotypic interactions (indicated by brown solid lines).
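The log2(SCR/L1) comparison described for Figure 5 (A, panel ii) amounts to an element-wise log2 ratio of two contact matrices. The sketch below illustrates this; the pseudocount that guards against division by zero is an assumption not stated in the legend, and the function name and toy matrices are hypothetical.

```python
import math
from typing import List, Sequence


def log2_ratio(
    numerator: Sequence[Sequence[float]],
    denominator: Sequence[Sequence[float]],
    pseudocount: float = 1.0,
) -> List[List[float]]:
    """Element-wise log2((numerator + p) / (denominator + p)) of two matrices.

    The pseudocount p is an assumed safeguard against empty bins; it must be
    positive and both matrices must have the same shape.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    if len(numerator) != len(denominator):
        raise ValueError("matrices must have the same number of rows")
    result: List[List[float]] = []
    for row_n, row_d in zip(numerator, denominator):
        if len(row_n) != len(row_d):
            raise ValueError("matrices must have the same number of columns")
        result.append(
            [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(row_n, row_d)]
        )
    return result


# Hypothetical 2-bin matrices for the control (SCR) and L1 AMO conditions.
scr_toy = [[3.0, 1.0], [1.0, 3.0]]
l1_amo_toy = [[1.0, 3.0], [3.0, 1.0]]
difference = log2_ratio(scr_toy, l1_amo_toy)
```

Positive entries mark contacts that are stronger in the control than after L1 depletion, and negative entries mark contacts that are gained after depletion. The same operation would apply to the log2(G1/M) and embryo-stage comparisons in Figure 3.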

Figure 6. L1 RNA is required for spatial segregation of L1 and B1 compartments.
Oligopaint DNA FISH analysis of four representative loci before and after depletion of L1 RNA for 36 hours in mESCs. (A) Scheme of relative chromosomal localizations and spatial distances of the four compartments labeled by Oligopaint DNA FISH probes. B1-enriched domain: F. L1-enriched domains: e, g, and q. Representative images and quantification of the spatial distances between two labeled compartments are shown in panels (B) and (C), respectively. Each dot in panel (C) represents a nucleus. In cells treated with L1 AMO, the median distance between the L1 compartment e and the B1 compartment F is significantly decreased compared to the control, while the median distance between the two L1 compartments g and q is significantly increased. Data are presented as mean ± SD. N, number of nuclei analyzed. p values are calculated by the two-tailed Student’s t-test. Scale bars, 5 μm.

Figure 7. L1 RNA binds to its own DNA and promotes the phase separation of HP1α.
(A) Genome browser view of mouse chr17: 18 - 60 Mb (left) and chr19: 3 - 12 Mb (right). The first five rows show the genomic density of B1 repeats, B1-related sequencing data (ChIRP-seq signals of Malat1, ChIP-seq signals of Pol II, and H3K27ac), and RNA-seq in mESCs. The lower tracks show the genomic density of L1 repeats (highlighted by beige shading) and L1-related sequencing data, including ChIRP-seq signals of L1 RNA, ChIP-seq signals of HP1α and H3K9me3. Refseq gene annotations are also included.
(B) Biotinylated L1 RNA pulls down recombinant HP1α protein. Reactions without addition of L1 RNA served as the mock control.
(C) Representative images of droplet formation at different concentrations of HP1α protein and L1 RNA. Concentrations of HP1α and RNA are indicated at the top and left of the images, respectively. Scale bars, 10 μm. Data are representative of 3 independent experiments.
(D) The model. First, the intrinsic self-assembly property of L1 and B1/Alu repeats provides numerous nucleation points to seed the formation of nuclear subdomains. Repetitive DNA sequences also serve as anchor sites for transcription machinery, regulatory proteins and RNAs. Second, the embedded structural information in DNA repeats may be translated by their RNA transcripts, together with interacting DNA and/or RNA-binding proteins. Molecular crowding generated by interactions among DNA sequences, RNAs, and proteins in subnuclear domains seeded by individual clusters of L1 or B1/Alu may drive the aggregation of compartments containing the same repeat type through a phase-separation mechanism, which consequently folds the genome. Third, the nuclear segregation of L1-enriched B compartments and B1/Alu-enriched A compartments may be further reinforced by attaching their DNA sequences to subnuclear structures such as the nucleolus and nuclear speckles, respectively, which serve as scaffolds to stabilize the nuclear architecture.

Figure 1 [graphic not reproduced; see the legend for Figure 1]. Recoverable panel labels: (A) i) genomic distribution, ii) chromatin state, iii) ChIRP, iv) RNA, across 696 compartments in mESCs (Cn, Cn+1; low to high abundance); (B) relative content of B1/Alu, L1, ERV1, and random regions in A and B compartments for mouse (mESC, NPC, neuron) and human (hESC, K562, GM12878); (C) B1, L1, A/B, and gene tracks (ISM1, FOXA2, EIF6, TOP1, PLTP); (D) mESC intra-chromosome interaction, chr17, log2(interaction matrix), 0 to 95 Mb; (E) chr17: 18 - 60 Mb with B1, L1, A/B (mESC, NPC, neuron), and Pol II (mESC) tracks; (F) mESC intra-chromosome correlation, chr17: 35 - 95 Mb; (G) mESC inter-chromosome correlation, chr17 (0 - 95 Mb) versus chr19 (0 - 61 Mb). Data source noted in the figure: Fraser et al., 2015.

Figure 2 [graphic not reproduced; see the legend for Figure 2]. Recoverable panel labels: (A) B1, L1, and DAPI images with B1/L1, L1/DAPI, and B1/L1/DAPI merges; (B) fluorescence intensity (A.U.) of B1 and L1 versus distance (μm); (C) DNA FISH (B1, L1), IF (NCL), DAPI, and merged images (L1/B1, L1/NCL, B1/L1/NCL) for mESC, NSC, NIH3T3, C2C12, and HeLa cells.

Figure 3 [graphic not reproduced; see the legend for Figure 3]. Recoverable panel labels: (A) B1, L1, and merged FISH images at prophase, anaphase, G1 phase, and S phase; (B) segregation index (FISH) of L1 and B1, p value vs S phase; (C) single-cell Hi-C analysis of the cell cycle (mESC), segregation index (Hi-C) of L1 and B1 (Nagano et al, 2017); (D) log2(interaction matrix) of chr17 at M phase and G1 phase, and log2(G1/M), with B1, L1, and A/B tracks; (E) B1, L1, and merged FISH images from zygote through early 2-cell, late 2-cell, 4-cell, and morula to blastocyst; (F) segregation index (FISH) of L1 and B1, p value vs blastocyst; (G) segregation index (Hi-C) of L1 and B1, p value vs mESC (Du et al, 2017); (H) log2(interaction matrix) of chr17 for early 2-cell, late 2-cell, and ICM, with log2(Late 2-cell / Early 2-cell) and log2(ICM / Early 2-cell).

Figure 4 [graphic not reproduced; see the legend for Figure 4]. Recoverable panel labels: (A) developmental progression (%), embryos beyond 2-cell stage versus 2-cell stage; (B) segregation index (FISH) of L1 and B1; (C) B1, L1, DAPI, and merged images for DMSO, DRB, and ActD; (D) segregation index (FISH) of L1 and B1 for DMSO, DRB, and ActD; (E) B1, L1, DAPI, and merged images for control, L1 AMO, and AZT; (F) segregation index (FISH) of L1 and B1 for SCR AMO, L1 AMO (12 h and 36 h), and AZT treatment.

Figure 5 [graphic not reproduced; see the legend for Figure 5]. Recoverable panel labels: (A) i) log2(interaction matrix) of chr17 for a) SCR AMO and b) L1 AMO, with zoom-in views of chr17: 18 - 60 Mb; ii) log2(SCR/L1) of chr17; B1, L1, and A/B compartment tracks with regions c, D, e, F, g, H, i; (B) segregation index (Hi-C) of L1 and B1 for SCR AMO (#1, #2) and L1 AMO (#1, #2); (C) chr17: 18 - 60 Mb schematic: control with strong B1-B1 and L1-L1 interactions and absent B1-L1 interactions; L1 AMO with weak B1-B1 and L1-L1 interactions and increased B1-L1 interactions.

Figure 6 [graphic not reproduced; see the legend for Figure 6]. Recoverable panel labels: Oligopaint FISH; (A) chr17: 35 - 95 Mb with B1 and L1 tracks and probes e, F, g, q; (B) (i) e in green and F in red, (ii) g in green and q in red, for SCR AMO 36 h and L1 AMO 36 h; (C) distance of two closest alleles (μm) for (i) e and F and (ii) g and q, SCR AMO versus L1 AMO.

Figure 7 [graphic not reproduced; see the legend for Figure 7]. Recoverable panel labels: (A) tracks for B1 DNA, Malat1 ChIRP-seq, Pol II and H3K27ac ChIP-seq, RNA-seq, L1 DNA, L1 ChIRP-seq, HP1α and H3K9me3 ChIP-seq, and Refseq genes over chr17: 18 - 60 Mb and chr19: 3 - 12 Mb; (B) L1 RNA (biotinylated) versus mock, NaCl (M) series with urea, elution and input, anti-HP1α western blot; (C) droplet images with HP1α (μM) on the top axis and L1 RNA (ng/μL) on the left axis; (D) model of the nucleus showing lamina, LAD, NAD, nuclear speckle, nucleolus, L1-enriched compartments, and B1-enriched compartments. Key: DNA: L1, SINE (B1/Alu); RNA: L1, others (e.g., Malat1); proteins: nucleolin (NCL), heterochromatic (e.g., HP1α), euchromatic (e.g., Pol II); biophysical forces: self-aggregation, phase separation.