bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

DNA-RNA hybrid (R-loop):

from a unified picture of the mammalian telomere to the genome-wide

Minoo Rassoulzadegan1*, Ali Sharifi-Zarchi2 and Leila Kianmehr1

(1) Université de Nice, INSERM-CNRS, France

(2) Royan Institute, Tehrān, Iran

Running title: The areas in R-loop : from the telomere to the paternal genome

10

*corresponding author: [email protected]

Summary

Local three-stranded DNA-RNA hybrid regions of the genomes (R-loops) have been detected either by binding of a monoclonal antibody (DRIP assay) or by enzymatic recognition by RNaseH. Such a structure has been postulated for the mouse and human telomeres, clearly suggested by the identification of the complementary RNA TERRA. It was, however, evidenced by DRIP exclusively in human cancer and not in normal human nor in any mouse cell. Based on the observation that in several fractionation procedures, DNA-RNA hybrids copurify with double- stranded DNA, we developed a preparative approach that allows detection of stable DNA-RNA triplexes in a 20 complex genome, their physical isolation and their RNA nucleotide sequence. We then define in the normal mouse and human genomes the notion of terminal R-loop complexes including TERRA molecules synthesized from local promoters of every . In addition to the telomeric structures since, the finding of the RNA peak, applied in addition to the whole murine sperm genomes, has highlighted a reproducible profile of the R-loop complexes from entire genome suggesting a defined profile for a future generation.

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Introduction

In the recent period, increased attention has been drawn to R-loop complexes, which appear as frequent features of the eukaryotic genomes. These three-stranded structures are made of the two strands of a DNA molecule, one of them displaced by hybridization with a complementary RNA. In addition to short-lived intermediates generated during transcription, detrimental effects of stable local R-loops were documented in several instances, while, on 30 the other hand, they have been considered as potentially important players in genome biology1-2. The multiplicity of possible functions is, however, in contrast with the limited number of analytical methods, essentially based on the recognition of the DNA-RNA hybrid either by the monoclonal antibody S9.6, or a catalytic deficient RNase H, coupled with DNA-seq and less frequently, with RNA-seq analysis (reviewed in refs2-7). Methodological problems are illustrated by the variety of conclusions reached in different studies2.

One physiological instance of such a three-stranded hybrid structure has been proposed for the mouse and human telomeres6-9. Following early reports of a positive DRIP assay10,11, a consensus seemed established of an R-loop structure involving the terminal DNA repeats of the and the complementary TERRA RNA. Evidence was, however, limited to the telomerase-negative human tumor cells (ALT tumors) without any experimental evidence in a healthy human cell, nor in any mouse cell9. We developed an antibody-independent 40 assay for detection of stable R-loops. Based on the isolation and sequencing of complementary RNAs copurified with the DNA backbone it can be applied to any biological cell or tissue. First tested on the human and mouse telomeres, it established that in both species, TERRA molecules transcribed from every adjacent subtelomeric promoters are engaged in terminal R-loops. How TERRA example might reflect a stable R-loop structure in general is currently unknown. Here, we study similar research applied to mouse genomes at multiple sites reveals that DRNAs are tightly linked to DNA in complexes sensitive to RNaseH. We integrate the search of enriched peaks from RNA-seq data to discover reproducible DNA/RNA regions throughout the genome. These results also highlight the pairing region of the two sex chromosome (X and Y) which are also present as a DNA-RNA hybrid.

Results

Antibody-independent profiling of R-loop complexes: a general approach

50 We devised a simplified, quick and inexpensive method for detection of RNA-DNA hybrid regions in a complex genome (Supplementary Fig.-1). Unlike the current DRIP assay, it is not based on antigenic recognition of the hybrid, with the advantages of avoiding possible artifact at this level. A variant method was developed starting from the observation that in the most generally used TRIzol chloroform extraction procedure3,4, a small but reproducible amount of RNA is retained, together with DNA, at the chloroform-water interface. This material as a whole was dissolved in Tris-HCl 10mM, EDTA 1 mM and ethanol precipitated. The pellet was dissolved in proteinase K hydrolysis buffer and DNA and DNA-RNA complexes were recovered by chromatography on a DNA binding “Zymo-SpinTM” column (Zymo-Research Corp Irvine CA, USA). As illustrated below in the case of the telomeric complex, a specified genomic area could be separately analyzed by the proper DNA cleavage procedure. Alternatively, whole preparations of the DNA-associated RNA were obtained by extensive pancreatic 60 DNase hydrolysis.

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

The telomeric TERRA complex

The ends of the human, mouse and yeast linear DNA molecules constitutive of individual chromosomes are long TTAGGG/CCCTAA repeats. After discovery of the homologous UUAGGG-repeated TERRA RNA5-6, the DNA telomeric repeats were thought to be engaged in three-stranded R-loops with TERRA. Attempts in normal tissues to directly evidence the R-loop structures have been so far only partially successful. In cell culture mostly from two human pathologies could R-loops be evidence by DRIP assay, namely the telomerase-negative (ALT) cancers7 and the ICF syndrome cells8. Structure of the telomeres of healthy human cells, as well as of any murine cell, remained to be directly evidenced. One main site of TERRA transcription was identified in mouse and human cells9,10 and it has been even considered that transcription from individual subtelomeric promoters on every 70 chromosome is a unique characteristic of a class of malignancies (ALT cancers).

Results were generated in parallel on human and mouse sperm cells. Sperm cells were chosen because the majority of cytoplasmic RNAs is removed at the compacting stage, so that the risk of contamination by cytoplasmic RNAs is minimized. The total nucleic acid fraction recovered from the TRIzol interface after purification was subjected to Msp1 cleavage for resection of the DNA telomeric sequences up to the first upstream CCGG site, less than 2,000 bp from the repeats on every chromosome (see Supplementary Figure 1). After gel electrophoresis in either 8 per cent acrylamide or 1.5 per cent agarose followed by transfer (without denaturation of the nucleic acids) to nitrocellulose for binding of the single-stranded RNA and DNA material, hybridization with oligonucleotide probes complementary to the telomeric repeats revealed a unique and homogeneous signal. Identical results were generated in every cell type tested, exemplified in Figure 1 (Fig.1a-c) for three mouse healthy tissues, brain (B), 80 total testis (T) and purified epididymal sperm (S). In Fig.1a the Telomere probe and Supplementary Table-2 shows the electrophoretic pattern of RNA recovered in the TRIzol aqueous fraction and in Fig.1b, the pattern of the material from the interface DNA-associated fraction (“Tatar-blot” for Telomere associated TERRA RNA). The same gel electrophoretic profiles were generated by probes against the CCCTAA and TTAGGG motifs (Fig. 1b, 1c and Supplementary Table-2). DNA denaturation by NaOH after electrophoresis, a step of the classical Southern blot (see below), is obviously omitted in the Tatar-blot.

Specificity was determined by hybridization of the transfers to radioactive probes for other repetitive (SINE, Xist- A) or single copy genomic sequences (Supplementary Table-2), none of them generating a significant signal (data not shown). Electrophoretic migration of the complexes, equivalent to that of a high molecular weight linear DNA molecule (Fig.1 panel b), was clearly not informative as to the actual size. When the same extracts were further 90 processed for Southern blot analysis by denaturation in 0.5 N sodium hydroxide, transfer and hybridization (Fig. 1d) revealed molecules with a smeary profile starting with the expected large size of the telomere (>12 kb).

Complexes with the same electrophoretic profiles were identified in comparable amounts in every mouse cell tested, from embryonic fibroblasts to adult tissues (testis, brain, liver, kidney) as well as cultivated cell lines and short-term cultures of differentiated cells, as exemplified by mouse brain extracts in Fig. 1b. This was also the case of human tissues (Supplementary Table 1), including saliva, blood and sperm (Fig. 1e). Identical electrophoresis profiles were generated for murine and human sperm extracts by probes against the CCCTAA motive (Fig. 1e). These complexes do not appear to depend on the telomerase, as they were observed in three generations of telomerase-negative mouse cells (both Tert-/- and Terc-/-, Fig. 1e). In the case of the Tert-/- and

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Terc-/- mutant, they could also be evidenced in embryo fibroblasts culture from homozygous crosses during the 100 first three generations in which the mutants maintain telomeres11,12 (data not shown) and after immortalization in culture of mutant cell lines (MR, unpublished).

Properties of the complexes were compatible with the R-loop structure schematically drawn in Figure 2 (Fig. 2a), the G-rich strand of telomeric DNA displaced by the hybrid of TERRA (Fig. 1b and d). Nucleolytic attack was performed by incubating extracts (20 µl) for 30 to 60 min with either RNaseA 0.5 µg/µl, RNaseH 1 u/µl, or DNase 10 u/µl (Fig. 2b). The electrophoretic signal disappeared after incubation with DNase, pancreatic RNase, formamide, or RNaseH, while it was not affected by RNase A (Fig. 2b). Another feature of the complex was indicated by a slower migration rate in gel electrophoresis performed in the presence of 20µM PhenDC3(4), a reactant of G quadruplexes13 (Fig. 2b). Visualization of the complex requires resection from chromosomal sequences. The clearest results were generated by cleavage with Msp1 at sites (CCGG) present at least once in the 110 immediate subtelomeric 1-2Kb of every mouse and human chromosome, so that the contribution of chromosomal sequences is minimal (facilitated transfer on the nitro cellulose membrane). Other restriction endonucleases tested (BamH1, Bgl2, Alu1, Mse1, and Dpn1) did not generate comparable results, most probably because they cleave at greater and variable distances from the telomeres, thus generating larger double stranded fragments that are not retained efficiently on nitrocellulose. In agreement with previous observations of increased methylation of CpG sites in subtelomeric regions14, cleavage with Hpa2, a methylation-sensitive isoschizomer of Msp1, failed to evidence the complex (not shown).

The same results were also generated in parallel on a series of human and mouse cells, ex vivo organ extracts most of them normal cells, but including cultured cell lines (Supplementary Table-1), and, as a positive control, the human U-2 OS cancer cell line (ATCC® HTB-96™) in which DNA-RNA telomeric hybrids have been previously 120 identified by DRIP7.

Transcription of individual telomeres from local subtelomeric promoters

A key point was to establish whether the observed structure is present in only one or a few telomeres, or, to the contrary, is a general feature of the genome. It has been generally assumed that R-loops are generated by transcription from adjacent promoters. On the other hand, a major part of TERRA has been reported to be transcribed from in mouse cells and from chromosome 20q in human9,10 so that our data so far are compatible either with a unique R-loop on these chromosomes or with integration in trans in other chromosomes of RNAs from the mouse 18 and human 20q loci. To the contrary, however, it appeared that unique subtelomeric sequences of different chromosomes are represented in the DNA-associated fraction. We generated probes for sequences between the first Msp1 site and the G-rich repeats of the telomeres of chromosomes 9, 17, 18, 19, X 130 and Y telomeres and probes for the complementary strand up to the C-rich repeats (indicated “xxx / x’x’x’” in Fig. 2a). By using UCSC genome browser Blat tool15, we ensured that these flanking regions are unique in the whole genome. All the probes (Supplementary Table 2) generated positive hybridization signals in the complex as illustrated in Fig. 3a for the subtelomeric sequences of chromosomes 17, 18 and Y. As controls (not shown), none of the probes for sequences on the centromere side of the terminal Msp1 sites generated any signal.

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

We further ascertained by Illumina high-throughput analysis of the DNA-bound RNA molecules that their sequences are compatible with transcription in each telomere from a local promoter Figure 3 (Fig. 3 and Supplementary File 2). Starting from purified mouse sperm, we first confirmed by RNA-seq that a fraction of the DNA-associated RNA molecules amounting to 0.5 to 2 per cent of the reads showed the characteristic UUAGGG TERRA repeats (minimum four repeats, Supplementary Fig.3). We then searched the RNA sequence libraries for 140 chromosomal sequences next to the repeats and extended the search in the 5’ direction to collect upstream flanking regions. We again ensured that these flanking regions are unique in the whole genome. As exemplified for chromosomes 2 and 10 in Fig. 3b, their 5’ extension in the mouse genome indicated that TERRA expression originates in each case from promoters in the immediate subtelomeric region. Complementary evidence for the synthesis of these RNAs from distinct promoters is shown in Fig. 3c with the heatmap showing the percent identity matrix between each pair of sequences of chromosomes 2 and 10. The maximum identity is 93% making thus impossible that these reads could originate from a unique chromosome. The available sequence data and libraries allowed us to reach the same conclusion for chromosomes 2, 3, 5, 10, 12, 13, 14, 16, 19 and X see Supplementary Fig. 2. R-loop structures involving the products of local transcription therefore appear as a general feature – especially taking into account some constraints in the analysis, for instance the fact that in the latest mouse 150 assembly (mm10), chromosomes 4, 6, X and Y remain to be sequenced up to the repetitive telomeric tract. Testes samples TD1 and TD3 also contain TERRA in their DRNA fractions as shown in Supplementary Fig. 2.

Extension to the telomeres in human sperm: TERRA transcripts in the individual telomeres from local subtelomeric promoters

We then asked whether subtelomeric transcription may occurs also from every chromosome in human. Then, we assessed RNA sequencing from human sperm of healthy unknown donors, see methods for preparation RNAs from attached to DNA and free fractions. Interestingly, DRNA fraction from human sperm as in mouse show hybrids telomeric TERRA RNAs Fig. 3d RNA-seq signals over TERRA sequences at p- and q-arm telomeric regions of several human chromosomes. Each track shows a different sample (hD_RNA_1-4 are DNA-bound RNA samples, and hRNA_1-4 are cytoplasmic RNA samples). The heights of the peaks show normalized 160 expression level (number of reads per 10^7 reads) over each genomic region, with equal 0-50 scale.

TERRA RNAs resulted from every telomeric end in mouse and human normal cells is consistent with idea that TERRA RNAs is expressed from subtelomeric promoters and remained associated to the telomeres ends. Our results has to be confronted with the previous reports of major TERRA promoters in C18 (mouse) and 20q (human) chromosomes9,10. There is in fact but an apparent contradiction since these promoters were identified in experiments in which TERRA was prepared by standard TRIzol extraction from abnormal cells such as cancers cells, and thus corresponds to the nucleoplasmic RNA fraction thought to be involved in other, non telomeric functions16,17. Thus, it is important to recover TERRA RNAs from both fraction nucleoplasmic and DNA associated.

Extension analysis to other DRNAs with UUAGGG motif repetition

170 Since the association of TERRA transcripts with telomeres could be evidenced by the proposed method, we also asked if TERRA hybrids located at defined non-telomeric sites could be detected. In order to try to distinguish

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

stable complexes of “R-loop” 18,19 and the transient association of the nascent RNAs the framework of the process of transcription, we carried out a possible verification in the data from DRNA-seq of mice as above (mouse sperm and testes). In Supplementary Figure 3 for a general evaluation of the RNA molecules associated with DNA (indicated “DRNAs”), the RNA-seq analysis has been extended to the entire genomic UUAGGG repetition containing transcripts in mouse sperm, in the total testicular tissues (indicated D1 and D3, and TD1 and TD3 in the Supplementary Figure 3).

Analysis of entire genome heatmap in Figure 4a shows representative landscapes of RNA-seq of chromosomes 2, 10, 12 and 18. For each chromosome, the green strip shows the location of TERRA-homologous regions 180 (minimum 4 consecutive TTAGGG repeats) which are present at several chromosomal locations in DRNA fractions.

In addition, research on TERRA has identified a fraction of TERRA transcripts near the sex chromosome that undergo pairing in both sex, the inactive X chromosome (Xi) of female cells and the Y 17 chromosome (see Supplementary Figure 4 a and b). Exemplified transcripts are visualized (Figure 4b) within the pairing center in the testes and sperm RNAs data. As shown in the Figure 4b Erdr1 locus with UUAGGG repeats was present from the Y chromosome. On the other hand, the Asmt transcripts are present on X chromosome. These results indicate a sample retention of PAR-TERRA transcripts in pseudoautosomal regions of X (Asmt) and Y (Erdr1) in the DRNA fraction of the mouse sperm and testes.

DRNA is not restricted to the UUAGGG repeats

190 RNA-seq analysis of sperm and testicles of mice reveals transcripts from the whole genome ref. Here we have analyzed data only from mice because DRNA-seq data is available for two types of samples: sperm (haploid) and testis containing both haploid and diploid cells (germ and somatic cells). A general overview of RNA associated with DNA revealed, in addition to TERRA, a wide variety of sequences, as shown for mouse sperm and testes in Fig. 4. In the sperm DRNA fraction only a few significant peaks are observed at several sites in the genome and are also present in all samples (R and TD) with high reproducibility while a number of low signals are present along of all chromosomes, specific to the types of preparation (homologue between D or homologue between the R fractions). To further characterize these relevant peaks we have performed the search for , these sequences are transcribed from different genomic regions see Supplementary Table 3 with an average of 1kb and do not contain sequences or homologous motifs.

200 To further test the possibilities offered by the analysis, we compared the distribution between the DRNA of the main hybrid regions and the unbound rRNA between sperm and somatic (testis) cells. The annotated positions of the promoter-TSS (transcription start site), TTS (transcription termination site), exons, introns and other characteristics were based on mm10 transcripts. In Figure 6 (Fig.6a) we see a higher level of intergenic transcripts in sperm DRNA samples compared to all others. While sperm-free RNA and testes DRNA are distinguished in the exon regions. In the intergenic region, only a single repetition is observed (Fig.6b). In Supplementary Table 4 the enriched transcripts are listed in different regions. These results clearly reveal that in addition to TERRA, other transcripts are associated in the structure of the R-loop hybrids. Preferential intergenic transcripts retention in sperm DRNA fractions are still unknown but cannot be ignored in future research plan.

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Discussion

210 Here, we have (i) demonstrated that a fraction of TERRA RNA is stored packed invivo with the genome in the form of DNA/RNA hybrids in the telomeres of each chromosome in sperm (mouse and human) and others mouse organs (testes, brain, and liver) (ii) found that non telomeric TERRA RNA repeats at different genomic level are also attached to the DNA (iii) identified that a fraction of the TERRA repeat containing transcripts near the sex chromosome are attached as DRNA to chromosome Y or X and comes from separate (iv) define a general profile of transcripts attached to DNA throughout the mouse genome and discover especially important difference in intergenic transcripts retention in sperm DRNA fraction and the different proportion of groups of transcripts attached to DNA between sperm and testicular cells.

We propose that the hybrid DNA/RNA interaction in semen would serve to preserve RNA signalling and could easily be extracted to be identified by RNA-seq analysis and further tested in functional tests. DNA associated 220 RNA (DRNA) extraction is straight forward without need of a supplementary interacting guide molecules (antibody, product), cheap, highly reproducible and efficient with bioinformatics analysis to generate genome-wide picture of DNA/RNA hybrids transcripts. Our data shows that all of these stable R loops could provide insight into the amount of RNA maintained in a silent state in the sperm genome. These transcripts are of interest in view of the various reports highlighting a role for sperm RNAs in the epigenetic control of gene expression20-21.

The parallel is possible. With S9.6, RNase H and more recently MapR22 established that the GC content is important in the recognition of the R-loop by antibodies and in particular by RNaseH23. Alternatively, here unlike the previous methods, R-loop detection without of DNA/RNA hybrids in normal cells (sperm and testes) reveals, the stable structure of DNA-RNA hybrids which are not particularly rich in CpG. The two types of 230 interaction are not mutually exclusive. However, additional factors must be involved to select or keep the signals. Our study suggests that the difference between antibody-directed (complex) and DNA-RNA hybrids is that, although both appear to be R-loop detection, the latter apparently despite the fact that are highly specific does not involve any other mechanisms, these are direct DNA/RNA interactions.

In the testes, several studies are already reported genome-wide transcripts and it is also assumed that from last stage of elongated spermatids, with elimination of the majority of cytoplasmic RNAs, genome-wide transcripts remain in the sperm. We have recently reported profiles of DRNA-seq mouse sperm transcripts from all coding and non-coding regions of the genome24. Contrary to the analysis of the previous individual transcripts, we have here carried out a peak annotation in order to associate the location of peaks/region. This was to assign maps in terms of genomic characteristics.

240 Genomic profiles

The purpose of the region in the R-loop is very variable and debated. Our current data provide clear evidence of stable region of the R-loop. In addition, our study shows that sperm (silent cell) in the DRNA fraction reserve transcripts ratios different from the free fraction in sperm and testes DRNA (active cell). Although spermatozoa do not express transcripts they must still, in a manner still unknown select transcripts to be transported. Sperm

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(DRNAs fraction) have a significantly higher level of intergenic transcripts than exon, intron, 3’, 5’, promoter- TSS (transcription start site), TTS (transcription termination site) and non-coding transcripts. These DNA-RNA hybrids regions detected here consist mainly of short, simple repeat sequences. Others repeated elements such as LIN, SIN LTR and satellite are also present but less frequently. Intergenic transcripts have different characteristics from mRNA and are often involved in chromatin remodelling, transcription control, including enhancer or 250 silencing activities. Therefore, a difference in the level of DNA/RNA hybrid in different regions in particular a higher level in the intergenic region compared to other regions, can be determined by the state of the R-loop of the spermatozoa. Although the accumulation of R-loops is described with genomic instability, here, the DNA-RNA hybrid detected at the physiological level could have a role in cellular characteristics, serving as signal to keep cells in the active or inactive stage. In human spermiogenesis, the large abundance of intergenic RNAs is reported25 but now to clarify their roles more functional research is needed. In addition, a recent study suggests that R-loops are involved in determining cell fate 26.

Genomic profiles of the TERRA (telomeric and non-telomeric)

DRNA-seq analysis applied to TERRA complexes of the human and in particular mouse telomeres provides significant new information. The fraction of the telomere associated with DNA synthesized from promoters 260 adjacent to all chromosomes end, is demonstrated in human cells and in a variety of mouse cells and tissues, a conclusion that could not be completely reached with DRIP-seq analysis. Molecular analysis evidences DNA- RNA hybrids at the level of telomeres are maintained in all types of cells. In addition, RNA-seq analysis in the testes, sperm of mice and human sperm confirmed that the DNA-RNA hybrids (telomeres) transcribed in the testes of all chromosomes are maintained until the final stage of maturation in the epididymis. TERRA RNA of the sperm is thus transferred to the offspring with genomic DNA.

The feasibility tested on the analysis of TERRA associated with telomere is also used to identify from the sex chromosomes PAR-TERRA (UUAGGG)n containing transcripts previously established in ES cells17 here in mouse DNA-RNA hybrids sperm fraction. The difference in the DRNA transcripts stored between Y and X in the PAR-TRERRA linked to sex is intriguing. One wonders why sperm contain DRNA from PAR-TERRA transcripts. 270 From now on, one objective will be to understand that the DRNA linked to sex chromosome can influence the activities of the X and Y chromosome reaching the oocyte.

Evidence of DNA-RNA hybrids for both TERRA and generalization to genome-associated transcripts in semen underscore ongoing questions about how these hybrids might influence genetic and/epigenetic memory? Among several interpretations, one possibility is a phenomenon unique to sperm: the RNA produced by the spermatid at the time of conversion of the chromatin to its final compact structure can remain associated with the locus, perhaps only the last oligonucleotide segment synthesized, giving the artefactual image of a complete RNA taking into account the sequencing procedure. However, in a way in active testicular cells DNA/RNA hybrids are also observed. In addition, a noticeable difference in the peak ratios of different region between sperm and testes are visible. More than a still preliminary observation against this interpretation is the fact that our less advanced 280 analysis of somatic cells also shows a hybrid DNA/RNA structure detectable with the same method at the telomeres. These studies demonstrate that, RNA appears strongly maintained in a stable complex DNA-RNA

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

hybrid structure and other studies may reveal that its meaning and function may be intentional and selective for a given cell.

In conclusion, we believe that the approach designed to identify the telomeric TERRA has wider applications. These results are clearly encouraging and should be extended to different processes and diseases. Future research is needed to reveal specificities, biogenesis and functions of RNA associated with DNA.

Acknowledgments

We are grateful to E. Gilson, C. Gunes and A. Londono for the gift of biopsies of telomerase-negative mice and to S. Muller (Paris) for the gift of PhenDC3(4). This work is supported by grant 2019-2020 of La Fondation Nestlé 290 France to Minoo Rassoulzadegan.

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure Legends

Fig. 1 TERRA in sperm and somatic cells

a. Northern blot analysis of RNA processed from TRIzol-chloroform aqueous phase, low level in sperm and high level detection of TERRA in Brain and testes, b and c. DNA-RNA complexes (Tatar-blot for Telomeres associated TERRA RNA). Whole nucleic acid preparations were analyzed after Msp1-cleavage by electrophoresis on 8% acrylamide gels, nitrocellulose transfer and hybridization with a 32P-labelled probe against the (TT/UU)AGGG telomeric motif (b) and complementary sequences (c). d. Agarose gel (0,8%) same samples as in b and c. except for a 45 min exposure to 0.5M NaOH and renaturation (standard Southern blot and same probe as in b). In addition to large fragment, a smear of shorter fragments is detected. e. Complexes in extracts of human sperm same 300 procedure as in b. Wild type and telomerase-negative (Terc-/-) mouse mutants of the first three generations, which exhibit progressively shortened telomeres. Embryos, testis and sperm biopsies of Terc-/- animals were kindly provided by Drs C. Gunes (Germany) and A. Londono (Paris) and the same results not shown for Tert-/- animals JAX mice (Jax strain B6, 129S-terttm1Yjc/J, stock# 005423) were purchased from The Jackson laboratory, Bar Harbor.

Fig. 2 The telomeric TERRA complex

a. A simplified R-loop model. TERRA RNA (red line) is assumed to originate from a subtelomeric promoter (red arrow); blue lines: telomere and sub-telomere DNA; “xxx”: 5’ TERRA sequence from subtelomeric DNA (xxx/x’x’x’); arrows: Msp1 cleavage.

b. Complex detection Tatar blot (Telomeres associated TERRA RNA) performed as in Fig. 1b on the indicated 310 mouse cells (T: testis and S: sperm). Left to right: detection of the (TT/UU)AGGG and CCCTAA motifs (see Fig. 1b). Then complex evidenced in several conditions (control), in extracts treated with pancreatic (DNase free RNase ref. 11119915001) RNase 0,5 µg/µl, RNaseH 1u/µl (ref. M0297S), RNaseA 1u/µl (ref. 101091690 01) and DNase 10u/µl, respectively (all provided by Roche Life Science), and (right) electrophoresis after addition of either 50% formamide (denaturation conditions+/ or -) or 20 µM phenylDC(3)4 (right).

Fig. 3. TERRA transcripts originate from subtelomeric promoters

a. After electrophoresis and transfer of complex of mouse sperm performed as in Fig. 1b Tatar blot (Telomeres associated TERRA RNA), hybridization with probes for (top to bottom) UUAGGG repeat (8-fold), subtelomeric sequences of chromosomes 18, 17 and Y (schematized as xxx in Fig. 2a), and S6 ribosomal RNA as a negative control. The chromosomal sequences were randomly taken from the mm10 mouse genome assembly and checked 320 in each case to be unique in the whole genome. Left to right: analysis performed on males of three different strains, C57, Balb/c and B6D2.

DNA-bound RNAs were prepared from epididymal sperm and from the total testis (estimated 60 per cent of diploid cells) of a 6 month, a 14 month-old male and two sperm samples from man at reproductive age, all showing identical electrophoretic profiles of the DNA-RNA complexes (Tatar-blot not shown). Mouse spermatozoa were collected from the cauda epididymis and ejaculated human sperm kindly provided by the Fertility Clinics of the

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

University. In both cases, standard protocol is already reported previously and Kianmehr & al. 24 summarized: motile spermatozoa were recovered and washed MEM buffer (1 mM Na pyruvate, 0.5 mM EDTA, 50 U/ml penicillin, 50 mg/ml streptomycin, and 0.1% BSA) by centrifugation at 3000 rpm. Sperm pellets were resuspended in phosphate-buffered saline (PBS) and centrifuged again. The pellets were washed twice in 50 mM HEPES buffer 330 pH 7.5, 10 mM NaCl, 5 mM Mg acetate, and 25% glycerol. Samples of human sperm were identically processed. TRIzol extracted interface materials were ethanol precipitated and followed by overnight incubation at 56°C in Tris buffer 20 mM pH8, EDTA 50 mM, with 0.5% SDS, 20 µM dithiothreitol and 400 µg/ml Proteinase K. Total nucleic acids after enzymatic removal of the was followed by fractionation of RNA and DNA-bound RNA. Extracts were fractionated by the ZR-DuetTM DNA/RNA MiniPrep Plus protocol according to the specifications of the manufacturer (www.zymoresearch.com) see Supplementary Fig. 1. In all cases, the fraction that remains bound to the DNA further purified after DNase digestion.

About 10-100 ng of RNA were analysed by high throughput sequencing on Illumina HiSeq 2500 or Illumina MiSeq (Eurofins Medigenomix GmbH, Ebersberg, Germany). They (Eurofins) generated libraries designated D1 and D3 from the DRNAs of a 6 month (D1) and a 14 month-old male (D3), with R1 and R3 the corresponding 340 nucleoplasmic sperm RNAs. Totally, 66 millions (m) reads were mapped to unique sites in the mouse genome (D1), 69m (D3), 77m (R1) and 87m (R3). From the total testes it was prepared only the DRNA libraries TD1, (70m reads) and TD3 (82m), and from human sperm of a healthy unknown donor, the hD 1 and 4, 50m (reads) and hR 1 and 4 (88m). Average length of the reads was about 100 nt. All the primary sequence characteristics for sample libraries of the DNA-bound RNA and free-RNA sequences of germ cells are summarized in Supplementary file 2 (Table A). Sperm and testes transcripts profiling and alignment-based visualization were performed for analysis. Comparison of transcript abundance between samples as measured by RNA-seq yielded TPM (transcript per million) values for the different samples.

With FastQC version 0.11.5 quality control of the RAW sequencing reads was performed. Trimming of bad- quality reads was performed using Trimmomatic version 0.3627. Based on the FastQC results 10 nucleotides were 350 cropped from the 5' end of each read, and trimmed bases with Phred score lower than 20 from heading and trailing of each read. Also the Illumina adapter sequences were removed, and the trimmed reads with a size less than 30 bp were removed. Hisat2 was used to align trimmed reads to the reference assembly (GRCh38 for human and GRCm38 for mouse)27, with quality, alignment and plot-bamstats utilities of samtools28. HOMER version 4.9 was used for peak finding and generating bedGraph files for visualization of alignment29. The bedGraphs were cross- sample normalized to have 107 reads per sample in order to make visualization of different samples comparable. Visualization of alignment was performed using IGV version 2.3.6730. Genome-wide locations of TERRA repeats were found by aligning sequences of length 24 and 48 composed of 4 and 8 copies of TERRA to the reference genome using bowtie2 using “-a” argument to report all alignment31. We also detected statistically significant peaks of expression using HOMER (4-fold expression over local region, Poisson p-value over local region < 360 0.0001, FDR adjusted p-value < 0.001). We retrieved the sequences of those peaks located in telomeric regions using UCSC genome browser. Multiple sequence alignment of these peaks was performed using EMBL-EBI Clustal Omega35 and using R package heatmap was visualized percent identity of the alignment on matrix.

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

b. Expression patterns of mice DNA-bound RNA samples over q-arm telomeric regions of chromosomes 2 and 10. The upper track of each panel shows the whole chromosome, and the tiny red bar to the right is the focused genomic region. The blue tracks show expression signals in 3 samples (normalized read frequencies per 107 reads). The green track shows location of TERRA sequences in mice genome (minimum four consecutive TTAGGG repeats). The nucleotide sequence is also shown at the bottom of each panel with color bars (the grey bars show undetermined “N” nucleotides).

c. Heatmap showing the percent identity matrix between each pair of sequences of chromosomes12 and 10.

370 d. Sperm RNA-seq signals over TERRA sequences at p- and q-arm telomeric regions of several human chromosomes. Each track shows a different sperm sample (D_RNA_1 & 4 are DNA-bound RNAs, and RNA_1- 4 are free cytoplasmic RNA samples). The heights of the peaks show normalized expression level (number of reads per 107 reads) over each genomic region, with equal 0-50 scale. The green track shows location of TERRA telomeric repeats in (minimum four consecutive TTAGGG repeats). All human telomeric regions with a sequenced TERRA sequence in human reference assembly GRCh38 are shown.

Fig. 4. Examples of expression landscapes of the whole mice chromosomes 2, 10, 12 and 18 and from the pseudoautosomal regions of the two sex chromosomes.

For each chromosome, the green track shows the location of TERRA-homologous regions (minimum 4 consecutive TTAGGG repeats). Figure 4a, the following tracks show different sperm samples (D1-D3 are DNA- 380 bound RNA, R1-R3 are cytoplasmic RNA, and TD1-TD3 are from testes DNA-bound RNAs). The heights of the peaks show normalized expression level (number of reads per 10^7 reads) over each genomic region, with equal 0-50 scale. Known genes in RefSeq database are located at the bottom of each panel. Figure 4a, TERRA RNA spots are present in sex chromosomes (see Supplementary Figure 4). The number of TERRA RNA foci colocalized with pseudoautosomal regions for instance, Asmt and Erdr1 according to Lee et al. 17.

Fig. 5. Total transcripts analysis for peak finding on entire genome.

a.A false discovery rate (FDR) cutoff of 0.001 was used for identifying significant peaks. Peak annotation was performed using Homer annotate_peaks to associate each peak with nearby genes and genomic features. Annotated positions for promoter-TSS, exons, introns and other features was based on the mm10 transcripts. Enrichment analysis of associated genes was also performed using Enrichr for identifying Gene Ontologies (GOs) and 390 pathways significantly over-represented by associated gene sets. For each sample group (D=DNA bound RNA in Sperm, R=free RNA in Sperm, TD=DNA bound RNA in Testis), we measured percent of significant peaks that were located in each genomic region. Y-axis shows average percentage in biological replicates of each sample group. Error-bars represent standard deviations of replicates. B. Short list of various motif repeat.

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Supplementary material

Supplementary Table 1. A sample of a total of more than 20 cell lines tested for detection of telomeric TERRA complexes showing different species and telomerase status.

Supplementary Table 2. Deoxyribonucleotide probes.

Supplementary Table 3. Characterize feature name, length, position and retrieved sequence of different transcripts for a single region of each rare high-peak signals which are presented in the sperm DRNA (blue star peak in Fig. 400 4) fraction as well in free-RNA fraction and the testes DRNA for sequence homology.

Supplementary Table 4. Gene set analysis of different regions using Enrichr tool in DRNA from sperm and testes.

Supplementary Fig. 1. Schema of the nucleic acid extraction protocol.

Supplementary Fig. 2. RNA-seq signals over TERRA sequences at q-arm telomeric regions of several mice chromosomes. Each track shows a different sample 1 (male of six months) and 3 (male of 14 month) D1-D3 are DNA-bound RNA, R1-R3 are cytoplasmic RNA from sperm, and TD1-TD3 are DNA-bound RNA from testes. The heights of the peaks show normalized expression level (number of reads per 107 reads) over each genomic region, with equal 0-50 scale. The green track shows location of TERRA sequences in mice genome (minimum four consecutive TTAGGG repeats).

Supplementary Fig. 3.

410 Part of Bioinformatics method for Supplementary Fig. 3 description:

Reads with a length of less than 20 bp were discarded. Samples were then directly quantified using Salmon v0.12.032 on an index built from the GRCm38 genome using GENCODE vM18. RNA composition estimation (Supplementary Fig. 3) were created by summing the reads of transcripts from the same RNA biotype and dividing by the total number of assigned reads in each sample33.

a.RNA composition revealed by high-throughput RNA sequencing in sperm. Sperm transcriptome of free-RNA and DNA-bound RNA, resulting from individual mice (6 and 14 months old) reveals variable levels of transcripts mapping to protein-coding RNA, microRNA (miRNA), miscRNA, long intergenic non-coding RNA (lincRNA), sn/snoRNA and processed-transcript*. Y-axis shows the proportion of assigned reads in biological replicates of each sample group. b. RNA composition revealed by high-throughput RNA sequencing in testis. Transcriptome 420 of testis of DNA-bound RNA (TD), resulting from individual mice (6 and 14 months old) reveals levels of transcripts mapping to protein-coding RNA, microRNA (miRNA), miscRNA, long intergenic non-coding RNA (lincRNA), sn/snoRNA and processed-transcript*. Y-axis shows the proportion of assigned reads in biological replicates of each sample group. c. Percent of TERRA* (minimum four consecutive CCCTAA repeats) revealed by high-throughput RNA sequencing of sperm of mouse DNA-bound RNA (D-M) and sperm human DNA-bound RNA (D-H) and free-RNA (R-H), Y-axis shows the TERRA percent of assigned reads.

Supplementary Fig. 4.

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

TERRA RNA spots are present in sex chromosomes (Fig 4a and b). Number of TERRA peaks in ChrX (Fig. 4a) is more than ChrY (Fig. 4b), however, the number of TERRA RNA foci are colocalized with pseudoautosomal regions. For instance, Asmt and Erdr1 according to Lee et al. Significant TERRA prominent peaks over ChrY in 430 mouse sperm cells are associated with the ends of chromosome.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

References

1. Costantino, L. & Koshland, D. The Yin and Yang of R-loop biology. Current Opinion in Cell Biology (2015) doi:10.1016/j.ceb.2015.04.008.

2. Boguslawski, S. J. et al. Characterization of monoclonal antibody to DNA · RNA and its application to immunodetection of hybrids. J. Immunol. Methods (1986) doi:10.1016/0022-1759(86)90040-2.

3. Chomczynski, P. & Sacchi, N. Single-step method of RNA isolation by acid guanidinium thiocyanate- phenol-chloroform extraction. Anal. Biochem. (1987) doi:10.1016/0003-2697(87)90021-2.

440 4. Rio, D. C., Ares, M., Hannon, G. J. & Nilsen, T. W. Purification of RNA using TRIzol (TRI Reagent). Cold Spring Harb. Protoc. (2010) doi:10.1101/pdb.prot5439.

5. Azzalin, C. M., Reichenbach, P., Khoriauli, L., Giulotto, E. & Lingner, J. Telomeric repeat-containing RNA and RNA surveillance factors at mammalian chromosome ends. Science (80-. ). (2007) doi:10.1126/science.1147182.

6. Schoeftner, S. & Blasco, M. A. Developmentally regulated transcription of mammalian telomeres by DNA-dependent RNA polymerase II. Nat. Cell Biol. (2008) doi:10.1038/ncb1685.

7. Arora, R. et al. RNaseH1 regulates TERRA-telomeric DNA hybrids and telomere maintenance in ALT tumour cells. Nat. Commun. (2014) doi:10.1038/ncomms6220.

8. Sagie, S. et al. Telomeres in ICF syndrome cells are vulnerable to DNA damage due to elevated 450 DNA:RNA hybrids. Nat. Commun. (2017) doi:10.1038/ncomms14015.

9. De Silanes, I. L. et al. Identification of TERRA locus unveils a telomere protection role through association to nearly all chromosomes. Nat. Commun. (2014) doi:10.1038/ncomms5723.

10. Montero, J. J., López De Silanes, I., Granã, O. & Blasco, M. A. Telomeric RNAs are essential to maintain telomeres. Nat. Commun. (2016) doi:10.1038/ncomms12534.

11. Blasco, M. A. et al. Telomere shortening and tumor formation by mouse cells lacking telomerase RNA. Cell (1997) doi:10.1016/S0092-8674(01)80006-4.

12. Lee, H. W. et al. Essential role of mouse telomerase in highly proliferative organs. Nature (1998) doi:10.1038/33345.

13. Müller, S. & Rodriguez, R. G-quadruplex interacting small molecules and drugs: From bench toward 460 bedside. Expert Review of Clinical Pharmacology (2014) doi:10.1586/17512433.2014.945909.

14. Buxton, J. L. et al. Human leukocyte telomere length is associated with DNA methylation levels in multiple subtelomeric and imprinted loci. Sci. Rep. (2014) doi:10.1038/srep04954.

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

15. Kent, W. J. BLAT---The BLAST-Like Alignment Tool. Genome Res. (2002) doi:10.1101/gr.229202.

16. Chu, H. P. et al. TERRA RNA Antagonizes ATRX and Protects Telomeres. Cell (2017) doi:10.1016/j.cell.2017.06.017.

17. Chu, H. P. et al. PAR-TERRA directs homologous sex chromosome pairing. Nat. Struct. Mol. Biol. (2017) doi:10.1038/nsmb.3432.

18. Sanz, L. A. et al. Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol. Cell (2016) doi:10.1016/j.molcel.2016.05.032.

470 19. Crossley, M. P., Bocek, M. & Cimprich, K. A. R-Loops as Cellular Regulators and Genomic Threats. Molecular Cell (2019) doi:10.1016/j.molcel.2019.01.024.

20. Rassoulzadegan, M. et al. RNA-mediated non-mendelian inheritance of an epigenetic change in the mouse. Nature (2006) doi:10.1038/nature04674.

21. Rassoulzadegan, M. & Cuzin, F. From paramutation to human disease: RNA-mediated heredity. Seminars in Cell and Developmental Biology (2015) doi:10.1016/j.semcdb.2015.08.007.

22. Yan, Q., Shields, E. J., Bonasio, R. & Sarma, K. Mapping Native R-Loops Genome-wide Using a Targeted Nuclease Approach. Cell Rep. (2019) doi:10.1016/j.celrep.2019.09.052.

23. Chen, L. et al. R-ChIP Using Inactive RNase H Reveals Dynamic Coupling of R-loops with Transcriptional Pausing at Gene Promoters. Mol. Cell (2017) doi:10.1016/j.molcel.2017.10.008.

480 24. Kianmehr et al. Genome-Wide Distribution of Nascent Transcripts in Sperm DNA, Products of a Late Wave of General Transcription. Cells (2019) doi:10.3390/cells8101196.

25. Estill, M. S., Hauser, R. & Krawetz, S. A. RNA element discovery from germ cell to blastocyst. Nucleic Acids Res. (2019) doi:10.1093/nar/gky1223.

26. Yan, P. et al. Genome-wide R-loop Landscapes during Cell Differentiation and Reprogramming. Cell Rep. (2020) doi:10.1016/j.celrep.2020.107870.

27. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics (2014) doi:10.1093/bioinformatics/btu170.

28. Li.H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011).

490 29. Heinz, S. et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis- Regulatory Elements Required for Macrophage and B Cell Identities. Mol. Cell (2010) doi:10.1016/j.molcel.2010.05.004.

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

30. Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): High- performance genomics data visualization and exploration. Brief. Bioinform. (2013) doi:10.1093/bib/bbs017.

31. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods (2012) doi:10.1038/nmeth.1923.

32. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods (2017) doi:10.1038/nmeth.4197.

500 33. Soneson, C., Love, M. I. & Robinson, M. D. Differential analyses for RNA-seq: Transcript-level estimates improve gene-level inferences [version 2; referees: 2 approved]. F1000Research (2016) doi:10.12688/F1000RESEARCH.7563.2.

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figures bioRxiv preprint doi: https://doi.org/10.1101/2020.08.24.264424; this version posted August 24, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a b c d

3 kB 12 kB

B S T B S T B S T B S T

e

Terc +/+ Terc -/- G1 Terc -/- G2 Terc -/- G3 Human sperm Mouse sperm

Figure 1 a

x’x’x’....x’aatcccaatcccaatcccaat... 5’ 5’ 3’ x x x.....x uuaggguuaggguuaggguua...

3’ 5’ 3’ x x x.... ttagggttagggttagggtta...

(Msp1) b Formamide control RNase RNaseH RNaseA DNase - + - + PhenDC3(4)

T S T S T S T S T S T T S S T S

Figure 2 a Mouse Probe to 1 2 3

UUAGGG

C18 subtelomere

C17 subtelomere

Y subtelomere

S6 chromosomal

Figure 3a b

chr2 chr12 chr10 chr14 chr19 chrX chr2 chrX qA1 qA2qA1.1 qA3 qA1qA1.2 qA1.3qB qA2 qA2qA3qC1.1qA3qA1qC1.2qA4 qB1 qC2qA2qB1qC3qB3qA1.1 qA3qB2qAqDqA1.2 qC1qB3qE1qA2 qBqA3.1qE2qB4qC2qA3.3qE3 qB5.1qE4qA4qC1qC3qE5qB5.3qBqA5qC2qF1qD1qC3qA6qF2qD2qC1qF3qA7.1qD1qD3qG1 qA7.3qC1qG3qD2qC2 qBqEqH1qC3qH2qC1qD3 qC2qC2qH3qD1qF1qC3qE1 qH4qF2qDqE2.1qD2qC3 qE2.2qE1 qE2.3qE2qD3 qE3qD1 qE4qF1 qD2qF2qE5 qF3 qD3qF4 qF5 qA1 qA2 qA3 qB qC1.1 qC1.2 qC2 qC3 qD qE1 qE2 qE3 qE4 qE5 qF1 qF2 qF3 qG1 qG3 qH1 qH2 qH3 qH4 Sequence qA1.1 qA1.2 qA2 qA3.1 qA3.3 qA4 qA5 qA6 qA7.1 qA7.3 qB qC1 qC2 qC3 qD qE1 qE2 qE3 qF1 qF2 qF3 qF4 qF5

Color: 9,56050 bp kb 9,985 bp 26 kb 9,985 bp 966 bp 12 kb 9,985 bp 182,012,000 bp 120,020182,013,000130,594,000 kb bp bp 182,014,000130,595,000124,800 bp bp kb 61,330120,030 kb182,015,000 kb130,596,000 bp170,880,000 bp bp 182,016,000130,597,00061,332 bp170,881,000 bpkb 120,040 bp kb182,017,000130,598,000170,882,000 bp 61,334 bp 124,810 130,599,000kb182,018,000 kb 170,883,000 bp bp 120,050 bp kb 130,600,000182,019,00061,336170,884,000 bp bp kb bp 130,601,000182,020,000170,885,000 bp 120,060bp bpkb61,338130,602,000 kb182,021,000170,886,000 bp bp 124,820 bp 130,603,000kb 170,887,000 61,340bp kb bp 170,888,000 bp 61,342 kb 170,889,000 bp 170,890 a c g t 182,012,300 bp 182,012,400 bp 182,012,500 bp 182,012,600 bp 182,012,700 bp 182,012,800 bp 182,012,900 bp 182,013,000 bp 182,013,100 bp 182,013,200 bp 170,880,000 bp 170,881,000 bp 170,882,000 bp 170,883,000 bp 170,884,000 bp 170,885,000 bp 170,886,000 bp 170,887,000 bp 170,888,000 bp 170,889,000 bp 170,890

[0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D1_L1D1_L1 D1_L1 D1_L1[0 - 50] D1_L1 D1_L1 D1_L1 D1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D3_L1D3_L1 D3_L1 D3_L1[0 - 50] D3_L1 D3_L1 D3_L1 D3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50]

D3_L2D3_L2 Chr2 D3_L2 D3_L2 D3_L2 D3_L2 D3_L2 D3_L2 terra_mm10.tdf [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R1_L1R1_L1SequenceR1_L1Sequence R1_L1[0 - 50] R1_L1 R1_L1 R1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] chr2 [0 - 50] R1_L2R1_L2 R1_L2chr12 chr10 chr10R1_L2 R1_L2 R1_L2chr14 chr19 chrX R1_L2 chrX qA1 qA2qA1.1 qA3 qA1.2 qA1.3qB qA2 qC1.1qA3qA1qC1.2 qB1 qC2qA2 qC3qB3 qA3qD qC1qE1 qBqE2 qC2qE3 qE4qC1qC3qE5 qC2qF1qD1qC3qF2qD2 qF3qD1qD3qG1 qG3qD2 qEqH1 qH2qD3 qH3qF1 qE1 qH4qF2qE2.1 qE2.2 qE2.3 qE3 qE4 qE5 [0 - 50] [0 - 50] qA1 [0 - 50] qA1qA2 qA3qA2 [0qA3 -qA4 50] [0qA4 - 50]qB1 [0 - 50]qB1qB2qA1.1 qB2qAqB3qA1.2 qB3qA2qB4 qA3.1qB5.1qB4qA3.3 qB5.3qB5.1qA4 qB5.3qBqA5 qC1 qA6 qC1qA7.1qC2 qA7.3qC1qC3qC2 qB qC3 qC1qD1 qC2qC2qD1qC3 qD2qD qD2qC3 qE1qD3 qE2qD3 qE3qD1 qF1 qD2qF2 qF3 qD3qF4 qF5 R3_L1R3_L1 R3_L1 R3_L1[0 - 50] qA1.1R3_L1qA1.2R3_L1qA2 qA3.1 qA3.3 qA4 qA5 qA6 qA7.1 qA7.3 qB qC1 qC2 qC3 qD qE1 qE2 qE3 qF1 qF2 qF3 qF4 qF5 R3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] 9,56050 bp kb 691 bp 9,985 bp 26 kb 12 kb 9,985 bp [0 - 50] 9,985 bp R3_L2R3_L2 R3_L2182,012,000 bp 120,020182,013,000R3_L2130,594,000 kb bp bp R3_L2182,014,000130,595,000R3_L2124,800 bp bp kb 120,030182,015,000 kb130,596,000 bp170,880,000 bp bp 182,016,000130,597,000 bp170,881,000 bp 120,040 bp kb182,017,000130,598,000170,882,000 bp bp 124,810130,599,000182,018,000 kb 170,883,000 bp bp 120,050 bp kb 130,600,000182,019,000170,884,000 bp bp bp 130,601,000182,020,000170,885,000 bp 120,060bp bpkb 130,602,000182,021,000170,886,000 bp bp 124,820 bp 130,603,000kb 170,887,000 bp bp 170,888,000 bp 170,889,000 bp 170,890 130,594,400 bp 130,594,50061,330 kb bp 130,594,60061,332 kb bp 130,594,70061,334 kb bp 130,594,80061,336 bp kb 130,594,90061,338 bp kb 130,595,00061,340 bp kb 61,342 kb R3_L2 170,880,000 bp 170,881,000 bp 170,882,000 bp 170,883,000 bp 170,884,000 bp 170,885,000 bp 170,886,000 bp 170,887,000 bp 170,888,000 bp 170,889,000 bp 170,890 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L1TD1_L1 TD1_L1[0 - 50] [0 - 50] [0TD1_L1 - 50] TD1_L1TD1_L1[0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L1 [0 - 50] D1_L1D1_L1D1_L1 D1_L1[0 - 50] [0 - 50] [0D1_L1 - 50] D1_L1 D1_L1[0 - 50] [0 - 50] [0 - 50] D1_L1 [0 - 50] TD1_L2TD1_L2 TD1_L2[0 - 50] [0 - 50] [0TD1_L2 - 50] TD1_L2TD1_L2[0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L2 [0 - 50] D3_L1D3_L1D3_L1 D3_L1[0 - 50] [0 - 50] [0D3_L1 - 50] D3_L1 D3_L1[0 - 50] [0 - 50] [0 - 50] D3_L1 [0 - 50] TD3_L1TD3_L1 TD3_L1[0 - 50] [0 - 50] [0TD3_L1 - 50] TD3_L1TD3_L1[0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] Chr10 TD3_L1 D3_L2D3_L2D3_L2 D3_L2[0 - 50] [0 - 50] [0D3_L2 - 50] D3_L2 D3_L2[0 - 50] [0 - 50] [0 - 50] D3_L2 [0 - 50] TD3_L2TD3_L2terra_mm10.tdf TD3_L2[0 - 50] [0 - 50] [0TD3_L2 - 50] TD3_L2TD3_L2[0 - 50] [0 - 50] [0 - 50] TD3_L2 [0 - 50] terra_mm10.tdfR1_L1R1_L1terra_mm10.tdfSequenceR1_L1terra_mm10.tdfSequence R1_L1terra_mm10.tdf R1_L1terra_mm10.tdf R1_L1terra_mm10.tdf terra_mm10.tdfR1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R1_L2R1_L2 R1_L2 Chr2R1_L2[0 - 50] R1_L2Chr10R1_L2Chr12 Chr14 Chr19 ChrX RefseqRefseq Refseqgenes genes genesRefseqR1_L2 genes Refseq genesRefseq genesRefseq genes Refseq[0 - 50] [0 - 50]Tmem196 genes [0 - 50] [0 - 50] [0 - 50] [0 - 50] R3_L1R3_L1 R3_L1 R3_L1[0 - 50] R3_L1 R3_L1 R3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R3_L2R3_L2 R3_L2 R3_L2[0 - 50] R3_L2 R3_L2 R3_L2 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L1TD1_L1 TD1_L1 TD1_L1[0 - 50] TD1_L1TD1_L1 TD1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L2TD1_L2 TD1_L2 TD1_L2[0 - 50] TD1_L2TD1_L2 TD1_L2 [0 - 50] [0 - 50] [0 - 50] Figure 3b [0 - 50] [0 - 50] [0 - 50] TD3_L1TD3_L1 TD3_L1 TD3_L1[0 - 50] TD3_L1TD3_L1 TD3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD3_L2TD3_L2 TD3_L2 TD3_L2[0 - 50] TD3_L2TD3_L2 TD3_L2 terra_mm10.tdfterra_mm10.tdf terra_mm10.tdf terra_mm10.tdf terra_mm10.tdf terra_mm10.tdf terra_mm10.tdf Chr2 Chr10 Chr12 Chr14 Chr19 ChrX RefseqRefseq Refseqgenes genes genesRefseq genes Refseq genesRefseq genesRefseq genes RefseqTmem196 genes c

100 100 79 75 81 70 70 72 49 41 35 41 chr13 90 79 100 87 92 74 72 76 56 49 45 47 chr16 80

75 87 100 93 76 79 84 59 47 47 50 chr2 70 60 81 92 93 100 79 78 81 57 44 42 44 chr10 50 70 74 76 79 100 74 80 61 46 43 49 chr12 40 70 72 79 78 74 100 81 55 46 45 49 chr14

72 76 84 81 80 81 100 61 47 45 50 chrX

49 56 59 57 61 55 61 100 42 43 47 chr3

41 49 47 44 46 46 47 42 100 43 47 chr19

35 45 47 42 43 45 45 43 43 100 55 chr5

41 47 50 44 49 49 50 47 47 55 100 chr8 chr13 chr16 chr2 chr10 chr12 chr14 chrX chr3 chr19 chr5 chr8

Figure 3c chr1 chr3 chr4 chr5 chr7 chr10chr9 chr12 chr16 chr18 chrX chrY

p36.31 p36.13 p35.3 p26.1p34.3p25.2p34.1 p24.3p32.3 p24.1p16.2p31.3 p15.33p22.2p31.1 p21.31p22.3p15.32p15.2 p21.3p15.2p14.3p15.1 p14p14.3p13.3p14.1p22.2p14.1p12p12p12.3p21.3q11p15.2p13.2p24.2p12.1q12p21.1q13.1p14p12q11.1q21.1p23q11.1p13p15.2q13.3q12.1p22.3p13.32q22p14.3p12.31q21.21q13.12q23.3q12.1p21.3p13.2q21.3p14.1p12.1q13.31q24.3q13.1p12.3p13.3q22.2p21.1p11.22q21.1p12.3q25.3q13.3q23p12.1p12.1p13.2q31.2q22.1p13.2p11.1q14.2p11.31p11.22p11.1p11.1q11.21q25q32.1q23p13.12q11.21p11.22p22.32q26p11.1q24q32.3q15q11.23q27q21.1q25.2p11.21q12p12.3q42.11p22.2q21.1q11.23q12q28.2q21.3q42.2p12.2p11.1p11.2q26.1q13.11p22.12q21.2q21.11q22.3q13q31.1q43p12.1q11.1q26.31q13.13q21.3p21.3q44q31.22q21.13q21.12q23.2q26.33q22.1p11.1q11.2p21.1p11.2q21.3q14.1q32.1q21.2q31.1p11.4q28q22.1q22.3q14.3p11.1q21.32q11.221q29p11.3q32.3q22.3q31.3q12.1q23.1q31.1q22.1q21.1q34.1q32p11.22q11.222q11.2q23.31q31.2q21.2q34.3q33.2q22.32q12.2p11.1q31.32q21.31q24.1q11.223q35.2q11.2q31.1q34q12.1q32.1q12.3q24.32q21.33q13.1q31.2q35.1q11.23q12.2q33q25.1q13.3q35.3q32q21.1q34q23.1q25.2q21.1q33.1q35q21.2q26.11q21q36.1q23.3q21.2q33.3q21.32q26.13q24.12q36.3q34.12q21.31q22.1q22.1q24.22q34.3q26.3q22.2q22.3q21.33q24.31 q23q23.1q22.1q24.33q24q12q23.2q22.2q25 q22.3q24.1q26.1 q26.3q24.3q23 q27.2 q28

d 6,784 bp 8,588 bp 8,588 bp 19 kb 8,588 bp 3,64615 kbbp 14 kb 4,924 bp8,760 bp 4,252 bp8,588 bp 10,000 bp 10,000 bp 11,000 bp 11,00010,000 bp bp Human 12,000 bp10 chromosomes kb12,00011,000 bp bp 12 10,000kb p13,000-13,000arm telomeres 12,000bp10,000 bp bp bp bp10 kb 14 kb 11,00014,00013,000 bp 14,000bp bp10 bp kb1216 kb kb 12,00010,00015,00014,000 bp bp bp bp10,0001815,000 11,000kbbp12 kb14 bpbp kb 13,00016,00015,00010,000 bp bp bpbp11,0002010,000 kb bp bp 1411,00016,00016 kb kb14,000 bp17,00016,000 bp22 bp kbbp12,00011,000 bp bp 12,000181615,000 kb kb18,0002417,000 11,000bp bpkb b 13,000bp bp12,000 bp12,000 bp bp 16,00026 kb2018,000 18kb bp kb14,000 b 13,000 bp bp 28 kb17,00012,000 bp22 15,000kb13,00020 bp13,000 14,000kb bp bp bp bp 18,000 16,00024 kb15,000 bp22 kb bp 14,00013,00017,000 bp 16,000bp bp 18,000 bp17,000 bp 14,000 18,000 b

[0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D_RNA_1 D_RNA_1 [0-50] D_RNA_1 D_RNA_1 D_RNA_1D_RNA_1D_RNA_1 D_RNA_1 D_RNA_1D_RNA_1 D_RNA_1D_RNA_1 D_RNA_1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D_RNA_4 D_RNA_4 [0-50] D_RNA_4 D_RNA_4 D_RNA_4D_RNA_4D_RNA_4 D_RNA_4 D_RNA_4D_RNA_4 D_RNA_4D_RNA_4 D_RNA_4 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] RNA_1 RNA_1 [0-50] RNA_1 RNA_1 RNA_1RNA_1RNA_1 RNA_1 RNA_1RNA_1 RNA_1RNA_1 RNA_1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] RNA_4 RNA_4RNA_4 [0-50] RNA_4 RNA_4 RNA_4RNA_4RNA_4 RNA_4 RNA_4RNA_4 RNA_4RNA_4 terra.tdf terra.tdfchr1 chr4terra.tdfchr2 terra.tdfchr7 chr10terra.tdf terra.tdfterra.tdfchr12 chr13terra.tdf chr15 terra.tdfchr17 terra.tdfchr18chr19 terra.tdfchr22terra.tdfchr21 p36.31 p36.13Chr1p16.2p35.3p25.2p15.33p34.3p24.3p34.1p15.2Chr3p32.3p23.3p22.2p15.1p22.3p31.3p14p21.3p15.2p21p31.1Chr4p12 p16.2p21.1p14q11p22.3 p15p13p15.2p21.3q13.1p13.3p12.31p13.32Chr5p14.3p13.3p12q13.3p13p12.1p13.2p11.2p14.1p12q21.21p12Chr7q11.1p11.22p13p12.3q21.3p12.3q12.1q12p11.2p11.1Chr9p13.3q22.2p12.1p12p12.1q13q21.1p11.1q11.21q23p13.2p11.1p11.22q14.2Chr10p11.2q22q11.23q11.21q12.12p13.1q25p11.1q21.1p11.31q23.3p11.1p13.3q21.1Chr12q26q12.2q22.1q24.3q27p12q11.23q12p11.22q11.2q21.2q13.1q23.1p13Chr16q25.3q28.2q13.11p13q12q21.3q21.11p11.21q24.1q13.3q31.2p11.2p13.2q13.13q13.2q22.1q31.1q24.3Chr16q21.13p11.1q14.11q32.1p12p12q31.22p11.1p13.13q14.1q11.1q22.3q21.3q31.2q14q32.3q14.2Xq22.1q23.1q14.3q32.1q15.1q32.2q42.11q11.2p11.2q14.3Yp11.2q11.2p13.11q22.3q23.31q32.3q33.1q42.2q21.1q21.1q31.1p11.1p11.1q43q34.1q21.2q24.1q31.2q12.1q34q21.2p12q12q44q11.1q34.3q11.2q35q21.31q24.32q31.32q21.32q21.1q21.3q36.2p11q35.2q12.2q25.1q11.21q32.1q22.1q37.1q21.33q11q21.31q25.2q22.1q21.1q33q11.22q12q22.31q21.32q12.3q22.3q23.1q26.11q34q11.23q31.1q13.11q23q26.13q35q23.3q21.1q36.1q24.1q12.1q21.2q31.2q13.12q22q24.12q26.3q24.3q36.3q31.3q21.2q24.22q23.1q12.2q32.1q13.2q25.2q21.3q21.31q24.31q23.3q32.3q25.3q13.31q12.3q24.2q21.33q24.33q33.2q26.1q13.32q22.11q24.3q22.1q26.2q34q13.33q13.1q25.1q22.12q22.2q26.3q13.41q22.13q22.3q13.2q25.3q13.42q22.2q23q13.43q13.31 q13.32q22.3

Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene 10 kbDDX11L1 10 kb 10 kb 10 kbDDX11L510 kb WASH7P 10 kb 10 kb 10 kb 10 kbNR_02826910 10kb kb WASH110 kb 10 kb DDX11L10 248,946 kb 190,204 kb242,184 kb248,948133,786 kb159,336 kb Human kb chromosomes 190,206 kb242,186 kb248,950133,788101,980 kb 114,354kb kb q159,338-arm telomeres kb kb 133,266 kb190,208 kb242,188 kb248,952133,790101,982 kb kb50,806 114,356kb83,248159,340 kb kb kb133,268190,21080,264 kb 58,608 kb kb 242,190 kb 46,700 kb248,954133,792 kb101,984 kbkb50,808 kb114,35883,250 kb159,342 kb 133,270kb 190,21280,266 kb 58,610 kb 242,192 kb 46,702 kb248,956133,794 kb101,986 kb50,810 kb 114,36083,252 kb 159,344 kbkb 133,272 kb 80,268 kb58,612 kb kb 46,704133,796 kb101,988 kb50,812 kb 114,362 83,254kb 159,346kb 133,274 80,270 kb58,614 kb kb 46,706 kb101,99050,814 kb kb114,36483,256 kb 80,27258,616 kb kb 46,708 kb 50,816 kb 46,710 k

[0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] D_RNA_1 D_RNA_1D_RNA_1[0-50]D_RNA_1D_RNA_1 D_RNA_1D_RNA_1D_RNA_1D_RNA_1D_RNA_1D_RNA_1D_RNA_1D_RNA_1

D_RNA_1[0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] D_RNA_4 D_RNA_4D_RNA_4[0-50]D_RNA_4D_RNA_4 D_RNA_4D_RNA_4D_RNA_4D_RNA_4D_RNA_4D_RNA_4D_RNA_4D_RNA_4 D_RNA_4 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] RNA_1 RNA_1RNA_1 [0-50]RNA_1 RNA_1 RNA_1RNA_1 RNA_1RNA_1 RNA_1RNA_1 RNA_1RNA_1 RNA_1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50][0 - 50] [0 - 50] [0 - 50] RNA_4 RNA_4RNA_4 RNA_4 RNA_4 RNA_4RNA_4 RNA_4RNA_4 RNA_4RNA_4 RNA_4RNA_4 RNA_4 [0-50] terra.tdf terra.tdfterra.tdf terra.tdf terra.tdf terra.tdfterra.tdf terra.tdfterra.tdf terra.tdfterra.tdf terra.tdfterra.tdf Chr1 Chr2 Chr4 Chr7 Chr10 Chr11 Chr13 Chr15 Chr17 Chr18 Chr19 Chr21 Chr22 Gene Gene Gene Gene Gene Gene Gene Gene Gene GeneGene Gene Gene

Figure 3d chr2 chr10

qA1 qA2 qA3 qB qC1.1 qC1.2 qC2 qC3 qD qE1 qE2 qE3 qE4 qE5 qF1 qF2 qF3 qG1 qG3 qH1 qH2 qH3 qH4 qA1 qA2 qA3 qA4 qB1 qB2 qB3 qB4 qB5.1 qB5.3 qC1 qC2 qC3 qD1 qD2 qD3 Chr2 Chr10 181 mb 130 mb

0 mb 20 mb 40 mb 60 mb 80 mb 100 mb 120 mb 140 mb 160 mb 180 mb 0 mb 20 mb 40 mb 60 mb 80 mb 100 mb 120 mb

TERRA Sequence TERRA Sequence [0 - 50] [0 - 50] D1_L1 D1_L1

[0 - 50] [0 - 50] D3_L1 D3_L1

[0 - 50] [0 - 50] D3_L2 D3_L2

[0 - 50] [0 - 50] R1_L1 R1_L1

[0 - 50] [0 - 50] R1_L2 R1_L2

[0 - 50] [0 - 50] R3_L1 R3_L1

[0 - 50] [0 - 50] R3_L2 R3_L2

[0 - 50] [0 - 50] TD1_L1 TD1_L1

[0 - 50] [0 - 50] TD1_L2 TD1_L2

[0 - 50] [0 - 50] TD3_L1 TD3_L1

[0 - 50] [0 - 50] TD3_L2 TD3_L2

Refseq genes Refseq genes

chr12 chr18

qA1.1 qA1.2 qA1.3 qA2 qA3 qB1 qB2 qB3 qC1 qC2 qC3 qD1 qD2 qD3 qE qF1 qF2 qA1 qA2 qB1 qB2 qB3 qC qD1 qD2 qD3 qE1 qE2 qE3 qE4 Chr12 Chr18 119 mb 90 mb

0 mb 20 mb 40 mb 60 mb 80 mb 100 mb 120 mb 0 mb 10 mb 20 mb 30 mb 40 mb 50 mb 60 mb 70 mb 80 mb 90 mb

TERRA Sequence TERRA Sequence [0 - 50] [0 - 50] D1_L1 D1_L1

[0 - 50] [0 - 50] D3_L1 D3_L1

[0 - 50] [0 - 50] D3_L2 D3_L2

[0 - 50] [0 - 50] R1_L1 R1_L1

[0 - 50] [0 - 50] R1_L2 R1_L2

[0 - 50] [0 - 50] R3_L1 R3_L1

[0 - 50] [0 - 50] R3_L2 R3_L2

[0 - 50] [0 - 50] TD1_L1 TD1_L1

[0 - 50] [0 - 50] TD1_L2 TD1_L2

[0 - 50] [0 - 50] TD3_L1 TD3_L1

[0 - 50] [0 - 50] TD3_L2 TD3_L2

Refseq genes Refseq genes

Figure 4a Figure 4b Detailed number Annotation

Simple_repeat* 4212

Low_complexit 400 y LINE 184 SINE 104 LTR 102 Satellite 87 rRNA 35 DNA 3 srpRNA 2 Other 2 snRNA 1

Figure 5a Figure 5b Motif N Motif n GAAA 1201 TTTC 1053 GAA 551 TTC 517 GGAAA 113 TTTCC 93 GGAA 86 GA 83 TC 83 GAAAA 67 TTTTC 57 TTCC 56 TTAGGG 33 CCCTAA 24 TTCTC 24 TTCTCC 14 GGGAA 11 GGA 7

Figure 5c Supplementary Figures and Tables TRIzol treatment Resuspension Tris10mM EDTA 1mM

Aqueous Interface Transfer on the column DNA RNA

Ethanol Over night precipitated Proteinase K Elution from the column

Fractions of acid nucleic : DNA/RNA RNA

Msp1/digestion Electrophoresis and transfer directly on the Nitrocellulose for blotting

Supplementary Figure 1 (Telomeres associated Terra RNA) Tatar-blot Northern blot TRIzol treatment Resuspension Tris10mM EDTA 1mM

Aqueous Interface Transfer on the column DNA RNA

Ethanol Over night precipitated Proteinase K Dnase treatment on the column and elution

Fractions of RNA: D R chr2 chr12 chr10 chr14 chr19 chrX chrX qA1 qA2qA1.1 qA3 qA1qA1.2 qA1.3qB qA2 qA2qA3qC1.1qA3qA1qC1.2qA4 qB1 qC2qA2qB1qC3qB3qA1.1 qA3qB2qAqDqA1.2 qC1qB3qE1qA2 qBqA3.1qE2qB4qC2qA3.3qE3 qB5.1qE4qA4qC1qC3qE5qB5.3qBqA5qC2qF1qD1qC3qA6qF2qD2qC1qF3qA7.1qD1qD3qG1 qA7.3qC1qG3qD2qC2 qBqEqH1qC3qH2qC1qD3 qC2qC2qH3qD1qF1qC3qE1 qH4qF2qDqE2.1qD2qC3 qE2.2qE1 qE2.3qE2qD3 qE3qD1 qE4qF1 qD2qF2qE5 qF3 qD3qF4 qF5 qA1.1 qA1.2 qA2 qA3.1 qA3.3 qA4 qA5 qA6 qA7.1 qA7.3 qB qC1 qC2 qC3 qD qE1 qE2 qE3 qF1 qF2 qF3 qF4 qF5

9,56050 bp kb 9,985 bp 26 kb 12 kb 9,985 bp 9,985 bp 182,012,000 bp 120,020182,013,000130,594,000 kb bp bp 182,014,000130,595,000124,800 bp bp kb 61,330120,030 kb182,015,000 kb130,596,000 bp170,880,000 bp bp 182,016,000130,597,00061,332 bp170,881,000 bpkb 120,040 bp kb182,017,000130,598,000170,882,000 bp 61,334 bp 124,810 130,599,000kb182,018,000 kb 170,883,000 bp bp 120,050 bp kb 130,600,000182,019,00061,336170,884,000 bp bp kb bp 130,601,000182,020,000170,885,000 bp 120,060bp bpkb61,338130,602,000 kb182,021,000170,886,000 bp bp 124,820 bp 130,603,000kb 170,887,000 61,340bp kb bp 170,888,000 bp 61,342 kb 170,889,000 bp 170,890 170,880,000 bp 170,881,000 bp 170,882,000 bp 170,883,000 bp 170,884,000 bp 170,885,000 bp 170,886,000 bp 170,887,000 bp 170,888,000 bp 170,889,000 bp 170,890

[0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D1_L1D1_L1 D1_L1 D1_L1[0 - 50] D1_L1 D1_L1 D1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D3_L1D3_L1 D3_L1 D3_L1[0 - 50] D3_L1 D3_L1 D3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] D3_L2D3_L2 D3_L2 D3_L2[0 - 50] D3_L2 D3_L2 D3_L2 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R1_L1R1_L1 R1_L1 R1_L1[0 - 50] R1_L1 R1_L1 R1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R1_L2R1_L2 R1_L2 R1_L2[0 - 50] R1_L2 R1_L2 R1_L2 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R3_L1R3_L1 R3_L1 R3_L1[0 - 50] R3_L1 R3_L1 R3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] R3_L2R3_L2 R3_L2 R3_L2[0 - 50] R3_L2 R3_L2 R3_L2 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L1TD1_L1 TD1_L1 TD1_L1[0 - 50] TD1_L1TD1_L1 TD1_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD1_L2TD1_L2 TD1_L2 TD1_L2[0 - 50] TD1_L2TD1_L2 TD1_L2 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD3_L1TD3_L1 TD3_L1 TD3_L1[0 - 50] TD3_L1TD3_L1 TD3_L1 [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] [0 - 50] TD3_L2TD3_L2 TD3_L2 TD3_L2[0 - 50] TD3_L2TD3_L2 TD3_L2 terra_mm10.tdfterra_mm10.tdf terra_mm10.tdf terra_mm10.tdf terra_mm10.tdf terra_mm10.tdf terra_mm10.tdf Chr2 Chr10 Chr12 Chr14 Chr19 ChrX RefseqRefseq genes genesRefseq genes Refseq genesRefseq genesRefseq genes RefseqTmem196 genes

Supplementary Figure 2 RNA composition

1,0

0,9

0,8

0,7

0,6

0,5

0,4

proportion of assigned reads assigned of proportion 0,3

0,2

0,1

0,0 DNA-bound RNA1 DNA-bound RNA2 Free-RNA1 Free-RNA2 protein-coding miRNA misc-RNA lincRNA sn/snoRNA processed_transcript

Supplementary Figure 3 a RNA composition

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

Proportion of assigned reads Proportion of assigned 0,2

0,1

0,0 TD1_L001 TD1_L002 TD3_L001 TD3_L002 protein-coding miRNA misc-RNA lincRNA sn/snoRNA processed_transcript

Supplementary Figure 3 b TERRA percent 2,50%

2,00%

1,50%

1,00%

0,50%

0,00% D-M R-M D-H R-H

Supplementary Figure 3 c Supplementary Figure 4a.

Supplementary Figure 4b. Supplementary Table 1

Species Tissues ex ALT status Restriction TaTR signal vivo cleavage Mouse Embryo Negative MspI ++ tissues fibroblasts ex vivo Testis ND MspI +++ Sperm ND MspI +++++ Brain ND MspI ++ Mouse cell 15P1, PY6, ND MspI + lines 45T1,MTT3 CCL39TK Rat cell lines FR3T3, FR- ND MspI + RAS, BPV-1, BPV-5, SVWTAII, SWTN2, WTRSV

Human cell RAJI Positive MspI + lines HELA ND 293 ND human U-2 Positive OS cancer cell line (ATCC® HTB- 96™) Human Saliva ND MspI +++ Supplementary Table 2 Deoxyribonucleotide probes

Supplementary Table 1 (cont)

Subtelomeric sequences 20 Probe sequence

Chromosome 18 short arm GGGACATCTTCTGGATATATGCCCAGGAGAGGTATTG

(NT_039674.8) CAATACCTCTCCTGGGCATATATCCAGAAGATGTCCC

ACAGAGGTGAGGCCAGTGTCAGTCTGTCCCTTCTTT Chromosome 18 long arm TGCCACAGATTTTAATATTAAACACATATAAAATA (NT_039678.8) TATTTTATATGTGTTTAATATTAAAATCTGTGGCA

Chromosome 9 short arm TCACGTTTTTCAGTGATTTTGTCATTTTTCAAGTTGT Noncoding RNAs (NT_039471.8) ACAACTTGAAAAATGACAAAATCACTGAAAAACGTGA

CTTGGCTGTCTTGGAATTCACTTTGTAGACCAGGCT Terra Chromosome 9 long arm CTGTCTCCCTGGGATTAAAGGCATGCACCACCATGC CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA

(NT_166313.2) CTCTTTTATGCATTTTGAGATGATCTTGTCCTGGAAAG CTTTCCAGGACAAGATCATCTCAAAATGCATAAAAGAG Terra-reverse Chromosome 17 long arm GGCAGAGGCAAGAGGATCTCTGTGAATTCCAGGTAAG TTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGG (NT_039649.8) CTTACCTGGAATTCACAGAGATCCTCTTGCCTCTGCC

Chromosome 19 long arm GCTGAATAGATGGTTCTATGTTTATTCCTAAGAAATAG (NT_082868.6) GTTTATTGGATCTTCACAGGAATACCCCACTCTGCTG Sine B2 GTCCTGAGTTCAATTCCCAGCAACCACATGGTGGCTCACAACCATC

GCTGCAGAGTGGGGTATTCCTGTGAAGATCCAATAAAC Xist CCAGCCATGTTTGCTCGTTTCCCGTGGATGTGCGGT Chromosome X long arm GTCTGGGTCTTTGGAGAGGCTGGCTTAGGGGCTAAAG

(NT_165788.2) CTTTAGCCCCTAAGCCAGCCTCTCCAAAGACCCAGTC

Chromosome X short arm GCAGAGCATTCAGTACTCACTAACC

GGTTAGTGAGTACTGAATGCTCTGC (NT_039699)

Chromosome Y GTTTGTACGGAATAGAGAAAAGTGTGGCAGAAAAGCG (NT_187051.1) Table 3 Num ber sam of Region library p-value Enriched genes ples total gene s positive regulation of flagellated sperm 0.00002624 RNASE10;TACR2 dRN motility (GO:1902093) GO Biological 3’ 27 A regulation of flagellated sperm motility Process 2018 0.00006281 RNASE10;TACR2 (GO:1901317) MP:0009833_absent_sperm_mitochond 0.001399 rial_sheath MP:0009234_absent_sperm_head 0.002098 MP:0009834_abnormal_sperm_annulus dRN MGI Mammalian 0.002448 5’ _morphology Spata6 7 A Phenotype 2017 MP:0008893_detached_sperm_flagellu 0.002797 m MP:0002662_abnormal_cauda_epididy 0.003146 mis_morphology Spermatozoon 0.00004370 ODF1;AKAP4;AKAP3 101 SMCP;PRM1;ODF1;ODF2;TXND Jensen TISSUES C2; Spermatid 1.476e-11 YBX2;HSPA2;PGK2; TNP2;AKAP4;AKAP3;H1FNT Sperm_fibrous_sheath 0.000002472 PGK2;AKAP4;AKAP3 Jensen dRN ODF1;ODF2;PTCHD3;PGK2;AKA exon Sperm_flagellum COMPARTMENT 5.731e-7 A P4;AKAP3 S Male_germ_cell_nucleus 0.001899 HSPA2;TNP2 MP:00045 MGI Mammalian SMCP;ATP8B3;ODF1;TNP2;OAZ 5.671e-8 42_impaired_acrosome_reaction Phenotype 2017 3 CRISP2;PRM1;ODF1;TXNDC2;YB Male_infertility Jensen DISEASES 1.056e-10 X2;HSPA2; PGK2;TNP2;AKAP4;AKAP3;OAZ3 SNORA62;ZC3H10;BC005537;92 ENCODE Histone 30105E05RIK; dRN Promoter-TSS H3K36me3_testis_mm9 Modifications 0.007206 SNORD15B;PRM2;SCARNA10;R 55 A 2015 NF138;UBE2N; DBIL5;PHF7;1700017D01RIK dRN TTS Epididymitis Jensen DISEASES 0.008968 TUBB4B 36 A MP:0011610_abnormal_primordial_ger MGI Mammalian m_cell_apoptosis Phenotype 2017 dRN Intron Tissue Protein 0.001724 BRIP1, WNT5A, ROR2 1147 A testis Expression from ProteomicsDB Supplementary Table S4- Enrichment analysis of associated genes of annotated positions (3’ UTR, 5’ UTR, exon, promoter-TSS (Transcription Start Site), TTS (Transcription Termination Site), and intron of mouse sperm

Supplementary Table S4- Enrichment analysis of associated genes of annotated positions (3’ UTR, 5’ UTR, exon, promoter-TSS (Transcription Start Site), TTS (Transcription Termination Site), and intron of mouse testis

Number of Region samples Gene ontology library p-value total genes dosage compensation by inactivation of X GO Biological Process 5’ TD 0.001753 921 chromosome (GO:0009048) 2018 RNA processing (GO:0006396) GO Biological Process 2.496e-8 protein ubiquitination (GO:0016567) 2018 4.05E-09 Jensen Nucleoplasm 2.011e-17 COMPARTMENTS GO Molecular exon TD RNA binding (GO:0003723) 1.565e-12 7742 Function 2018 MGI Mammalian MP:0001925_male_infertility 2.508e-7 Phenotype 2017 GO Cellular