
DNA·RNA triple helix formation can function as a cis-acting regulatory mechanism at the human β-globin locus

Zhuo Zhoua, Keith E. Gilesa,b,c, and Gary Felsenfelda,1

aLaboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892; bUniversity of Alabama at Birmingham Stem Cell Institute, University of Alabama at Birmingham, Birmingham, AL 35294; and cDepartment of Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, AL 35294

Contributed by Gary Felsenfeld, February 4, 2019 (sent for review January 4, 2019; reviewed by James Douglas Engel and Sergei M. Mirkin)

We have identified regulatory mechanisms in which an RNA transcript forms a DNA duplex·RNA triple helix with a gene or one of its regulatory elements, suggesting potential auto-regulatory mechanisms in vivo. We describe an interaction at the human β-globin locus, in which an RNA segment embedded in the second intron of the β-globin gene forms a DNA·RNA triplex with the HS2 sequence within the β-globin locus control region, a major regulator of globin expression. We show in human K562 cells that the triplex is stable in vivo. Its formation causes displacement from HS2 of major transcription factors and RNA Polymerase II, and consequently loss of factors and polymerase that bind to the human ε- and γ-globin promoters, which are activated by HS2 in K562 cells. This results in reduced expression of these genes. These effects are observed when a small length of triplex-forming RNA is introduced into cells, or when a full-length intron-containing human β-globin transcript is expressed. Related results are obtained in human umbilical cord blood-derived erythroid progenitor-2 cells, in which β-globin expression is similarly affected by triplex formation. These results suggest a model in which RNAs conforming to the strict sequence rules for DNA·RNA triplex formation may participate in feedback regulation of genes in cis.

DNA·RNA triplex | human globin genes | FAU gene | DNA·RNA triplex RNA-mediated

Significance

RNA containing only pyrimidines, transcribed from genomic DNA, can form DNA·RNA triple helices with other sites in the genome. We characterize an intronic sequence in the human β-globin gene that forms a triplex with an upstream regulatory site in the β-globin locus, causing displacement of transcription factors and down-regulation of expression of the genes in the locus. Consequently, overexpression of a full-length β-globin transcript containing the intron results in decreased expression of members of the β-globin gene cluster. This is a cis-regulatory feedback mechanism in which overexpression of the β-globin gene could result in signals to down-regulate that expression. Balance in the expression of the many genes associated with hemoglobin biosynthesis is essential for erythroid cell function.

Author contributions: Z.Z., K.E.G., and G.F. designed research; Z.Z. performed research; Z.Z. and G.F. analyzed data; and Z.Z. and G.F. wrote the paper.
Reviewers: J.D.E., University of Michigan; and S.M.M., Tufts University.
The authors declare no conflict of interest.
Published under the PNAS license.
1To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1900107116/-/DCSupplemental.
Published online March 13, 2019.
6130–6139 | PNAS | March 26, 2019 | vol. 116 | no. 13 www.pnas.org/cgi/doi/10.1073/pnas.1900107116

The fact that both DNA and RNA can form triple-stranded structures has been known for a long time (1, 2), and the rules governing the formation of such structures have been well established (3, 4). Triplexes in which all three strands are RNA were the first to be observed (1) in vitro. Triplexes in which all three strands are DNA (H-DNA) form under superhelical stress (5, 6), and short runs of all-RNA triplex structures have been shown to protect a long-noncoding RNA (lncRNA) from degradation or misfolding (7–12). Triple-strand structures can also be formed between a DNA duplex and single-stranded RNA (ssRNA), but the sequence constraints are greater than with structures in which all three strands are DNA (13–17). As shown in SI Appendix, Fig. S1A, a typical DNA·RNA triplex requires that the RNA bases be pyrimidines, with U favored over C energetically (4). An rU base forms a triplex with an A·T DNA base pair; an rC base, typically protonated, forms a triplex with a G·C DNA base pair. Such DNA·RNA complexes are among the most stable triplex polynucleotide structures (15, 17). The RNA polypyrimidine strand binds in a direction antiparallel to the DNA polypyrimidine strand. One consequence of this strand polarity is that, except when the sequences are palindromic, a transcript cannot form a DNA·RNA triplex with its own template. Thus, source and target sequences will generally be distinct.

Taking these constraints into account, we searched the human genome for candidate sequences that might be either a DNA target for DNA·RNA triplex formation, or the source of an RNA capable of binding to such DNA sequences. As a demonstration of the criteria necessary to establish the presence of a triplex structure, we first describe and characterize triplex formation at the FAU gene in human erythroid K562 cells. FAU encodes a protein that is a fusion containing fubi, a ubiquitin-like protein, and ribosomal protein S30. Although fubi function is unknown, posttranslational processing produces S30, a component of the 40S ribosome. We used this system to refine methods necessary to detect triplex formation and to distinguish it from R-loop formation, a potential source of confusion.

We then applied these methods to search for other examples of DNA·RNA triplexes and identified an interaction between an RNA sequence present within an intron of the human adult β-globin gene and an upstream regulatory element within hypersensitive site 2 (HS2) of the β-globin locus control region (LCR). The effect of this interaction is to displace transcription factors from the regulatory site and affect expression of members of the β-globin family. This system represents a feedback mechanism in which a transcript could affect its own expression by forming a triple-strand structure at a nearby regulatory element.

Results

In Vivo Triplex-Forming RNA in K562 Cells: The FAU Gene as a Source and Target. The methods we employed for detecting DNA·RNA triple-stranded structure formation in vitro are shown in SI Appendix, Fig. S1 B–D, using a previously published and well-characterized triplex-forming sequence (3). RNA and individual DNA strands are labeled for electrophoresis studies, and sensitivity of complexes to RNases distinguishes triplex from heteroduplex when the sequence is palindromic.

Using these guidelines for formation of DNA·RNA triplexes (3, 4) (SI Appendix, Fig. S1), we first searched the Cold Spring Harbor Laboratory long RNA-sequencing database in K562 cells for RNAs (>200-nt long) containing at least 20-nt-long stretches of polypyrimidines. Because we wanted to explore the potential role of such triplexes in gene regulation, we were interested in RNAs transcribed from nonrepetitive DNA near putative gene promoter regions. Such a configuration should favor interactions between RNA and its DNA target. We initially focused on FAU (FAU ubiquitin-like and ribosomal protein S30 fusion), a proapoptotic regulatory gene that is expressed in K562 cells and down-regulated in human breast, prostate, and ovarian cancers (18–21). Our search showed that one of the more abundant RNAs satisfying the criteria for triplex formation corresponded in sequence to the FAU antisense transcript (Fig. 1A). This RNA contains a U-rich sequence, (UCU)6, which could in principle be targeted back to the FAU gene as part of a canonical triplex. However, because it happens to be a palindromic sequence it could also form an R-loop in which the RNA partially displaces one of the DNA strands, and forms a heteroduplex with the other, while still maintaining a complex with three strands. We chose this gene as a way to develop methods for demonstrating triplex formation and eliminating heteroduplex formation as an explanation for our results.

To distinguish between these possibilities in vitro, FAU-triplex gel-shift assays were performed with fluorescent tagged oligonucleotides, as shown in Fig. 1B. We observed that Cy5-labeled FAU triplex-forming RNA (FAU-tfRNA) forms a complex with its Cy3-labeled DNA templates at pH 6.5. This complex is resistant to RNase H cleavage but is degraded by RNase A, as has been shown to be consistent with formation of a DNA·RNA triplex (22) (SI Appendix, Fig. S1). A third possible outcome, involving complete displacement of one of the two DNA strands, is ruled out by the electrophoretic results (Fig. 1B) (RNase H). Complex formation is dependent on the RNA sequence, as Cy5-labeled PolyU-RNA is unable to produce a band shift with the FAU-DNA duplex in our triplex gel-shift assay (SI Appendix, Fig. S2). The results indicate that triplexes rather than R-loops are formed in vitro between FAU-tfRNA and its targeting DNA duplex, similarly to what is observed in the “model” triplex in SI Appendix, Fig. S1.

To further exclude the possibility of an R-loop structure, we also subjected these FAU double-stranded DNA (dsDNA)/ssRNA complexes to dimethyl sulfate (DMS) footprinting to determine the accessibility of guanines in the DNA duplex at the N7 position. In triple-stranded structures in which all three strands are DNA, it has been shown that formation of dC·dG·dC

[Fig. 1 graphic. (A) Sequence shown: CCTGTGTCTTCTTCTTCTTCTTCTCTCCTGTTTGG. (B and C) Gel panels; treatments: DMS (− + +), RNase A, RNase H, Proteinase K, 32P-FAU DNA duplex (+ + +), FAU RNA (− − +); species labeled DNA duplex, DNA/RNA hybrid, DNA·RNA triplex, RNA. (C, Bottom) Sequence shown: 5′-GTGGCCAAACAGGAGAAGAAG AAGAAGAAGACAGGTCGGGC-3′.]

Fig. 1. FAU-tfRNA forms triplex with FAU gene dsDNA in vitro. (A) University of California, Santa Cruz genome browser showing the structure of the human FAU locus. Triplex-forming region is located at exon 5 (red bar) and palindromic antisense triplex-forming sequence (FAU-tfRNA) is underlined (red). (B) FAU-tfRNA forms dsDNA/ssRNA triplex with the FAU gene triplex region in vitro, and such a triplex is resistant to RNase H but subject to RNase A digestion. Cy3-labeled FAU dsDNA (green) and Cy5-labeled FAU-tfRNA (red) were incubated as described (Materials and Methods and SI Appendix, Table S1) and triplex formation (yellow) was examined by gel-shift assay. In DNA/RNA hybrid, RNA strand is labeled with Cy5 (red). (C) FAU dsDNA/ssRNA triplex formation protected the FAU dsDNA triplex region from guanine-specific cleavage in a DMS footprinting assay. FAU dsDNA was labeled with 32P at the 5′-end of the polypurine-containing strand of DNA as shown in C (Bottom). FAU gene triplex-forming region is indicated with red box.
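The sequence criteria used in the search (a run of at least 20 pyrimidines in the RNA, and the special case of a palindromic tract such as (UCU)6) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, "palindromic" is interpreted here as a tract that reads the same in both directions, and the >200-nt transcript-length filter and expression filtering are omitted.

```python
import re


def to_rna(seq: str) -> str:
    """Upper-case a sequence and write it in the RNA alphabet (T -> U)."""
    return seq.upper().replace("T", "U")


def polypyrimidine_tracts(seq: str, min_len: int = 20):
    """Return (start, end, tract) for each maximal run of C/U of at least min_len nt.

    min_len defaults to 20, the stretch length quoted in the text.
    """
    rna = to_rna(seq)
    pattern = re.compile(r"[CU]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(rna)]


def reads_same_both_ways(seq: str) -> bool:
    """True if the tract is identical when read in reverse.

    Assumption: this string symmetry is what "palindromic" means for a
    polypyrimidine tract in the text.
    """
    rna = to_rna(seq)
    return rna == rna[::-1]


# The U-rich core named in the text.
fau_core = "UCU" * 6
print(reads_same_both_ways(fau_core))

# The sequence displayed in Fig. 1A, scanned with the default 20-nt threshold.
for start, end, tract in polypyrimidine_tracts("CCTGTGTCTTCTTCTTCTTCTTCTCTCCTGTTTGG"):
    print(start, end, tract)
```

A symmetric tract is the case the text singles out, because only then could the transcript pair with its own template either as a triplex third strand or as an R-loop.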

triplets protects the guanines from methylation by DMS at the N7 position (23), as shown for the model DNA·RNA triplex in SI Appendix, Fig. S1 B–D. Consistent with triplex formation, we similarly observed that all guanines within the FAU dsDNA/ssRNA triplex region were protected from DMS methylation, measured by resistance to subsequent strand cleavage at these guanines by piperidine (Fig. 1C). This makes it clear that a DNA·RNA triplex structure, not an R-loop, is formed in vitro.

FAU-Triplex Formation in Vivo and Its Role in Gene Regulation. We next addressed the question of whether triplexes can form in vivo at the FAU gene. FAU-tfRNA was transfected into K562 cells and its effect on FAU gene expression was examined. Cy5-labeled FAU-tfRNA was used first to optimize conditions for high transfection efficiency. To examine whether the introduced FAU-tfRNA interacts with the triplex target DNA of FAU, K562 cells were transfected with either biotin-labeled FAU-tfRNA or biotin-labeled PolyU RNA as a negative control, followed by streptavidin-conjugated beads pull-down (Materials and Methods). Consistent with the results obtained in vitro, FAU-tfRNA but not PolyU RNA was associated with the triplex region of the FAU gene in vivo (Fig. 2A).

As in the case of interactions in vitro, the palindromic nature of this tfRNA and the fact that it is transcribed in the antisense direction from exon 5 of FAU raised the possibility that it might form an R-loop in vivo at the FAU locus, rather than a triplex. To address this possibility, nuclei isolated from K562 cells transfected with biotin-labeled FAU-tfRNA were treated with RNase H, as above, to remove R-loops. We performed separate experiments to show that RNase H is fully functional within the

[Fig. 2 graphic. (A) Bar chart, relative biotin-pulldown enrichment, RNase H (−) and RNase H (+); Mock, PolyU, FAU. (B) Bar chart, relative expression, RNase H (−) and RNase H (+); FAU-R1 and MRPL49-R3. (D) Bar charts, relative expression of FAU, MRPL49, and GAPDH; Mock, PolyU, FAU. (C and E) Including Western blot for FAU and GAPDH with relative intensities 1.0, 0.9, 0.6.]

Fig. 2. FAU tfRNA down-regulates FAU expression in vivo by forming dsDNA/ssRNA triplex with the FAU gene. (A) FAU-tfRNA forms dsDNA/ssRNA triplex with the FAU gene in vivo. Biotin pull-down assays show association of biotin FAU-tfRNA with the FAU gene triplex-forming region in the presence [RNase H (+)] or absence [RNase H (−)] of RNase H, compared with controls (mock and PolyU). Error bars represent mean ± SEM; n = 3. P values from Student’s t test are indicated here and in B and D: **P < 0.05, ***P < 0.01. (B) Exogenous RNase H introduced into cell nuclei is functional (Materials and Methods). Green arrows depict the schematic locations of qRT-PCR primers used in this assay. qRT-PCR shows fold-change in FAU RNA in target (FAU-R1) or control (MRPL49-R3) ASO-transfected cells before and after treating nuclei with RNase H. Error bars represent mean ± SEM; n = 3. (C) FAU-tfRNA is unable to form an R-loop with the FAU gene in vivo. R-loop elimination assay (Materials and Methods, Results, and SI Appendix, Fig. S3A) was performed on biotin FAU-tfRNA or control (mock, biotin-PolyU, and biotin β-globin)-transfected K562 cells. Nuclei were subjected to sequential digestion by S1 nuclease at pH 7.4 or pH 4.5, RNase H at pH 7.9, and S1 nuclease at pH 7.4 or pH 4.5. qPCR shows fold-change in genomic DNA spanning the FAU triplex-forming region normalized to β-actin. Error bars represent mean ± SEM; n = 3. (D) FAU-tfRNA specifically down-regulates FAU gene expression in vivo. Cells were transfected with biotin FAU-tfRNA or controls (mock and biotin-PolyU). qRT-PCR shows fold-change in mRNA at FAU-tfRNA target (FAU) or control genes (MRPL49 and GAPDH). Error bars represent mean ± SEM; n = 3. (E) FAU-tfRNA specifically reduces FAU protein abundance in vivo. Cells were transfected with biotin FAU-tfRNA or controls (mock and biotin-PolyU). Western blot analysis shows fold-change in FAU protein level normalized to GAPDH and presented as relative intensity in reference to mock control. Note that the Upper and Lower panels represent separate experiments.
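The nuclease logic used to tell a triplex from an R-loop or heteroduplex (resistance to RNase H, which cleaves RNA paired in an RNA·DNA hybrid, combined with sensitivity to RNase A) can be written as a small decision rule. The sketch below is only an illustration of that reasoning; the function name and the "undetermined" fallback are assumptions, not part of the published assay.

```python
def classify_complex(rnase_h_resistant: bool, rnase_a_resistant: bool) -> str:
    """Interpret nuclease sensitivity of a DNA/RNA complex.

    Hypothetical helper: it encodes only the qualitative rule in the text.
    """
    if not rnase_h_resistant:
        # RNase H degrades RNA that is base-paired to DNA in a hybrid.
        return "R-loop or heteroduplex"
    if not rnase_a_resistant:
        # RNase H-resistant but RNase A-sensitive: the pattern reported for the triplex.
        return "triplex"
    # Resistant to both: the text does not assign this pattern.
    return "undetermined"


# Pattern reported for the FAU-tfRNA complex in the gel-shift assay.
print(classify_complex(rnase_h_resistant=True, rnase_a_resistant=False))
```

The in vivo experiments apply the same idea: association of the biotin-labeled tfRNA with its target that survives RNase H treatment is read as triplex rather than R-loop.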

nucleus under our experimental conditions (Materials and Methods and Fig. 2B). To distinguish triplex from R-loop formation, we treated nuclei of K562 cells that were transfected with either biotin-labeled FAU-tfRNA or biotin-labeled PolyU RNA control with RNase H, then performed a biotin pull-down assay. We observed no significant difference in the amount of nuclear FAU-tfRNA associated with the FAU gene, with or without RNase H treatment (Fig. 2A). These results indicate that, similar to the results obtained in vitro, this tfRNA forms only triplexes with its targeting DNA when transfected into K562 cells.

Because of the importance of distinguishing triplex from R-loop in many cases of potential triplex formation, we devised an additional “R-loop elimination” assay. As illustrated in SI Appendix, Fig. S3A, single-stranded DNA (ssDNA), formed by displacement when RNA binds to the complementary strand in a putative R-loop, is degraded by S1 nuclease. Subsequent treatment with RNase H digests RNA in the remaining RNA hybrid with ssDNA, and the exposed ssDNA is then digested by a second treatment with S1 nuclease. The resulting break in DNA can be detected by qPCR with primers spanning the R-loop region. We applied this assay to nuclei from K562 cells transfected with either biotin-labeled FAU-tfRNA or control RNAs (SI Appendix, Fig. S3B). We did not observe any reduction in the amount of nongapped DNA template in this R-loop elimination assay compared with results for control samples treated only with RNase H. This suggests that there are no R-loops in the triplex region of FAU. We obtained similar results using S1 nuclease either at pH 7.4 or pH 4.5, in nuclei from cells transfected with FAU-tfRNA or control RNAs. These results further exclude the possibility of R-loop formation (Fig. 2C and SI Appendix, Fig. S3B).

This raised the question whether the triplexes formed at the FAU gene could play any role in gene regulation. To address this, FAU expression was measured by qRT-PCR in FAU-tfRNA or control RNA-transfected cells. We observed ∼20% down-regulation of FAU expression only in FAU-tfRNA–transfected cells (Fig. 2D). This down-regulation is specific, as we did not observe any changes in expression of other genes, such as MRPL49, located ∼1 kb upstream of the triplex region of the FAU gene, or of GAPDH, a housekeeping gene. Triplex formation had corresponding effects on FAU protein abundance, which decreased by 30% compared with a polyU control (Fig. 2E).

DNA·RNA Triplexes Formed Between an LCR Enhancer and Intronic RNA at the β-Globin Locus. Having established criteria for identifying triplex structures in vivo, we looked for other examples of triplex structures with regulatory implications. We have been interested for some time in the human β-globin gene, which is expressed at high level in erythroid cells and is regulated by the LCR, a powerful enhancer cluster located far upstream of the β-like globin genes and marked by five DNase I hypersensitive (HS) sites (Fig. 3A) (24, 25). Our genome-wide search results for potential tfRNAs in K562 cells suggested that the majority of these RNAs are transcribed from introns. We therefore examined the introns of the β-globin gene for sequences that could give rise to triplex-forming transcripts. We identified a stretch of polypyrimidine RNA [β-globin (L)] transcribed from the second intron of this gene that could in principle form canonical DNA·RNA triplexes with HS site 2 (HS2) of the globin domain LCR (Fig. 3B). In this case the sequence is not palindromic and the RNA strand is not a transcript from the DNA it would target, making R-loop formation unlikely. To test whether a triplex structure is formed, we first carried out an in vitro assay in which a Cy3-labeled LCR HS2 duplex DNA probe was incubated with

[Fig. 3 graphic. Panels A–C: locus schematic, triplet diagram, and gel (no text recoverable). (D and E) Bar charts, relative biotin-pulldown enrichment at HS2 and at the negative control region, RNase H (−) and RNase H (+); Mock, PolyU, β-globin (L).]

Fig. 3. β-globin tfRNA forms dsDNA/ssRNA triplex with LCR HS2. (A) Structure of the human β-globin gene locus. Schematic organization of the adult β-globin gene is shown in the middle with exons in red, introns in yellow, and the intronic β-globin (L) in purple. The organization of binding sites and triplex-forming region (blue) within LCR HS2 is shown at the bottom. β-globin (L) RNA (purple) forms dsDNA/ssRNA triplex with LCR HS2 at the triplex region. (B) Nucleotide triplets form between LCR HS2 dsDNA and β-globin (L) RNA. Hoogsteen hydrogen bonds formed between β-globin (L) RNA and the polypurine strand of Watson–Crick base-paired LCR HS2 duplex DNA are denoted by dashed lines. The orientation of each strand within a triplex is shown. (C) β-globin (L) RNA forms triplex with LCR HS2 dsDNA in vitro. Cy3-labeled LCR HS2 dsDNA (green) and biotin β-globin (L) RNA (or biotin-PolyU, a negative control) were incubated as described (Materials and Methods and SI Appendix, Table S1) and triplex formation was examined by gel-shift assay. (D and E) β-globin (L) RNA forms triplex with LCR HS2 in vivo. K562 cells were transfected with biotin β-globin (L) or biotin-PolyU RNA (control). Fold-change in association of biotin-labeled RNAs with LCR HS2 triplex-forming region in the absence [RNase H (−)] (D, Left) or presence [RNase H (+)] (E, Left) of RNase H is shown, compared with the controls (mock and/or PolyU). Neither biotin β-globin (L) RNA nor biotin-PolyU associates with a gene desert region on chromosome 12 (negative control, see Materials and Methods) (D, Right and E, Right). Error bars represent mean ± SEM; n = 3. P values from Student’s t test: **P < 0.05, *P < 0.1.

biotin-labeled β-globin (L) tfRNA under conditions similar to those shown in Fig. 1 and SI Appendix, Fig. S1, followed by a standard gel-shift assay. It has been well documented (26, 27) that protonation of cytosines of third-strand RNA under mildly acidic conditions is required to stabilize DNA·RNA triplexes in vitro. As shown in Fig. 3C, an up-shifted band of the LCR HS2 duplex DNA probe was detected when β-globin (L) tfRNA was used in our in vitro triplex formation assay, consistent with formation of a complex between DNA and RNA. This experiment was carried out at pH 5.0, which we found was required to observe the complex in vitro. This is presumably due to the presence of two mismatched nucleotides between β-globin (L) RNA and its targeting DNA template (Fig. 3B).

Triplexes at the β-Globin Locus Attenuate Adjacent Transcription Factors and RNA Polymerase II Binding to LCR and β-Like Globin Gene Promoters. Despite this pH dependence in vitro, we observed strong triplex formation in vivo between β-globin (L) tfRNA and the LCR HS2 site. K562 cells were transfected with biotin-tagged β-globin (L) tfRNA or similarly tagged polyU as a negative control, followed by a biotin pull-down assay. As shown in Fig. 3D, β-globin (L) tfRNA associates with its targeting triplex region at LCR HS2 in vivo. There is an approximately twofold increase in the association of β-globin (L) tfRNA with the triplex region of LCR HS2 relative to the polyU control. We performed experiments (Fig. 3E) like those described earlier (Fig. 2A) to exclude the possibility that DNA/RNA R-loops formed between β-globin (L) tfRNA and LCR HS2 in vivo. K562 cells were transfected with biotin-tagged β-globin (L) tfRNA or polyU. Isolated nuclei were treated with RNase H, followed by a biotin pull-down assay. As expected, β-globin (L) tfRNA remained in association with its targeting triplex region at LCR HS2 in vivo after RNase H treatment (Fig. 3E). The approximately twofold enrichment compared with the control was similar to that observed in Fig. 3D, suggesting that β-globin (L) tfRNA associated with LCR HS2 via triplex formation. The interaction between β-globin (L) tfRNA and LCR HS2 is specific, as β-globin (L) tfRNA does not bind to a gene desert region (see legend to Fig. 3 D and E) on chromosome 12 (Fig. 3 D and E).

We asked whether the triplexes formed at LCR HS2 affect its enhancer activity in vivo. HS2 harbors numerous binding sites for erythroid-specific and ubiquitous transcription factors that are essential for polymerase II (Pol II) recruitment and downstream β-like globin gene expression, such as NF-E2, USF, GATA1, and TAL1 (25, 28) (Fig. 3A). Within HS2, the USF binding motif (E-box) is located 5-bp downstream of the triplex region, while binding motifs for GATA1, TAL1, and NF-E2 are located 34, 63, and 92 bp, respectively, upstream of the triplex region (Fig. 3A). Studies of the conformation of DNA triplexes (29, 30) have shown that an all-DNA triplex containing a heterogeneous polypyrimidine third strand has 12 triads per helix turn, and a 38.4-Å helical pitch (29), representing a serious distortion of the classic B-DNA duplex. Such a perturbation could, in principle, be propagated into neighboring B-form DNA, changing the local conformation within LCR HS2 and affecting the DNA-binding stabilities of transcription factors near the triplex region. It seems likely that a DNA·RNA triplex would also contain a perturbed DNA structure. To address this, K562 cells were transfected with biotin-tagged β-globin (L) tfRNA, followed by chromatin immunoprecipitation (ChIP) assays to examine the interaction of transcription factors with LCR HS2 near the triplex region. Interestingly, we observed, only in the cells that were transfected with β-globin (L), reductions in site occupancy ranging from 44% to 17%, correlated with distance from the triplex region, in the association of USF2, GATA1, and TAL1 with LCR HS2, compared with a PolyU control (Fig. 4A, Right). In contrast, binding of NF-E2 to its site 92-bp upstream of the triplex region was not affected (Fig. 4A, Right), presumably because the perturbation induced by the triple-stranded region does not extend that far.

It is well known that GATA1, NF-E2, and USF are required to recruit Pol II to LCR HS2, a mechanism that involves formation of a multimeric complex with Pol II while bound to LCR HS2 (31–35). Therefore, we carried out ChIP assays to examine the

[Fig. 4 graphic. Bar charts of ChIP enrichment (/input) for IgG, Pol II, NF-E2, USF2, GATA1, and TAL1 at (A) LCR HS2, (B) the γ-globin promoter, and (C) the ε-globin promoter; PolyU vs. β-globin (L).]

Fig. 4. Triplex formed at the β-globin locus disrupts the association of adjacent transcription factors and RNA Pol II with LCR HS2 and β-like globin promoters in K562 cells. ChIP-qPCR analysis of Pol II and transcription factors in biotin β-globin (L) or biotin-PolyU RNA (control)-transfected K562 cells. ChIP enrichment is expressed as fraction of input. Fold-change for Pol II, NF-E2, USF2, GATA1, and TAL1 at LCR HS2 (A), γ-globin promoter (B), and ε-globin promoter (C) is shown. IgG serves as a negative control. Error bars represent mean ± SEM; n = 3. P values from Student’s t test: **P < 0.05; *P < 0.1.
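The helical parameters quoted for the all-DNA triplex (12 triads per turn, 38.4-Å pitch) translate directly into a rise and twist per step, which makes the "distortion of the classic B-DNA duplex" concrete. The sketch below does that arithmetic. The B-DNA figures used for comparison (about 10.5 bp per turn and a pitch of about 35.7 Å) are textbook approximations introduced here as assumptions, not values taken from this paper.

```python
def helix_geometry(units_per_turn: float, pitch_angstrom: float):
    """Return (rise per unit in angstroms, twist per unit in degrees)."""
    rise = pitch_angstrom / units_per_turn
    twist = 360.0 / units_per_turn
    return rise, twist


# All-DNA triplex values quoted in the text (ref. 29).
triplex_rise, triplex_twist = helix_geometry(12, 38.4)

# Assumed textbook approximation for B-DNA, for comparison only.
b_rise, b_twist = helix_geometry(10.5, 35.7)

print(f"triplex: rise {triplex_rise:.2f} A, twist {triplex_twist:.1f} deg")
print(f"B-DNA (approx.): rise {b_rise:.2f} A, twist {b_twist:.1f} deg")
```

Under these assumptions the triplex is underwound relative to B-DNA (a smaller twist per step), which is one way to picture why a perturbation might propagate into flanking duplex and affect nearby factor binding sites.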

effect of triplex formation on association of Pol II with LCR HS2. To our surprise, we detected a 32% reduction in Pol II binding in the β-globin (L) tfRNA-transfected cells, compared with the PolyU control (Fig. 4A, Left). Thus, the formation of triplexes between β-globin (L) tfRNA and LCR HS2 reduced Pol II association at LCR HS2, probably because it attenuated binding affinities of key transcription factors that are important for Pol II recruitment to the LCR.

Given the importance of LCR HS2 for activation of globin promoters, we asked whether disruption of transcription factor binding at HS2 was reflected in any changes at the ε- and γ-globin promoters. We observed that binding levels of Pol II, USF2, and GATA1, all normally found at the γ-globin promoter, are reduced in cells carrying the β-globin (L) tfRNA (Fig. 4B). Binding of factors at the ε-globin promoter is similarly reduced (Fig. 4C).

β-Globin Intronic RNA Acts as a Globin Gene Repressor. Does triplex formation and transcription factor displacement at LCR HS2 also result in a change in expression of β-like globin genes in vivo? We extracted RNA from K562 cells transfected as above with biotin-tagged β-globin (L) tfRNA. Gene-expression patterns were analyzed by qRT-PCR. As shown in Fig. 5 A and B, there were 26% reductions in expression of both the ε- and γ-globin genes in β-globin RNA (L)-transfected cells compared with mock-transfected controls. Introducing PolyU into K562 cells had small or no effects on the expression of these genes. Down-regulation appeared to be specific, as the expression of the housekeeping gene GAPDH and U6 snRNA, as well as MRPL49 (mitochondrial ribosomal protein L49) was unchanged in these cells (SI Appendix, Fig. S4). Down-regulation of the target genes was also observed at the protein level: Western blotting showed an ∼30% decrease in ε-globin and a 20% decrease in γ-globin protein abundance (Fig. 5C). It should be noted (36) that deletion of HS2 in mouse resulted in a 41% decrease in adult β-globin expression from that allele. If human HS2 has a comparable effect, this would be the maximum we might expect, suggesting that triplex formation abolished at least 50% of HS2 enhancer function in vivo. These results strongly indicate that β-globin (L) tfRNA functions as a repressor of expression of β-like globin genes via triplex formation at LCR HS2.

It seemed possible that exogenous β-globin (L) tfRNA could also bind in vivo to other genomic loci that contained the same duplex DNA sequence as the triplex region of LCR HS2. To address this, we identified genes in the human genome near polypyrimidine tracts carrying the same potential triplex-forming sequences as that of LCR HS2. Surprisingly, the DNA sequence of the LCR HS2 triplex region is only found near three genes: β-globin, IGF2BP1, and NCR3LG1 (Materials and Methods); the

[Fig. 5 graphic. (A, B, E, F) Bar charts, relative expression of ε-globin and γ-globin; Mock, PolyU, β-globin (L) in A and B; Ctrl, pβA/X, pβA/X-Δtriplex in E and F. (C and G) Western blots for ε-globin, γ-globin, and GAPDH with relative intensities. (H) ChIRP enrichment at HS2 and the negative control region; Ctrl vs. pβA/X. (I) ChIRP RNA retrieval of β-globin intron 2 and U6.]

Fig. 5. Formation of β-globin triplex at LCR HS2 reduces expression of β-like globin genes in K562 cells. (A and B) Formation of β-globin triplex with exogenous β-globin (L) tfRNA reduces expression of β-like globin genes in vivo. K562 cells were transfected with biotin β-globin (L) tfRNA or controls (mock and biotin-PolyU). qRT-PCR shows fold-change in ε- (A) and γ-globin (B) mRNA in transfected cells. Error bars represent mean ± SEM; n = 3. P values from Student’s t test are indicated here and in E and F: **P < 0.05, ***P < 0.01. (C) β-globin (L) tfRNA specifically reduces ε- and γ-globin protein abundance in vivo. Cells were transfected with biotin β-globin (L) tfRNA or controls (mock and biotin-PolyU). Western blot analysis shows fold-change in ε- and γ-globin protein level normalized to GAPDH and presented as relative intensity in reference to mock control. (D) Schematic organization of the human adult β-globin gene with exons shown in purple and introns shown in red. β-globin (L) RNA (yellow) is contained in the second intron of the β-globin gene. (E and F) Intronic β-globin (L) tfRNA reduces expression of β-like globin genes in vivo. K562 cells were transfected with expression constructs containing a β-globin gene with introns (pβA/X-CMV6-XL5), a β-globin gene mutant with the 15-nt triplex-forming region deleted in intron 2 (pβA/X-Δtriplex-CMV6-XL5) (Materials and Methods), or vector control (pCMV6-XL5, ctrl). qRT-PCR shows fold-change in ε- (E) and γ-globin (F) mRNA in transfected cells. Error bars represent mean ± SEM; n = 3. *P < 0.1. (G) Intronic β-globin (L) tfRNA specifically reduces ε- and γ-globin protein abundance in vivo. Cells were transfected as described above in E and F. Western blot analysis shows fold-change in ε- and γ-globin protein level normalized to GAPDH and presented as relative intensity in reference to vector control (ctrl). (H) β-globin intron 2 RNA ChIRP-qPCR analysis in β-globin gene construct (pβA/X-CMV6-XL5, with introns)-transfected or mock (control)-transfected K562 cells. Fold-change for bound β-globin intron 2 RNA at LCR HS2 and a gene desert region at chromosome 12 (negative control) (Materials and Methods) is shown. Error bars represent mean ± SEM; n = 3. P values from Student’s t test: ***P < 0.01. (I) β-globin intron 2 ChIRP antisense DNA probes specifically retrieve β-globin intron 2 RNA, not U6 snRNA (negative control), in β-globin gene construct (pβA/X-CMV6-XL5)-transfected K562 cells. Error bars represent mean ± SEM; n = 3. P values from Student’s t test: ***P < 0.01.
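The fold-change values reported for qRT-PCR in this figure can be illustrated with a short calculation. The sketch below assumes the widely used 2^-ΔΔCt method with ideal amplification efficiency; this section of the paper does not state which quantification formula was used, so the method, the function names, and the Ct values are all illustrative assumptions rather than the authors' procedure or data.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (test vs. control) by the 2^-ddCt method.

    Assumes 100% amplification efficiency; each Ct is a threshold cycle
    for the target gene or the reference (housekeeping) gene.
    """
    delta_test = ct_target_test - ct_ref_test   # normalize test sample to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # normalize control sample to reference gene
    return 2.0 ** -(delta_test - delta_ctrl)


def percent_reduction(fold_change):
    """Convert a fold-change relative to control into a percent reduction."""
    return (1.0 - fold_change) * 100.0


# Hypothetical Ct values (not from the paper): target vs. GAPDH-like reference.
fc = fold_change_ddct(22.34, 18.00, 22.00, 18.00)
print(round(fc, 2), round(percent_reduction(fc)))
```

With these made-up inputs the target is delayed by 0.34 cycles relative to control, which corresponds to a fold-change of about 0.79, the same order as the 21% reduction in ε-globin mRNA described in the text.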

Zhou et al. PNAS | March 26, 2019 | vol. 116 | no. 13 | 6135

latter two contain the triplex region within 2 kb of the gene transcription start site and are also expressed in K562 cells. Unlike the effect on the β-globin locus, introduction of β-globin (L) tfRNA into K562 cells did not affect IGF2BP1 or NCR3LG1 expression (SI Appendix, Fig. S4 D and E). For IGF2BP1, this may be because the only potential regulatory site close to the triplex region is an E-box motif that has been reported to be an extremely weak binding site for a c-Myc/Max heterodimer; much stronger sites are found ∼1.3 kb upstream of the triplex region (37). For NCR3LG1, the nearest transcription factor binding motif is located 93 bp upstream of the triplex-forming region. Although ZBTB7A binds to this site in K562 cells (ENCODE), based on our observation of β-globin triplex action at LCR HS2, the distortion effect of triplex formation on the DNA duplex does not extend this far.

The β-Globin Intron-Containing Transcript Affects Endogenous ε- and γ-Globin Gene Expression. The stability of the β-globin (L)-containing intronic RNAs, once transcribed from the adult β-globin gene, clearly would be an important determinant of whether they could play the regulatory role suggested by our experiments. Because the endogenous adult β-globin gene is not expressed in K562 cells (38), to address this question we constructed a gene-expression vector that contained the β-globin gene with all of its coding regions and introns (pβA/X-CMV6-XL5), driven by a CMV promoter (Fig. 5D). We separately transfected this construct, or the vector (pCMV6-XL5), into K562 cells and examined the expression of endogenous β-like globin genes (Fig. 5 E and F). We found that, only in cells carrying the intron-containing β-globin gene (pβA/X-CMV6-XL5), the expression of endogenous ε- and γ-globin genes was reduced by 21% and 25%, respectively, compared with control cells that were transfected with the vector (Fig. 5 E and F). The down-regulation of these genes is relatively specific, as the expression of a housekeeping gene, GAPDH, was not changed in any of the transfected cells (SI Appendix, Fig. S5A). A mock transfection control is shown in SI Appendix, Fig. S5B. As β-globin gene expression was controlled by the CMV promoter, the down-regulation of ε- and γ-globin genes was not caused by promoter competition for erythroid-specific and ubiquitous transcription factors that are essential for globin gene expression.

To further confirm the role of the intronic triplex-forming sequence in suppressing ε- and γ-globin gene expression, we repeated these experiments using the same β-globin gene-expression vector, but with only the region coding for the 15-nt triplex-forming sequence deleted (pβA/X-Δtriplex-CMV6-XL5) (Figs. 3B and 5D). Despite the fact that it was in all other respects like the first construct, it had no effect on ε- and γ-globin gene expression (Fig. 5 E and F). Results of parallel experiments determining effects of these constructs on ε- and γ-globin protein abundance were consistent with the observed down-regulation of gene expression (Fig. 5G). These results support the conclusion that the intronic β-globin (L) tfRNAs transcribed from the adult β-globin gene function as globin gene-expression repressors. Taken together with the other experiments described above, they strongly support a mechanism that can function in cells, involving triplex formation at LCR HS2 by an intronic sequence in the β-globin transcript.

Direct Detection at LCR HS2 of Bound Intronic Sequence from β-Globin Transcripts. In experiments described above we used β-globin (L) tfRNA, a small probe carrying the triplex-forming sequence, to demonstrate specific binding to the site in LCR HS2 (Fig. 3 D and E). To show that the triplex-forming sequence embedded in the full-length β-globin transcription product also is bound at LCR HS2 in vivo, we carried out chromatin isolation by RNA purification (ChIRP) experiments (39) designed to detect bound intron 2 containing the triplex-forming sequence (Fig. 5H). We transfected K562 cells with the β-globin expression construct (pβA/X-CMV6-XL5) and confirmed the presence of the β-globin transcript, which included intron 2 RNA (SI Appendix, Fig. S5C). We constructed a set of biotin-labeled DNA probes to intron 2 RNA and used them to isolate cross-linked chromatin fragments carrying the intronic RNA sequence, according to previously published protocols (39, 40) (Materials and Methods). We showed that these probes specifically pull down intron 2 RNA, but not a nonspecific control (Fig. 5I). We then used the probes to detect DNA associated with intron 2 RNA and observed a strong signal above background, confirming that the β-globin intron 2 is bound at the HS2 locus (Fig. 5H). This association is specific, as binding of RNA was not enriched above background at a gene desert region (Fig. 5H).

Similar Effects Are Seen in Human Cells Expressing the Adult β-Globin Gene. The β-globin intronic RNA, β-globin (L), can function as a repressor of ε- and γ-globin genes in K562 cells (Fig. 5 A–C). Because all β-like globin genes are regulated by the powerful enhancer LCR HS2 through similar mechanisms during erythroid cell differentiation and development (41), it is reasonable to speculate that β-globin (L) tfRNA regulates adult β-globin gene expression through comparable mechanisms. To address this question, we used HUDEP-2 cells, an immortalized human umbilical cord blood-derived erythroid progenitor cell line that expresses β-globin predominantly over γ-globin (42, 43). HUDEP-2 cells were transfected with biotin-tagged β-globin (L) tfRNA or control RNA (biotin-PolyU), cells were collected 18 h posttransfection, and experiments were performed like those carried out in K562 cells (Fig. 6). As expected, results from the biotin pull-down assays show that β-globin (L) tfRNA associates with its target triplex region at LCR HS2 in HUDEP-2 cells (Fig. 6A). There is an ∼twofold enrichment in the association of β-globin (L) tfRNA with LCR HS2 compared with the PolyU control (Fig. 6A). In addition, treating nuclei with RNase H before the biotin pull-down assay, to remove RNAs from DNA/RNA hybrids, did not disrupt the association of β-globin (L) tfRNA with its target triplex region at LCR HS2 (Fig. 6A). As there is no significant difference in the binding of β-globin (L) tfRNA to its target triplex region at LCR HS2 between the two groups of cells with or without RNase H treatment, these data strongly suggest that β-globin (L) tfRNA associated with LCR HS2 via triplex formation in HUDEP-2 cells.

We next asked whether such triplexes formed at LCR HS2 affect its enhancer activity by attenuating normally bound transcription factors and Pol II levels at LCR HS2, similar to what is observed in K562 cells. To test this, HUDEP-2 cells were transfected with biotin-tagged β-globin (L) tfRNA or control RNA (biotin-PolyU) followed by ChIP assays using antibodies against USF2, GATA1, Pol II, and Pol II S5P, as previously described (Fig. 6B). We also measured the abundance of Pol II S5P, the serine 5-phosphorylated form of Pol II associated with transcription initiation. Again, similarly to K562, there are ∼38–44% reductions in the association of USF2, GATA1, Pol II, and Pol II S5P with LCR HS2 and ∼25–31% reduction in bound Pol II at the γ- and adult β-globin promoters (Fig. 6 C and D) in β-globin (L) tfRNA-transfected cells, compared with the PolyU control.

These results suggested that, as in the case of K562 cells, β-globin (L) tfRNA triplex formation at LCR HS2 regulates β-like globin gene expression in HUDEP-2 cells. To address this, HUDEP-2 cells transfected with biotin-tagged β-globin (L) or control RNA (biotin-PolyU) as described above were collected 18 h posttransfection and subjected to RNA extraction followed by qRT-PCR analyses. Because human β-globin mRNA is stable with a half-life of 17–60 h (44–46), we examined β-globin pre-mRNA expression levels in these cells. As shown in Fig. 6E, there was an ∼45% decrease in β-globin pre-mRNA expression

6136 | www.pnas.org/cgi/doi/10.1073/pnas.1900107116 | Zhou et al.

[Figure 6 graphic. Panels recoverable from the extraction: (A) HS2, relative biotin-pulldown enrichment for PolyU and β-globin(L), RNase H(−) and RNase H(+); (B) HS2, relative ChIP enrichment for IgG, Pol II, Pol II S5P, USF2, and GATA1; (C) β-globin promoter, relative ChIP enrichment for IgG, Pol II, and Pol II S5P; (D) γ-globin promoter, relative ChIP enrichment for IgG and Pol II; (E) β-globin pre-mRNA and (F) Gγ-globin pre-mRNA, relative expression normalized to β-actin, GAPDH, or U6, for PolyU and β-globin(L).]

Fig. 6. Formation of β-globin triplex at LCR HS2 regulates expression of β-like globin genes in HUDEP-2 cells. (A) β-globin (L) tfRNA forms triplex with LCR HS2 in HUDEP-2 cells. HUDEP-2 cells were transfected with biotin β-globin (L) or biotin-PolyU RNA (control). Fold-change in association of biotin-labeled RNAs to LCR HS2 triplex-forming region in the absence [RNase H (−)] or presence [RNase H (+)] of RNase H is shown. (B–D) ChIP analysis of Pol II and transcription factors in biotin β-globin (L)-transfected or biotin-PolyU RNA (control)-transfected HUDEP-2 cells. ChIP enrichment is expressed as fraction of input. Fold-change for Pol II, Pol II S5P, GATA1, and USF2 at LCR HS2 (B), Pol II and Pol II S5P at the β-globin promoter (C), and Pol II at the γ-globin promoter (D) are shown. IgG serves as a negative control. (E and F) qRT-PCR shows fold-change in β- (E) and Gγ-globin (F) pre-mRNA in biotin β-globin (L)-transfected or biotin-PolyU RNA (control)-transfected HUDEP-2 cells. Transcript expression was normalized to β-actin (/β-actin), GAPDH (/GAPDH), or U6 snRNA (/U6), respectively. All error bars represent mean ± SEM; n = 3. P values from Student’s t test: ***P < 0.01; **P < 0.05; *P < 0.1.
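The choice to measure pre-mRNA in panels E and F, rather than mature mRNA, follows from the 17–60 h half-life of β-globin mRNA quoted in the text. The sketch below is a rough illustration that assumes simple first-order decay and, as a limiting case, a complete stop of new transcription; the function name is hypothetical and the decay model is an assumption, not something stated in the paper.

```python
def fraction_remaining(elapsed_h, half_life_h):
    """Fraction of a pre-existing RNA pool left after elapsed_h hours,
    assuming first-order (exponential) decay with the given half-life."""
    return 0.5 ** (elapsed_h / half_life_h)


# Cells were collected 18 h posttransfection; the half-life range is the
# 17-60 h quoted in the text for human beta-globin mRNA.
for half_life_h in (17.0, 60.0):
    print(half_life_h, round(fraction_remaining(18.0, half_life_h), 2))
```

Under these assumptions roughly half to four-fifths of the mature mRNA present at the time of transfection would still be there at collection, so a change in transcription would be largely masked at the mRNA level, whereas short-lived pre-mRNA reports on ongoing transcription more directly.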

in the β-globin (L) tfRNA-transfected cells, compared with the PolyU control. Interestingly, Gγ-globin pre-mRNA expression was also reduced by ∼45% in these cells compared with the control cells, although it accounts for less than 5% of all β-like globin gene expression in HUDEP-2 cells (43) (Fig. 6F). Moreover, the down-regulation of globin genes is specific, as shown by comparison with the expression levels of housekeeping genes β-actin or GAPDH and U6 snRNA (Fig. 6 E and F). Similarly to K562, it is clear that the reduced expression of globin genes in transfected HUDEP-2 cells is associated with loss of key transcription factors and Pol II at both LCR HS2 and β- and γ-globin promoters.

Discussion

We have shown that triplex formation between RNA and regulatory sites on DNA or on the coding region itself can affect gene expression in human erythroid cell lines. We first identified and characterized triplex formation at the FAU locus, using the interaction to demonstrate methods necessary to distinguish DNA·RNA triplex from heteroduplex formation (Figs. 1 and 2). Because the triplex-forming region of FAU is a palindrome, FAU tfRNA can bind to this sequence from which it was transcribed in the antisense direction, and repress FAU expression (Fig. 2), resulting in a decrease in abundance of FAU protein as well (Fig. 2E). In the more general case involving nonpalindromic interactions, the source of the transcript and its DNA target in triplex formation must be distinct. Furthermore, as has been well established in other studies (see Introduction), DNA·RNA triplexes can form only when both the RNA and one of the DNA strands are polypyrimidine strings. With those restrictions in mind, we searched for other examples of triplex-related regulatory systems in human erythroid cell lines and identified the β-globin locus. It is important to emphasize that in each experimental system we were careful to verify, using the nuclease assays described in Fig. 2, that a triple-stranded complex had formed in vivo.

Our results revealed a potential in vivo pathway for regulation of human β-globin gene expression in which RNA derived from an intron of the β-globin transcript forms a DNA·RNA triplex with sites within HS2 of the β-globin LCR. We began by introducing a short biotin-labeled RNA fragment, β-globin (L) tfRNA, into K562 cells and verifying that it formed triplex at HS2. Binding of the third strand resulted in displacement from binding sites in HS2 of the key transcription factors USF2, GATA1, and TAL1, as well as loss of Pol II that is recruited by these factors. These sites are the ones closest to the triplex-forming region, and it is probable that addition of the RNA strand perturbs B-form DNA structure and binding site conformation for some distance on either side. As noted above, TAL1 was displaced from a site located 63 bp upstream of the triplex region, but NF-E2, 92 bp away, was not.

The decreased association of Pol II and these key transcription factors at HS2 led to a dramatic reduction in Pol II recruitment at downstream β-like globin promoters. It has been shown that a holocomplex containing transcription factors and Pol II is first assembled at the LCR, and then delivered to its downstream β-like globin gene promoters, perhaps via a looping or tracking mechanism (28, 47–51). Loop formation between the LCR and downstream β-like globin gene promoters is mediated by transcription factors GATA1, FOG1, TAL1, and coregulators LDB1 and LMO2 (52–57). Moreover, TAL1 interacts with GATA1, LMO2, and LDB1 in a multiprotein complex and plays an indispensable role in recruiting LDB1 and LMO2 to LCR HS2 (53, 54, 57–59). Therefore, the reduced association of transcription factors and Pol II with the LCR could lead to corresponding reductions of these at downstream β-like globin gene promoters. Indeed, as shown in Fig. 4, the association of Pol II, USF2, and GATA1 with ε- and γ-globin gene

promoters was significantly decreased in β-globin (L)-transfected cells. We also confirmed (Fig. 4 B and C) that NF-E2 does not bind to the ε- and γ-globin promoters (60). Given the effects of triplex formation on the binding of these transcription factors, it is not surprising that in K562 cells the expression of both ε- and γ-globin genes (Fig. 5 A–C) is decreased.

These experiments showed that a small RNA carrying a β-globin intronic triplex-forming sequence could affect expression of the ε- and γ-globin genes in K562 cells. If this interaction has regulatory implications, primary transcripts carrying the full-length β-globin second intron should also affect expression of these genes. Accordingly, we introduced into K562 cells plasmids containing the entire β-globin genomic sequence. As shown in Fig. 5 E–G, this resulted in down-regulation of ε- and γ-globin gene expression, similarly to what was observed with β-globin (L) tfRNA (Fig. 5 A–C). The effect was attributable to the triplex-forming region within the second intron: when the experiment was repeated with that 15-nt triplex-forming region specifically deleted from the RNA, no change in ε- and γ-globin gene expression was observed (Fig. 5 E–G). Finally, to provide direct evidence that the RNA of the intronic region is bound to HS2 in vivo, we carried out ChIRP experiments to identify genomic DNA sequences associated with intronic RNA-containing triplex-forming sequences of the β-globin transcript. As shown in Fig. 5H, those RNA sequences are associated with the chromatin at HS2.

Although the majority of intronic RNAs appear to have a relatively short half-life after being spliced out from pre-mRNA, there has been extensive evidence indicating that there is a population of introns, including but not limited to alternatively spliced introns, that is processed more slowly than the typical constitutive introns (61). Many families of noncoding RNAs contain members associated with introns (62). A novel class of posttranscriptionally spliced introns, named detained introns, has recently been characterized in mouse embryonic stem cells. These detained introns have a mean half-life of 29 min while remaining in the nucleus and are resistant to nonsense-mediated decay (61). Some detained introns can have half-lives of over 1 h. Furthermore, certain introns remain stable in the nucleus by forming higher RNA secondary structures after intron splicing, and such intronic RNA plays an essential role in gene regulation. For example, it has been reported that in the Ig heavy-chain (IgH) locus, the intronic switch RNA that is transcribed from the Ig switch (S) region forms RNA G-quadruplexes following intron splicing and lariat debranching. Such intronic RNA directly interacts with activation-induced cytidine deaminase (AID) and mediates AID targeting to S-region DNA during class switch recombination in vivo (63). Interestingly, as shown in Figs. 2, 3, and 6, tf ssRNAs can be stabilized through triplex formation with their target dsDNAs in vivo.

K562 cells provide a good system in which to investigate the role in regulation of ectopically expressed β-globin transcripts, because that gene is not normally expressed in these cells. To complement these results, we carried out a parallel set of experiments in HUDEP-2 cells, which express β-globin predominantly, as well as smaller amounts of γ-globin. Just as with K562 cells, the small biotin-tagged β-globin (L) tfRNA bound to the triplex-forming site on HS2 and displaced bound transcription factors and Pol II, with indirect displacement effects at downstream promoters (Fig. 6). In this case, however, the down-regulated genes were β- and Gγ-globin, both reduced in pre-mRNA expression levels by ∼45%. Our results further confirm that the human β-globin locus can use triplex formation to regulate gene expression. We suggest that in normal erythroid cells triplex formation would occur primarily in cis, so that the β-globin transcript targets only the LCR HS2 element (SI Appendix, Fig. S6), with which it makes contact (49). This would have a negative-feedback effect on the expression of this gene, presumably acting as a control to regulate production of β-globin, and perhaps also helping to shut down expression of other members of the β-globin family. It has been known for some time that triplex-forming sequences are greatly enriched in abundance over random expectation in higher eukaryotes (64). However, DNA·RNA triplex-related mechanisms of the kind we describe here require the additional constraint that RNA base sequences should consist largely of uracil interspersed with relatively small amounts of cytosine.

There are at least two other known examples of cis-acting DNA·RNA triplex regulation (Fig. 7). One of these, the FAU gene system that we describe here (Figs. 1 and 2), is the most direct: the antisense transcript targets its own template and inhibits transcription of the gene. Another triplex system is found at the human SPHK1 locus, where a segment of the antisense Khps1 lncRNA forms a triplex with a nearby upstream sequence (65). This lncRNA recruits p300/CBP, which in turn contributes to activation of SPHK1 transcription. In these two cases, as well as for the β-globin system described here (SI Appendix, Fig. S6), triplex formation is part of a cis-acting feedback system. It seems likely that this form of regulation will be seen at many other sites in the genome.

Materials and Methods

The SI Appendix, Materials and Methods includes descriptions of the following procedures: in vitro triplex formation and gel-shift assay, DMS footprinting assay, RNA isolation and qRT-PCR, R-loop elimination assay, biotin-labeled tfRNA pull-down assay, chromatin immunoprecipitation, ChIRP, as well as other, standard, procedures used in this paper.

Fig. 7. Models for RNA-mediated dsDNA/ssRNA triplexes formation and their roles in gene regulation. (Top) As shown for the FAU gene, palindromic FAU-tfRNA forms triplex with its own DNA template. Formation of such a triplex reduces FAU gene expression, possibly by impeding Pol II transcription. However, the physiological significance of this interaction is not known. (Middle) As shown for the β-globin gene, intronic β-globin-tfRNA forms a triplex with the powerful β-like globin gene enhancer LCR at HS2. Formation of such a triplex attenuates the association of key transcription factors and Pol II near the triplex-forming region at LCR HS2, presumably due to the conformation change at duplex DNA. This results additionally in loss of these factors and Pol II at downstream β-like globin promoters, and further reduces β-like globin gene expression. (Bottom) As demonstrated in Postepska-Igielska et al. (65), a homopyrimidine segment of lncRNA Khps1 forms triplex with its own template DNA located ∼400-bp upstream of the SPHK1-B gene transcription start site. Formation of such triplex functions as an RNA anchor and helps to bring the Khps1 associated histone acetyltransferase p300 near the SPHK1-B gene promoter.
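The sequence constraint noted in the Discussion, that a triplex-forming RNA should consist largely of uracil interspersed with relatively small amounts of cytosine, can be turned into a simple candidate screen. The following is a minimal sketch, not the authors' search procedure: the function name, the 15-nt minimum length (borrowed from the size of the β-globin triplex-forming region), the cytosine cutoff, and the test sequence are all illustrative assumptions.

```python
import re


def find_pyrimidine_tracts(rna, min_len=15, max_c_fraction=0.5):
    """Return (start, end, tract, c_fraction) for each uninterrupted run of
    U/C at least min_len long whose cytosine fraction is at most
    max_c_fraction. Both thresholds are arbitrary illustrative defaults."""
    rna = rna.upper().replace("T", "U")          # accept DNA-style input
    pattern = re.compile(r"[UC]{%d,}" % min_len)  # maximal pyrimidine runs
    hits = []
    for match in pattern.finditer(rna):
        tract = match.group()
        c_fraction = tract.count("C") / len(tract)
        if c_fraction <= max_c_fraction:
            hits.append((match.start(), match.end(), tract, c_fraction))
    return hits


# Hypothetical sequence (not from the paper): a 15-nt U-rich run flanked by purines.
example = "AGG" + "UUUCUUUUCUUUUUU" + "GAC"
for start, end, tract, c_fraction in find_pyrimidine_tracts(example):
    print(start, end, tract, round(c_fraction, 2))
```

A screen like this only nominates candidates. As the text stresses, triplex formation in vivo still has to be verified experimentally, for example by showing that the association survives RNase H treatment.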

ACKNOWLEDGMENTS. We thank Dr. Ryo Kurita and Dr. Yukio Nakamura for their great generosity in providing us with the HUDEP-2 cell line as an academic collaboration work; Prof. Jörg Bungert for the plasmid containing the human β-globin gene; Dr. Jeffery Miller and Colleen Byrnes for help in cell culture; and Drs. Rodolfo Ghirlando, Cecilia Trainor, and Tiaojiang Xiao for their advice and criticism. This work was supported by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health.

1. Felsenfeld G, Davies DR, Rich A (1957) Formation of a three-stranded polynucleotide molecule. J Am Chem Soc 79:2023–2024.
2. Felsenfeld G, Rich A (1957) Studies on the formation of two- and three-stranded polyribonucleotides. Biochim Biophys Acta 26:457–468.
3. McDonald CD, Maher LJ, 3rd (1995) Recognition of duplex DNA by RNA polynucleotides. Nucleic Acids Res 23:500–506.
4. Soyfer VN, Potaman VN (1995) Triple-Helical Nucleic Acids (Springer, New York).
5. Lyamichev VI, Mirkin SM, Frank-Kamenetskii MD (1986) Structures of homopurine-homopyrimidine tract in superhelical DNA. J Biomol Struct Dyn 3:667–669.
6. Mirkin SM, et al. (1987) DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature 330:495–497.
7. Brown JA, et al. (2014) Structural insights into the stabilization of MALAT1 noncoding RNA by a bipartite triple helix. Nat Struct Mol Biol 21:633–640.
8. Brown JA, Kinzig CG, DeGregorio SJ, Steitz JA (2016) Hoogsteen-position pyrimidines promote the stability and function of the MALAT1 RNA triple helix. RNA 22:743–749.
9. Brown JA, Valenstein ML, Yario TA, Tycowski KT, Steitz JA (2012) Formation of triple-helical structures by the 3′-end sequences of MALAT1 and MENβ noncoding RNAs. Proc Natl Acad Sci USA 109:19202–19207.
10. Mitton-Fry RM, DeGregorio SJ, Wang J, Steitz TA, Steitz JA (2010) Poly(A) tail recognition by a viral RNA element through assembly of a triple helix. Science 330:1244–1247.
11. Cash DD, et al. (2013) Pyrimidine motif triple helix in the Kluyveromyces lactis telomerase RNA pseudoknot is essential for function in vivo. Proc Natl Acad Sci USA 110:10970–10975.
12. Wilusz JE, et al. (2012) A triple helix stabilizes the 3′ ends of long noncoding RNAs that lack poly(A) tails. Genes Dev 26:2392–2407.
13. Skoog JU, Maher LJ, 3rd (1993) Repression of bacteriophage promoters by DNA and RNA oligonucleotides. Nucleic Acids Res 21:2131–2138.
14. Semerad CL, Maher LJ, 3rd (1994) Exclusion of RNA strands from a purine motif triple helix. Nucleic Acids Res 22:5321–5325.
15. Roberts RW, Crothers DM (1992) Stability and properties of double and triple helices: Dramatic effects of RNA or DNA backbone composition. Science 258:1463–1466.
16. Han H, Dervan PB (1994) Different conformational families of pyrimidine.purine.pyrimidine triple helices depending on backbone composition. Nucleic Acids Res 22:2837–2844.
17. Han H, Dervan PB (1993) Sequence-specific recognition of double helical RNA and RNA.DNA by triple helix formation. Proc Natl Acad Sci USA 90:3806–3810.
18. Moss EL, Mourtada-Maarabouni M, Pickard MR, Redman CW, Williams GT (2010) FAU regulates carboplatin resistance in ovarian cancer. Genes Cancer 49:70–77.
19. Pickard MR, Edwards SE, Cooper CS, Williams GT (2010) Apoptosis regulators Fau and Bcl-G are down-regulated in prostate cancer. Prostate 70:1513–1523.
20. Pickard MR, et al. (2009) Dysregulated expression of Fau and MELK is associated with poor prognosis in breast cancer. Breast Cancer Res 11:R60.
21. Pickard MR, Mourtada-Maarabouni M, Williams GT (2011) Candidate tumour suppressor Fau regulates apoptosis in human cells: An essential role for Bcl-G. Biochim Biophys Acta 1812:1146–1153.
22. Santos-Pereira JM, Aguilera A (2015) R loops: New modulators of genome dynamics and function. Nat Rev Genet 16:583–597.
23. Beal PA, Dervan PB (1992) The influence of single base triplet changes on the stability of a pur.pur.pyr triple helix determined by affinity cleaving. Nucleic Acids Res 20:2773–2776.
24. Greaves DR, et al. (1989) The beta-globin dominant control region. Prog Clin Biol Res 316A:37–46.
25. Levings PP, Bungert J (2002) The human beta-globin locus control region. Eur J Biochem 269:1589–1599.
26. Thuong N, Helene C (1993) Sequence-specific recognition and modification of double-helical DNA by oligonucleotides. Angew Chem Int Ed Engl 32:666–690.
27. Felsenfeld G, Miles HT (1967) The physical and chemical properties of nucleic acids. Annu Rev Biochem 36:407–448.
28. Liang S, Moghimi B, Yang TP, Strouboulis J, Bungert J (2008) Locus control region mediated regulation of adult beta-globin gene expression. J Cell Biochem 105:9–16.
29. Liu K, Sasisekharan V, Miles HT, Raghunathan G (1996) Structure of Py.Pu.Py DNA triple helices. Fourier transforms of fiber-type X-ray diffraction of single crystals. Biopolymers 39:573–589.
30. Rhee S, Han Z-j, Liu K, Miles HT, Davies DR (1999) Structure of a triple helical DNA with a triplex-duplex junction. Biochemistry 38:16810–16815.
31. Zhou Z, et al. (2010) USF and NF-E2 cooperate to regulate the recruitment and activity of RNA polymerase II in the beta-globin gene locus. J Biol Chem 285:15894–15905.
32. Johnson KD, Christensen HM, Zhao B, Bresnick EH (2001) Distinct mechanisms control RNA polymerase II recruitment to a tissue-specific locus control region and a downstream promoter. Mol Cell 8:465–471.
33. Crusselle-Davis VJ, Vieira KF, Zhou Z, Anantharaman A, Bungert J (2006) Antagonistic regulation of beta-globin gene expression by helix-loop-helix proteins USF and TFII-I. Mol Cell Biol 26:6832–6843.
34. Liang SY, et al. (2009) Defective erythropoiesis in transgenic mice expressing dominant-negative upstream stimulatory factor. Mol Cell Biol 29:5900–5910.
35. Johnson KD, et al. (2002) Cooperative activities of hematopoietic regulators recruit RNA polymerase II to a tissue-specific chromatin domain. Proc Natl Acad Sci USA 99:11760–11765.
36. Bender MA, et al. (2012) The hypersensitive sites of the murine β-globin locus control region act independently to affect nuclear localization and transcriptional elongation. Blood 119:3820–3827.
37. Noubissi FK, Nikiforov MA, Colburn N, Spiegelman VS (2010) Transcriptional regulation of CRD-BP by c-myc: Implications for c-myc functions. Genes Cancer 1:1074–1082.
38. Dean A, Ley TJ, Humphries RK, Fordis M, Schechter AN (1983) Inducible transcription of five globin genes in K562 human leukemia cells. Proc Natl Acad Sci USA 80:5515–5519.
39. Chu C, Qu K, Zhong FL, Artandi SE, Chang HY (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol Cell 44:667–678.
40. Chu C, Chang HY (2016) Understanding RNA-chromatin interactions using Chromatin Isolation by RNA Purification (ChIRP). Methods Mol Biol 1480:115–123.
41. Peterson KR, et al. (1996) Effect of deletion of 5′HS3 or 5′HS2 of the human beta-globin locus control region on the developmental regulation of globin gene expression in beta-globin locus yeast artificial chromosome transgenic mice. Proc Natl Acad Sci USA 93:6605–6609.
42. Kurita R, et al. (2013) Establishment of immortalized human erythroid progenitor cell lines able to produce enucleated red blood cells. PLoS One 8:e59890.
43. Guo Y, et al. (2015) CRISPR inversion of CTCF sites alters genome topology and enhancer/promoter function. Cell 162:900–910.
44. Ross J, Pizarro A (1983) Human beta and delta globin messenger RNAs turn over at different rates. J Mol Biol 167:607–617.
45. Ross J, Sullivan TD (1985) Half-lives of beta and gamma globin messenger RNAs and of protein synthetic capacity in cultured human reticulocytes. Blood 66:1149–1154.
46. Russell JE, Morales J, Liebhaber SA (1997) The role of mRNA stability in the control of globin gene expression. Prog Nucleic Acid Res Mol Biol 57:249–287.
47. Higgs DR, Engel JD, Stamatoyannopoulos G (2012) Thalassaemia. Lancet 379:373–383.
48. de Laat W, et al. (2008) Three-dimensional organization of gene expression in erythroid cells. Curr Top Dev Biol 82:117–139.
49. Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W (2002) Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell 10:1453–1465.
50. Palstra RJ, et al. (2003) The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35:190–194.
51. Dostie J, et al. (2006) Chromosome conformation capture carbon copy (5C): A massively parallel solution for mapping interactions between genomic elements. Genome Res 16:1299–1309.
52. Vakoc CR, et al. (2005) Proximity among distant regulatory elements at the beta-globin locus requires GATA-1 and FOG-1. Mol Cell 17:453–462.
53. Song SH, Hou C, Dean A (2007) A positive role for NLI/Ldb1 in long-range beta-globin locus control region function. Mol Cell 28:810–822.
54. Yun WJ, et al. (2014) The hematopoietic regulator TAL1 is required for chromatin looping between the β-globin LCR and human γ-globin genes to activate transcription. Nucleic Acids Res 42:4283–4293.
55. Krivega I, Dale RK, Dean A (2014) Role of LDB1 in the transition from chromatin looping to transcription activation. Genes Dev 28:1278–1290.
56. Deng W, et al. (2012) Controlling long-range genomic interactions at a native locus by targeted tethering of a looping factor. Cell 149:1233–1244.
57. Wadman IA, et al. (1997) The LIM-only protein Lmo2 is a bridging molecule assembling an erythroid, DNA-binding complex which includes the TAL1, E47, GATA-1 and Ldb1/NLI proteins. EMBO J 16:3145–3157.
58. Xu Z, Huang S, Chang LS, Agulnick AD, Brandt SJ (2003) Identification of a TAL1 target gene reveals a positive role for the LIM domain-binding protein Ldb1 in erythroid gene expression and differentiation. Mol Cell Biol 23:7585–7599.
59. Soler E, et al. (2010) The genome-wide dynamics of the binding of Ldb1 complexes during erythroid differentiation. Genes Dev 24:277–289.
60. Forsberg EC, Downs KM, Bresnick EH (2000) Direct interaction of NF-E2 with hypersensitive site 2 of the beta-globin locus control region in living cells. Blood 96:334–339.
61. Boutz PL, Bhutkar A, Sharp PA (2015) Detained introns are a novel, widespread class of post-transcriptionally spliced introns. Genes Dev 29:63–80.
62. Chorev M, Carmel L (2012) The function of introns. Front Genet 3:55.
63. Zheng S, et al. (2015) Non-coding RNA generated following lariat debranching mediates targeting of AID to DNA. Cell 161:762–773.
64. Cox R, Mirkin SM (1997) Characteristic enrichment of DNA repeats in different genomes. Proc Natl Acad Sci USA 94:5237–5242.
65. Postepska-Igielska A, et al. (2015) LncRNA Khps1 regulates expression of the proto-oncogene SPHK1 via triplex-mediated changes in chromatin structure. Mol Cell 60:626–636.
