Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals
Total Page:16
File Type:pdf, Size:1020Kb
Resource Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals Graphical Abstract Authors Lionel A. Sanz, Stella R. Hartono, Yoong Wearn Lim, ..., Paul A. Ginno, Xiaoqin Xu, Fre´ de´ ric Che´ din Correspondence fl[email protected] In Brief Sanz et al. developed a high-resolution methodology to map R-loop structures and report that, in mammalian genomes, co-transcriptional R-loop formation is prevalent, conserved, and dynamic. The study highlights that promoters and terminators are hotspots of R-loop formation and associate with specific chromatin signatures. Highlights Accession Numbers d Co-transcriptional R-loop formation is prevalent, conserved, GSE70189 and dynamic d R-loops associate with specific epigenomic signatures at promoters and terminators d Terminal R-loops associate with an insulator- and enhancer- like state d Terminal R-loops define a new class of transcription terminators Sanz et al., 2016, Molecular Cell 63, 167–178 July 7, 2016 ª 2016 Elsevier Inc. http://dx.doi.org/10.1016/j.molcel.2016.05.032 Molecular Cell Resource Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals Lionel A. Sanz,1,3 Stella R. Hartono,1,3 Yoong Wearn Lim,1,3 Sandra Steyaert,2 Aparna Rajpurkar,1 Paul A. Ginno,1,4 Xiaoqin Xu,1 and Fre´ de´ ric Che´ din1,* 1Department of Molecular and Cellular Biology and Genome Center, 1 Shields Avenue, University of California, Davis, Davis, CA 95616, USA 2Department of Mathematical Modeling, Statistics, and Bioinformatics, University of Ghent, Ghent 9000, Belgium 3Co-first author 4Present address: Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland *Correspondence: fl[email protected] http://dx.doi.org/10.1016/j.molcel.2016.05.032 SUMMARY eases, including fragile X syndrome, frontotemporal dementia, amyotrophic lateral sclerosis, and ataxia with ocular apraxia R-loops are three-stranded nucleic acid structures type 2 (Skourti-Stathaki and Proudfoot, 2014). formed upon annealing of an RNA strand to one The observation that R-loops form at class switch sequences strand of duplex DNA. We profiled R-loops using a in activated murine B cells provided the first evidence that high-resolution, strand-specific methodology in hu- R-loops form under physiological conditions (Yu et al., 2003). man and mouse cell types. R-loops are prevalent, Our recent work showed that sites of stable DNA:RNA hybrid collectively occupying up to 5% of mammalian formation can be profiled genome-wide in human cells using DNA:RNA immunoprecipitation coupled to high-throughput genomes. R-loop formation occurs over conserved sequencing (DRIP-seq), a method that relies on the unique genic hotspots such as promoter and terminator specificity of the S9.6 monoclonal antibody (Boguslawski et al., regions of poly(A)-dependent genes. In most cases, 1986) for RNA:DNA hybrids and R-loops independent of the R-loops occur co-transcriptionally and undergo dy- DNA sequence. This revealed that R-loops form normally over namic turnover. Detailed epigenomic profiling re- unmethylated CpG island (CGI) promoters (Ginno et al., 2012). vealed that R-loops associate with specific chro- The observation that R-loops also form at the 30 end of a number matin signatures. At promoters, R-loops associate of human genes (Ginno et al., 2013) together with evidence that with a hyper-accessible state characteristic of unme- terminal R-loop formation is a key step in transcription termina- thylated CpG island promoters. By contrast, terminal tion (Skourti-Stathaki et al., 2011) suggests that R-loops may R-loops associate with an enhancer- and insulator- possess more general physiological roles (Skourti-Stathaki and like state and define a broad class of transcription Proudfoot, 2014). Despite the rising importance of R-loops in pathological and terminators. Together, this suggests that the reten- physiological processes, little remains known about their preva- tion of nascent RNA transcripts at their site of lence and distribution in the human genome, their conservation expression represents an abundant, dynamic, and across cell types and species, their mechanisms of formation programmed component of the mammalian chro- and turnover, and their possible functions. Here we used a novel matin that affects chromatin patterning and the con- iteration of our original method, called DNA:RNA immunoprecip- trol of gene expression. itation followed by cDNA conversion coupled to high-throughput sequencing (DRIPc-seq), to profile R-loops genome-wide at near base-pair resolution and in a strand-specific manner. Our INTRODUCTION data support the notion that the hybridization of nascent RNA transcripts to template DNA during transcription is a pro- R-loops are three-stranded nucleic acid structures composed grammed event that associates with a variety of functional bio- of an RNA:DNA hybrid and a displaced single-stranded DNA logical outputs underlying transcription regulation and chromatin (ssDNA) loop. These non-B DNA structures have long been patterning. considered to result from rare and accidental entanglements of RNA with DNA during transcription. Aberrant R-loop formation RESULTS has been linked to increased mutagenesis, hyper-recombina- tion, rearrangements, and transcription-replication collisions High-Resolution, Strand-Specific Profiling of RNA:DNA as a result of topological, replication, or transcriptional stress Hybrids (Aguilera and Garcı´a-Muse, 2012; Hamperl and Cimprich, We developed a high-resolution, strand-specific R-loop profiling 2014). R-loop dysfunction is also thought to underlie human dis- method termed DRIPc-seq. This method builds on DRIP, except Molecular Cell 63, 167–178, July 7, 2016 ª 2016 Elsevier Inc. 167 Figure 1. Distribution, Frequency, and Dy- A 100kb chr11: 68,652,844-68,885,912 MRPL21 MRGPRD MRGPRF namics of RNA:DNA Hybrids IGHMBP2 BC039516 TPCN2 55 - (A) Screenshot of a representative genomic region. plus DRIPc data are shown in red (+ strand) and blue 36 - (À strand) (two independent replicates). DRIP-seq minus data are shown in green. DRIP-seq data after 45 - RNase A and RNase H pre-treatment are shown plus below (in khaki and teal, respectively). DRIPc-seq 29 - (B) Distribution of DRIPc peak numbers as a minus function of peak size. 24 - (C) Bar chart of DRIP-qPCR (as percent input) for a Ctrl various loci. Each bar is the average of two inde- 41 - Pre-treat with RNase A pendent experiments (shown with SE). The effect + A of RNase A and RNase H pre-treatment on DRIP- 41 - DRIP-seq Pre-treat with RNase H qPCR is shown. + H (D) Location analysis of DRIPc peaks (right) compared with expected genomic distribution B C (left). The various regions are color-coded ac- 15 No treatment 12 cording to the schematic below. X refers to an RNase A extended 3-kb region as shown. (E) Number of DRIPc peaks in the sense and 10 8 RNase H antisense orientations relative to gene transcrip- tion. 4 5 (F) R-loops were measured by DRIP-qPCR at a number of loci through a time course after DRB Count (x1,000) 0 0 .05 treatment and wash (5 indicates promoters, 3 DRIP-qPCR (% input) 0 0 indicates terminators). The initial DRIP-qPCR 0 5 10 15 value for each locus prior to DRB treatment (time DRIPc peak size (kb) TFPT SRRT CALM 3 REXO4 zero) was normalized to 100%. Error bars repre- RPL13A TRIM33 FBXL17 Negative sent the average of two independent experiments Genome DEDRIPc with SE. 1.4 7 6.5 8.3 12.8 x Promoter 6.7 4.7 6 x Terminal 5 45.6 19.4 Promoter or x Terminal 4 but suffered from limited (approximately 35.3 Genebody 3.6 3 kilobase) resolution, a higher back- 51.5 sense Antisense x 2 antisense ground, and a lack of strand specificity. 5.5 Importantly, treatment of genomic DNA TSS PAS Intergenic 1 # of DRIPc peaks (x 10,000) with Ribonuclease A (RNase A) prior to x Gene body Gene body x 0 -2 +2 +5 -2 +2 +5 (kb) DRIP resulted in no significant difference F 150 TimeTi in signal (Figure 1A; Figure S1B). By popost contrast, pre-treatment of genomic DNA 120 trtreatmente 0 min with purified human Ribonuclease H2 90 5 min (RNase H) abolished the R-loop signal. 60 110 min This shows that DRIP-seq allows the 220 min DRIP-qPCR (%) 30 specific recovery of RNA:DNA hybrids. 330 min DRIPc-seq was strand-specific and 0 445 min 5' 3' 5' 3' T T 660 min showed a lower background and greater E GB RT E GB RT TFPTTFP 5' SRSRRT 3' TFPTTFP 5' SRSRRT 3' 1201 min TPCN2 5' TRIM33 5' TRIM33 3' REXO4 3' CALM3 3' TPCN2 5'TRIM33 5' TRIM33 3' REXO4 3'CALM3 3' resolution than DRIP-seq (Figure 1A). MEPCEMEPC GB MEPCEMEPC GB DRB treatment Wash DRIP-seq and DRIPc-seq (referred to hereafter as DRIP and DRIPc, respec- tively) were reproducible between biolog- that S9.6 immunoprecipitated fragments were further digested ical replicates and in strong agreement with each other (Figures by DNase I to remove any DNA, and the RNA strands were S1B and S1C). The observation that the DRIP signal (which is recovered, reverse-transcribed to cDNA, and subjected to a overwhelmingly RNase H-sensitive and results from direct strand-specific RNA sequencing (RNA-seq) protocol (see Exper- sequencing of DNA fragments) and the RNA-based DRIPc signal imental Procedures for details; Figure S1A). This technique al- are in strong agreement argues that DRIPc specifically queries lows the capture of a highly enriched population of RNA mole- RNA strands involved in RNA:DNA interactions and not from re- cules involved in RNA:DNA hybrid interactions. sidual free RNA. To further establish this, we showed that RNase DRIPc-seq was performed on the human embryonic carci- H treatment after DRIP abrogated our ability to detect a signal af- noma Ntera2 cell line.