GC Skew at the 59 and 39 Ends of Human Genes Links R-Loop Formation to Epigenetic Regulation and Transcription Termination
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from genome.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press Research GC skew at the 59 and 39 ends of human genes links R-loop formation to epigenetic regulation and transcription termination Paul A. Ginno,1,3,4 Yoong Wearn Lim,1,3 Paul L. Lott,2 Ian Korf,1,2 and Fre´de´ric Che´din1,2,5 1Department of Molecular and Cellular Biology, 2Genome Center, University of California, Davis, California 95616, USA Strand asymmetry in the distribution of guanines and cytosines, measured by GC skew, predisposes DNA sequences toward R-loop formation upon transcription. Previous work revealed that GC skew and R-loop formation associate with a core set of unmethylated CpG island (CGI) promoters in the human genome. Here, we show that GC skew can dis- tinguish four classes of promoters, including three types of CGI promoters, each associated with unique epigenetic and gene ontology signatures. In particular, we identify a strong and a weak class of CGI promoters and show that these loci are enriched in distinct chromosomal territories reflecting the intrinsic strength of their protection against DNA methylation. Interestingly, we show that strong CGI promoters are depleted from the X chromosome while weak CGIs are enriched, a property consistent with the acquisition of DNA methylation during dosage compensation. Furthermore, we identify a third class of CGI promoters based on its unique GC skew profile and show that this gene set is enriched for Polycomb group targets. Lastly, we show that nearly 2000 genes harbor GC skew at their 39 ends and that these genes are preferentially located in gene-dense regions and tend to be closely arranged. Genomic profiling of R-loops accordingly showed that a large proportion of genes with terminal GC skew form R-loops at their 39 ends, consistent with a role for these structures in permitting efficient transcription termination. Altogether, we show that GC skew and R-loop for- mation offer significant insights into the epigenetic regulation, genomic organization, and function of human genes. [Supplemental material is available for this article.] Epigenetic modifications in eukaryotes, including DNA methyla- sequences by virtue of small, trans-acting RNAs that likely guide tion and histone marks, represent a critical layer of biological epigenetic regulators through complementary RNA–DNA or RNA– information employed to maintain genomic organization and RNA interactions. In vertebrates, numerous protein–DNA inter- stability and to regulate an array of nuclear processes such as actions contribute to guiding the establishment of DNA methyla- transcription, replication, and recombination (Allis et al. 2007). tion patterns; de novo DNA methyltransferases possess significant Extensive mapping efforts have identified combinations of marks preferences for certain sequences (Wienholz et al. 2010; Xie et al. that define broad and conserved epigenetic domains in eukaryotes 2012) and can be targeted to specific sites through interactions (Filion et al. 2010; Ernst et al. 2011; Roudier et al. 2011). Such pat- with DNA binding proteins (Brenner and Fuks 2006; Hervouet terns can predict the presence and activity of noncoding cis-acting et al. 2009). DNA sequence characteristics also play key roles in regulatory elements such as promoters and enhancers. Despite protecting loci from DNA methylation. Hence, certain DNA se- these intense efforts, many fundamental questions regarding the quences can escape this prevalent modification because they are establishment and maintenance of these epigenetic marks re- recognized and bound by specific proteins. This was elegantly main unanswered. demonstrated using the lacO-LacI system (Lin et al. 2000) and, Elucidating the role that DNA sequence plays in determin- more recently, by the identification of a large set of constitutive ing epigenetic patterns is a promising area of investigation. While and lineage-specific hypomethylated regions highly enriched for conventional thinking dictates that epigenetic patterns exist DNA binding factors (Hodges et al. 2011; Lienert et al. 2011). ‘‘above and beyond’’ the DNA sequence, epigenetic patterning In vertebrate genomes, CpG islands (CGIs) represent a key often relies on sequence-driven recognition modules. In the case of set of cis-acting loci that remain protected from DNA methyla- DNA methylation, a critical mark associated with transcriptional tion.SuchprotectionisessentialgiventhatCGIsfunctionas silencing, plants make use of a specific RNA-directed DNA meth- promoters for ;60% of human genes, particularly broadly ex- ylation system to target transposons (Wassenegger et al. 1994). In pressed ‘‘housekeeping’’ genes (Illingworth and Bird 2009). The animals, a similar transposon-targeted silencing mechanism is mechanistic underpinnings of this protection are the subject of carried out through the piRNA pathway (Aravin et al. 2008). In intense curiosity, and accumulating evidence suggests that DNA both systems, epigenetic modifications are deposited to specific sequence plays a key role in the process. The CFP1 protein specif- ically recognizes unmethylated CpG-rich regions by virtue of its CXXC domain and recruits the DNA methylation–repulsive 3These authors contributed equally to this work. 4Present address: Friedrich Miescher Institute for Biomedical Re- H3K4me3 mark (Thomson et al. 2010), thereby ensuring the for- search, Basel, Switzerland. mation of a protected chromatin layer (Ooi et al. 2007). Motif 5 Corresponding author analyses have also associated particular DNA sequences with E-mail [email protected] Article published online before print. Article, supplemental material, and publi- methylated or unmethylated CGIs (Feltus et al. 2003, 2006), and cation date are at http://www.genome.org/cgi/doi/10.1101/gr.158436.113. analysis of genomic methylation profiles revealed a propensity for 23:000–000 Ó 2013, Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/13; www.genome.org Genome Research 1 www.genome.org Downloaded from genome.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press Ginno et al. G-rich sequences to resist DNA methylation (Bock et al. 2006; commitment and included numerous genes encoding for tran- Straussman et al. 2009). We recently showed that unmethylated scription factors spread out among various gene families (Fig. 1F; CGI promoters are characterized by a strong strand asymmetry in Supplemental Table 1). Class IV genes showed lower average ex- the distribution of G and C residues, a property known as GC skew pression than Class I genes in pluripotent cells (Fig. 1E). (Ginno et al. 2012). GC skew undergoes a sharp transition at the To delve deeper into the function of these promoters, we de- transcription start site (TSS) and remains high in the first ;1kbof termined whether each promoter class could be associated with the transcription unit, overlapping with the protected region. We specific epigenetic signatures. For this, we analyzed the promoter further showed that transcription through regions of GC skew proximal patterns of a number of epigenetic marks, including DNA causes R-loop structures to form, in which the newly synthesized methylation and a variety of histone modifications, as measured in RNA hybridizes back to the template DNA strand and causes the human embryonic stem cells. In the case of DNA methylation, CGI nontemplate G-rich DNA strand to remain looped out in a largely promoters (Class I, II, and IV) all showed protection around the TSS single-stranded conformation (Ginno et al. 2012). Further evi- (Fig. 2A), while Class III promoters were susceptible to this mark, as dence showed that R-loop formation plays a functional role in previously observed (Ginno et al. 2012). Among CGI promoters, protecting against DNA methylation. These findings suggest that Classes I and Class IV showed strong protection around the TSS, the unmethylated CGI state is in part encoded in their DNA se- while Class II promoters were characterized by a weaker and nar- quence and mediated by the cotranscriptional formation of R-loop rower protected region, consistent with the respective CpG density structures. Here, we refine our analysis of GC skew patterns in the and GC skew of these three classes. In terms of histone modifica- human genome and reveal new features associated with this DNA tions, Class I and Class II CGIs could be most readily distinguished sequence characteristic. by their profiles of H4K20me1 and H3K79me2 deposition (Fig. 2B,C). These two marks have been linked to transcription initia- tion and/or elongation (Nguyen and Zhang 2011; Beck et al. 2012), Results and their levels are tightly correlated with gene expression (Karlic et al. 2010; Vavouri and Lehner 2012). In both cases, Class I CGIs GC skew patterns identify four broad promoter classes showed a higher density of modification over Class II, while Class associated with unique epigenetic signatures and gene IV CGIs were intermediate. Modest differences could be observed ontologies for the distribution of H3K4me3, H3K36me3, and the H2AFZ Class I promoters, corresponding to strong CGI promoters, were histone variant among the four classes (Supplemental Fig. 1). In- characterized by the highest CpG density and GC composition and terestingly, Class IV CGI promoters were enriched for the Polycomb- a sharp rise in positive GC skew at the TSS (8332 unique TSSs in this mediated H3K27me3 mark (Fig. 2D). In addition, these promoters revised data set) (Fig. 1A). Genes associated with these promoters were preferentially associated with the H3K27 methyltransferase were prototypical housekeeping genes based on their gene ontol- EZH2 and the RNF2 (also known as RING1B) protein, two key ogies and broad expression patterns (Ginno et al. 2012). Class II members of the PRC2 and PRC1 complexes, respectively (Fig. promoters defined a set of weaker CGIs characterized by shorter 2E,F). This shows that Class IV CGIs represent a subset of CGIs that lengths (;700 bp as opposed to ;1250 bp for Class I promoters), are more likely to recruit and be regulated by the Polycomb regu- reduced CpG density and GC composition, and a sharply reduced latory complexes (PRCs).