Distal Cpg Islands Can Serve As Alternative Promoters to Transcribe 2 Genes with Silenced Proximal-Promoters
Total Page:16
File Type:pdf, Size:1020Kb
Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press 1 Distal CpG islands can serve as alternative promoters to transcribe 2 genes with silenced proximal-promoters. 3 4 Shrutii Sarda1, Avinash Das1, Charles Vinson2 and Sridhar Hannenhalli1,* 5 1Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA 6 2Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA 7 8 *Corresponding author: 9 Sridhar Hannenhalli 10 3104G Biomolecular Sciences Building 11 University of Maryland 12 College Park, MD 20742 13 301 405 8219 (v) 14 301 314 1341 (f) 15 [email protected] 16 17 Subject categories: Transcriptional Regulation, Epigenetics and Computational Biology 18 Keywords: DNA methylation, promoter silencing, CpG islands, alternative promoters, genome-scale. 19 20 Running title: Orphan CGIs transcribe methylated-promoter genes. 21 22 23 24 25 26 27 Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press 28 ABSTRACT 29 DNA methylation at the promoter of a gene is presumed to render it silent, yet, a sizable fraction of 30 genes with methylated proximal-promoters exhibit elevated expression. Here, we show, through 31 extensive analysis of the methylome and transcriptome in 34 tissues, that in many such cases, 32 transcription is initiated by a distal upstream CpG island (CGI) located several kilobases away that 33 functions as an alternative promoter. Specifically, such genes are expressed precisely when the 34 neighboring CGI remains unmethylated, but remain silenced otherwise. Based on CAGE and PolII 35 localization data, we found strong evidence of transcription initiation at the upstream CGI and a lack 36 thereof at the methylated proximal promoter itself. Consistent with their alternative promoter activity, 37 CGI-initiated transcripts are associated with signals of stable elongation and splicing that extend into the 38 gene body, as evidenced by tissue-specific RNA-seq and other DNA-encoded splice signals. Furthermore, 39 based on both inter and intra-species analyses, such CGIs were found to be under greater purifying 40 selection relative to CGIs upstream of silenced genes. Overall, our study describes a hitherto unreported 41 conserved mechanism of transcription of genes with methylated proximal promoters in a tissue-specific 42 fashion. Importantly, this phenomenon explains the aberrant expression patterns of some cancer driver 43 genes, potentially due to aberrant hypomethylation of distal CGIs, despite methylation at proximal 44 promoters. 45 INTRODUCTION 46 In mammalian DNA, cytosines within CpG dinucleotides are heavily methylated throughout the genome, 47 yet there are several discrete “islands” that contain a high frequency of unmethylated CpG sites. These 48 are called CpG islands (CGI), and their identification has long been considered important in the 49 annotation of functional landmarks within the genome. Historically, CGIs served as landing strips to 50 locate annotated genes (Larsen et al. 1992), and it was for good reason as it was later discovered that 51 55-60% of all genes contain CGIs at their annotated promoters. While about half of all CGIs in the 52 genome coincide with gene promoters, the remaining half are either intragenic or intergenic and were 53 termed “orphan CGIs” due to their remote location that suggested the uncertainty over their biological 54 significance (Deaton and Bird 2011). 55 Does there exist evidence to support the idea that orphan CGIs are involved in gene regulation? Indeed, 56 several specific examples, showing promoter activity at orphan CGIs were uncovered in the context of 57 critical functions like imprinting and development (Deaton et al. 2011). For example, a CGI in intron 10 58 of the imprinted Kcnq1 gene (Mancini-DiNardo et al. 2003) promotes the initiation of a noncoding 59 transcript (Kcnq1ot1) required for the imprinting of several genes at this locus. Tissue-specific 60 alternative promoter activity was detected at another orphan CGI that promotes a specific isoform of 61 the Rapgef4 gene (Hoivik et al. 2013). Cumulative evidence suggest that most, perhaps all, CGIs have 62 promoter-like characteristics and are sites of transcription initiation (Illingworth et al. 2010). 63 Additionally, most of the conserved methylation differences between tissues occurred at orphan CGIs 64 (Illingworth and Bird 2009), suggesting that they are tightly regulated. A recent study that derived CGI 65 annotations from experimental methylation data (eCGIs) also showed that promoter-distal eCGIs 66 exhibited the most tissue-specific methylation patterns, and were linked to the tissue-specific 67 production of alternative transcripts (Mendizabal and Yi 2015). In fact, studies profiling CpG methylation 68 patterns have identified differentially methylated regions (DMRs) even in the shores of CpG islands 69 (Pollard et al. 2009). These regions of lower CpG density in close proximity (up to 2 kb) to CGIs, whose Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press 70 differential methylation patterns are strongly related to gene expression, are highly conserved and have 71 distinct tissue-specific methylation patterns (Irizarry et al. 2009a). Thus, over time, CG-dense genomic 72 loci (viz., CGIs and their shores) have been realized to be increasingly important in many functional 73 contexts, whose immense regulatory potential outside of annotated promoters is only beginning to be 74 understood. 75 Typically, methylation at a gene’s promoter renders it silent (Han et al. 2011) by modifying DNA 76 accessibility to the transcriptional machinery (Suzuki and Bird 2008), or by recruiting factors that aid in 77 generating a refractory chromatin conformation unsuitable for transcription (Kouzarides 2007). While 78 several prior studies (Suzuki and Bird 2008; van Eijk et al. 2012; Deaton and Bird 2011; Smith et al. 2012; 79 Sproul et al. 2011) have observed strongly negative correlations between promoter methylation and 80 gene expression, others report more nuanced relationships between the two (Wagner et al. 2014; Wan 81 et al. 2015; Shilpa et al. 2014; Martino and Saffery 2015), including a lack thereof. Additionally, there are 82 several instances of genes in cancer cells, wherein abnormal expression is persistent despite widespread 83 promoter hypermethylation (Guillaumet-Adkins et al. 2014; Moarii et al. 2015; Van Vlodrop et al. 2011). 84 These collectively indicate that additional factors controlling expression of genes with methylated 85 promoters have not been identified. Furthermore, due to the long standing interest in CGI-promoter 86 genes, most of this knowledge is based on analysis of CGI-promoters (Jones 2012), and the details of the 87 role of methylation in controlling non-CGI transcription start sites (TSSs) have largely been overlooked. 88 This lack of consensus prompted us to explore the global transcriptional landscape of methylated- 89 promoter genes. We found that substantial numbers of methylated-promoter genes (~1500 in each of 90 34 tissues) are expressed at high levels; such promoters are predominantly non-CGI, which is consistent 91 with prevailing knowledge on the rarity of methylation at CGI promoters (Illingworth et al. 2010; 92 Brandeis et al. 1994; Lienert et al. 2011). While the expression of many such genes can be attributed to 93 the use of alternate gene body promoters, as has been shown in some normal and cancer cells 94 (Nagarajan et al. 2014; Maunakea et al. 2010), we estimate that the high levels of expression realized by 95 almost 50% of all methylated-promoter genes remain completely unexplained (see Results). 96 Here we show, through detailed analyses across 34 primary tissues and cell types, that genes with 97 methylated and silenced promoters can in some instances utilize an upstream, hitherto unknown, CpG 98 island as an alternative promoter to express their gene product. Our results strongly support this 99 previously unreported general regulatory mechanism, which may play a role in promoting aberrant 100 transcription of driver genes in cancer cells. 101 RESULTS 102 Highly expressed genes with methylated promoters 103 We obtained RNA-seq expression and whole genome bisulfite sequencing (WGBS) methylation data for 104 30 primary tissues and 4 cell lines from the Roadmap Epigenomics Project (Bernstein et al. 2010) and 105 other sources (Lay et al. 2015; Ziller et al. 2013; Djebali et al. 2012; Menafra et al. 2014) (Supplemental 106 Table 1). Henceforth, we will refer to these 34 samples simply as ‘tissue types’. In a given tissue type, 107 there exists, on average about 9000 genes whose primary promoters are maintained in a heavily 108 methylated state (see Methods). Although methylation at a gene’s promoter is expected to render it 109 silent, we observed that ~1500 of such genes exhibited high levels of expression. We then excluded 110 genes whose expression could be explained by alternative gene body promoter activity (see Methods), Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press 111 and this resulted in 700 genes in each tissue whose expression remains unexplained. To specifically 112 assess the involvement of the closest upstream CGI in the expression of these genes, we restricted 113 downstream analysis to only those genes that did not have another gene annotated (including non- 114 coding RNAs) in the genomic region between the gene’s transcription start site and the CGI. This 115 eliminates potential biases owing to intervening transcriptional activity. We further verified that this 116 subset of genes was not enriched for any specific biological function or expression status compared to 117 the set of all methylated-promoter genes (Supplemental Fig. 1). These filters resulted in a set of ~3200 118 methylated-promoter genes (down from ~9000 overall) out of which ~440 (down from ~1500) are highly 119 expressed per tissue. The number of genes at various filtering stages across tissues are provided in 120 Supplemental Table 2.