DNA methylation in the gene body influences MeCP2-mediated gene repression

Benyam Kinde^a, Dennis Y. Wu^b, Michael E. Greenberg^a,1, and Harrison W. Gabel^b,1

^aDepartment of Neurobiology, Harvard Medical School, Boston, MA 02115; and ^bDepartment of Neuroscience, Washington University School of Medicine, St. Louis, MO 63110

Contributed by Michael E. Greenberg, November 21, 2016 (sent for review June 22, 2016; reviewed by Anne Brunet and Li-Huei Tsai)

NEUROSCIENCE

Rett syndrome is a severe neurodevelopmental disorder caused by mutations in the methyl-CpG binding protein gene (MECP2). MeCP2 is a methyl-cytosine binding protein that is proposed to function as a transcriptional repressor. However, multiple studies comparing wild-type and MeCP2-deficient neurons have failed to identify gene expression changes consistent with loss of a classical transcriptional repressor. Recent work suggests that one function of MeCP2 in neurons is to temper the expression of the longest genes in the genome by binding to methylated CA dinucleotides (mCA) within transcribed regions of these genes. Here we explore the mechanism of mCA and MeCP2 in fine tuning the expression of long genes. We find that mCA is not only highly enriched within the body of genes normally repressed by MeCP2, but also enriched within extended megabase-scale regions surrounding MeCP2-repressed genes. Whereas enrichment of mCA exists in a broad region around these genes, mCA together with mCG within gene bodies appears to be the primary driver of gene repression by MeCP2. Disruption of methylation at CA sites within the brain results in depletion of MeCP2 across genes that normally contain a high density of gene-body mCA. We further find that the degree of gene repression by MeCP2 is proportional to the total number of methylated cytosine MeCP2 binding sites across the body of a gene. These findings suggest a model in which MeCP2 tunes gene expression in neurons by binding within the transcribed regions of genes to impede the elongation of RNA polymerase.

DNA methylation | Rett syndrome | MeCP2 | transcription

Significance

Mutations in the methyl-CpG binding protein 2 (MECP2) lead to the severe neurological disorder Rett syndrome, but our understanding of how MeCP2 regulates gene expression in the brain has been limited. Recently we uncovered evidence that MeCP2 controls transcription of very long genes with critical neuronal functions by binding a unique form of DNA methylation, enriched in neurons. Here, we provide evidence that MeCP2 represses transcription by binding within transcribed regions of genes. We show that this repressive effect is proportional to the total number of methylated DNA binding sites for MeCP2 within each gene. Our findings suggest a model in which MeCP2 represses transcription of long neuronal genes that contain many methylated binding sites by impeding transcriptional elongation.

Author contributions: B.K., M.E.G., and H.W.G. designed research; B.K., D.Y.W., and H.W.G. performed research; B.K., D.Y.W., and H.W.G. analyzed data; and B.K., M.E.G., and H.W.G. wrote the paper.
Reviewers: A.B., Stanford University; and L.-H.T., Massachusetts Institute of Technology.
The authors declare no conflict of interest.
Data deposition: The data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE90704).
^1To whom correspondence may be addressed. Email: [email protected] or michael_[email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1618737114/-/DCSupplemental.

Rett syndrome (RTT) is a severe neurodevelopmental disorder characterized by developmental stagnation and regression, stereotyped hand movements, seizures, and autism spectrum-like behavior (1). RTT is caused by mutations in the gene encoding the methyl-CpG binding protein 2 (MECP2) (1), and the monogenic nature of RTT provides the unique opportunity to investigate the molecular basis of a complex human neurodevelopmental disorder. One particularly useful approach for studying RTT has been to generate mouse models that harbor RTT-causing mutations in MeCP2. These RTT-like mice recapitulate many features of RTT seen in humans, displaying defects in neural circuit excitatory–inhibitory balance, increased incidence of seizures, motor discoordination, and breathing abnormalities (2, 3).

The onset of symptoms in girls with RTT and in mouse models of the disorder occurs during a period of postnatal brain development in which MeCP2 accumulates to exceedingly high levels in neurons of the brain, such that the number of MeCP2 molecules in neurons approaches the number of nucleosomes in adult neuronal nuclei (4). Whereas MeCP2 is expressed to some extent in most cells of the body, MeCP2 protein levels are approximately sevenfold higher in neurons (4). Brain-specific disruption of MeCP2 is sufficient to cause the vast majority of RTT-like phenotypes in mice, providing evidence that RTT is predominantly a disorder of neuronal dysfunction (2, 3).

Key molecular functions of MeCP2 have been highlighted by the observation that RTT-causing mutations largely cluster into two functional domains: the methyl-DNA binding domain (MBD) and the transcriptional repressor domain (TRD) (5). Bird and colleagues first identified MeCP2 on the basis of its high-affinity binding to DNA containing mCG sequences and identified the MBD of MeCP2 as essential for the high-affinity interaction between MeCP2 and methylcytosine (6, 7). For many years methylation of cytosines in the CpG dinucleotide context (mCG) has been thought to represent the majority of DNA methylation in mammalian cells and to be the major site of MeCP2 binding in neurons. It has recently been shown that in the brain, high levels of non-CG methylation (predominantly mCA) also contribute to the neuronal methylome, with the number of mCA sites at late stages of neuronal maturation approaching the number of mCG sites (8–10). We, and others, have recently investigated whether MeCP2 binds mCA sites and demonstrated that MeCP2 binds to mCA and symmetrically methylated CG with similarly high affinity (9, 11, 12). Thus, the number of possible sites of MeCP2 binding in neurons increases significantly as mCA is laid down in the postnatal period. Given that the mCA mark is deposited at the time that MeCP2 levels increase postnatally, and when the phenotype of RTT is first observed in MeCP2 mutant mice, it has been suggested that the disruption of MeCP2 binding to mCA in neurons may be a key event in the etiology of RTT. Consistent with this possibility, mutations that disrupt the function of Dnmt3a, the de novo methyltransferase responsible for depositing mCA in the brain, result in severe neurological deficits in mice that are reminiscent of phenotypes observed in MeCP2 KO mice (13). Furthermore, mutations in DNMT3A have been linked to intellectual disability and autism spectrum disorder in humans (14).

Considerable evidence supports the conclusion that when bound to mC sequences, MeCP2 functions as a repressor of transcription. Biochemical studies have demonstrated that the TRD of MeCP2 interacts with NCoR/SMRT and Sin3a complexes (5, 15). Notably, one of the most common non-MBD MeCP2 missense

www.pnas.org/cgi/doi/10.1073/pnas.1618737114 | PNAS Early Edition

mutations that leads to RTT, MeCP2 R306C, disrupts the interaction between MeCP2 and NCoR, suggesting that a key function of MeCP2 is to mediate transcriptional repression (5, 16).

Despite evidence that MeCP2 functions as a repressor of transcription, identifying the specific targets of MeCP2 has proven to be difficult both because MeCP2 binds broadly across the entire neuronal genome (4, 12, 17, 18), and because the changes in gene expression that occur in the absence of MeCP2 are small (11, 12, 17, 19–23). These unique challenges have made it difficult to identify which changes in gene expression in the absence of MeCP2 are direct consequences of MeCP2 loss and which are secondary effects of overall cellular dysfunction.

As a strategy for identifying the direct targets of MeCP2 action, we recently sought to identify common features of genes that might distinguish whether or not a gene will be misregulated as a direct consequence of the absence of MeCP2. These analyses revealed that at a genome-wide level, MeCP2 functions to temper the expression of genes in a gene-length–associated manner, possibly by binding to mCA sequences within the transcribed region of these genes (12). Consistent with this idea, the disruption of MeCP2 or Dnmt3a leads to up-regulation of long genes that contain a high density of mCA. Notably, the longer the gene, the greater the extent of up-regulation that occurs in the absence of MeCP2 or Dnmt3a. Together with other recent studies indicating that both gene length and non-CG DNA methylation are associated with gene regulation by MeCP2 (11, 23), these findings suggest that MeCP2 acts at least in part as a transcriptional repressor by functioning through brain-enriched mCA to temper the expression of long genes in the brain.

Despite this recent progress in identifying putative direct targets of MeCP2, several key gaps in knowledge remain. Whereas MeCP2 binding to mCA sequences appears to be critical to MeCP2-dependent repression of gene transcription, this binding has not been established unequivocally. Furthermore, whereas our initial studies point to the binding of MeCP2 within genes as important for transcriptional regulation, the sites of functionally relevant MeCP2 binding (for example, whether they are at enhancers, promoters, and/or within the transcribed region of genes) remained to be determined. In the present study, we examine the patterns of DNA methylation across genes and provide evidence that the degree of repression experienced by each gene is proportional to the total number of MeCP2 binding sites within the transcribed region of the gene. Taken together, these findings support a model in which MeCP2 binds to methylated cytosines within gene bodies with high affinity to temper gene expression, with the extent of gene repression by MeCP2 being related to the total number of MeCP2 molecules bound across a gene.

In addition to its role as a repressor of gene expression, MeCP2 may function as an activator of transcription. Consistent with this idea, many genes are down-regulated when MeCP2 function is perturbed, and MeCP2 has been reported to interact with the cAMP response element-binding protein (CREB), a neuronal stimulus-dependent activator of gene transcription (19). Despite these findings, when we examined features of the genes that are down-regulated when MeCP2 function is disrupted, such as their length, density of mCA and mCG, and the extent of MeCP2 binding, these MeCP2-activated genes were largely indistinguishable from similarly expressed genes whose transcription is unaffected when MeCP2 is mutated. Taken together, these findings suggest that when bound to mCA, MeCP2 may function primarily as a repressor of gene expression.

Results

mCA and MeCP2 Binding Are Enriched in and Around MeCP2-Repressed Genes. We have previously shown that genes whose expression is up-regulated when MeCP2 function is disrupted are significantly longer than the typical gene and contain a higher density of mCA within their gene bodies than the typical gene in the genome (12). However, it is not known if the high density of methylation of CA sites occurs specifically within the transcribed regions of long genes or if this mark is laid down more broadly across the genome. In the latter case, mCA would be predicted to recruit MeCP2 throughout the broad domain of mCA and could potentially repress the transcription of genes that happen to reside within the mCA domain. At these sites, MeCP2 might function as a classical repressor that inhibits transcription by binding to specific noncoding regulatory sequences or by compacting the DNA throughout broad genomic domains. Alternatively, binding of MeCP2 within the gene body might function to retard the movement of the RNA polymerase II complex.

To begin to explore these possibilities, we examined the DNA methylation (mCG and mCA) and MeCP2 binding profiles in and around genes that have been consistently implicated as repressed or activated by the presence of MeCP2 across multiple studies (12), comparing these profiles to the average profiles for all other genes in the genome. For this analysis, we calculated mCA or mCG levels as the number of unconverted cytosines sequenced during whole genome bisulfite sequencing analysis (8) within a 1-kb window of the genome divided by the total number of cytosine positions sequenced within that window; we then plotted the average values for windows across gene loci (SI Experimental Procedures). To assess MeCP2 binding we plotted the average value of the MeCP2 ChIP divided by the input for 1-kb windows across gene loci. Notably, this analysis revealed that genes that are repressed by MeCP2 are enriched for mCA and MeCP2 binding, not only within the body of the gene, but also as far away from the transcribed region as several megabases 5′ of the transcriptional start site (TSS) and 3′ of the transcriptional end site (TES) (Fig. 1). This broad binding of MeCP2 is consistent with several distinct models of MeCP2 function. MeCP2 might regulate chromatin structure across the broad mCA domain, leading to silencing of transcription within the entire domain. Alternatively, although MeCP2 binds throughout the mCA domain, it could function selectively as a repressor at specific regulatory elements or within the transcribed region to temper transcriptional elongation.

To further explore these possibilities, we analyzed the mCA and mCG content across the length of broad mCA-enriched domains that encompass genes to determine whether there is a correlation between the degree of gene up-regulation in the absence of MeCP2 and the presence of mCA sequences within the 5′ flank, transcribed region, or 3′ flanking region of the gene. By calculating the Spearman correlation for 1-kb bins of DNA methylation in and around genes, we found that gene-body mCA correlates more strongly with up-regulation of gene expression in the absence of MeCP2 than does mCA at the TSS or in the 5′ or 3′ flanking regions (Fig. 1D). This suggests that the greater the level of gene-body mCA across a gene, the greater the extent of gene up-regulation that occurs in the absence of MeCP2, highlighting an intimate link between gene-body mCA content and the function of MeCP2 as a repressor that tempers gene transcription within the transcribed regions of long genes.

As a further test of the idea that an enrichment of gene-body mCA within genes is a reliable predictor that a given gene will be repressed by MeCP2, we asked whether mCA in the broad domain (400 kb) encompassing short genes (<7 kb) was predictive of gene up-regulation to a similar extent as gene-body mCA within long genes (>100 kb). If the level of mCA in the region in or around a gene, rather than gene-body methylation per se, determines the extent of repression by MeCP2, one might predict that short genes embedded within a broad domain of high-density mCA would be up-regulated in the absence of MeCP2. However, we find that short genes within a large domain of high-density mCA are not significantly up-regulated when MeCP2 function is disrupted (Fig. S1). This finding suggests that methylation of broad domains of CA sequences around genes (i.e., within their 5′ and 3′ flanking regions) is not sufficient to impose regulation by MeCP2 on a gene; rather, the methylation must occur within a broad region of the gene itself for MeCP2 to exert an effect. Together, this analysis suggests that whereas genes that are repressed by MeCP2 are enriched for mCA within their 5′ flanking, transcribed, and 3′ flanking regions, the transcriptional repressive effects of MeCP2

[Fig. 1 graphic, panels A–D. Panels A–C plot mCA/CA, mCG/CG, and log2 MeCP2 ChIP/Input against position relative to the TSS and TES (±50 kb and ±6 Mb, with a metagene region), with boxplots for promoters, gene bodies, and flanking regions; panel D plots Spearman correlation for mCA/CA and mCG/CG. Legend: MeCP2-repressed genes, MeCP2-activated genes, all other genes.]

Fig. 1. Relationship between genomic DNA methylation profiles, MeCP2 binding, and MeCP2-mediated gene regulation. (A–C) Plot of mean signal for mCA (A), mCG (B), or MeCP2 ChIP (C) density in the flanking 50 kb (Top) or 6 Mb (Middle) region around the TSS and TES of MeCP2-activated genes (blue), MeCP2-repressed genes (red), and all other genes (black). To represent signal in genes of differing sizes, the “metagene” region (gray) shows the average signal from 5 kb downstream of the TSS to the TES in 100 equally sized bins per gene. Boxplots (Bottom) show the distributions of levels for mCA, mCG, and MeCP2 for promoters, gene bodies, and flanking regions. Methylation density was calculated from analysis of bisulfite sequencing data in ref. 8. mCA/CA and mCG/CG are calculated as the number of nonconverted cytosines divided by the total number of cytosines sequenced in the CA or CG dinucleotide sequence context within 1-kb bins. MeCP2 ChIP density was calculated as the log2 fold change of MeCP2 ChIP-seq coverage relative to input coverage from the reanalysis of data in ref. 11. In A–C, analysis was restricted to genes >5 kb to avoid confounding effects of promoter mC depletion when analyzing the TES. Similar qualitative results were observed when including all genes. (D) Spearman correlation between mCA and mCG density in 1-kb bins in and around genes and gene misregulation in the MeCP2 KO cerebral cortex. Spearman correlation was calculated between this methylation density (8) and the log2 fold change in gene expression of MeCP2 KO vs. WT cortex (12). Data are plotted from 50 kb upstream to 75 kb downstream of the TSS and 50 kb downstream of the TES. In D, analysis was restricted to genes >75 kb to allow for inclusion of the gene body; similar results with lower correlation values are observed when analyzing all genes.
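The quantities defined in this legend (mC/C per 1-kb bin, log2 ChIP over input, and the Spearman correlation of panel D) can be illustrated with a short sketch. It is not the authors' pipeline, which is described in SI Experimental Procedures: the function names, the pseudocount added to coverage values, and the toy numbers are all assumptions introduced here for illustration.

```python
from math import log2


def methylation_fraction(unconverted, covered):
    """mC/C per bin: unconverted cytosine calls divided by all cytosine calls.

    Bins without coverage yield None instead of a division by zero.
    """
    return [u / c if c > 0 else None for u, c in zip(unconverted, covered)]


def log2_chip_over_input(chip, inp, pseudocount=1.0):
    """log2(ChIP/input) per bin; the pseudocount is an assumption of this sketch."""
    return [log2((c + pseudocount) / (i + pseudocount)) for c, i in zip(chip, inp)]


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Toy data (made up): mCA/CA at one bin position across five genes, and the
# log2 fold change (MeCP2 KO vs. WT) of the same five genes.
bin_mca = methylation_fraction([1, 3, 2, 6, 8], [100, 100, 100, 100, 100])
log2_fc = [-0.02, 0.03, 0.01, 0.10, 0.08]
rho = spearman(bin_mca, log2_fc)
print(rho)  # 0.9 for this toy data
```

Repeating the last step for every bin position relative to the TSS and TES would give one correlation value per position, which is the kind of profile shown in panel D.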

are likely due to the binding of MeCP2 to methylated DNA within the transcribed region of the gene.

Gene-Body mCA Is Critical for the Binding and Function of MeCP2. To test directly the requirement of mCA for gene repression by MeCP2, we used mice that lack mCA in the brain due to brain-specific conditional knockout (KO) of Dnmt3a (Nestin-Cre; Dnmt3a flx/flx, referred to as Dnmt3a cKO mice), the de novo methyltransferase that catalyzes the addition of a methyl group to cytosines within CA sequences during early postnatal development (12). To assess the influence of gene-body mCA on the distribution of MeCP2, we conducted MeCP2 ChIP sequencing (ChIP-seq) from the cortex of Dnmt3a cKO and littermate control mice. Whereas the amount of MeCP2 expressed in the cortex of Dnmt3a cKO and control mice is similar (12), ChIP-seq analysis reveals that in the Dnmt3a cKO cortex, MeCP2 is preferentially depleted from genes that normally contain a high density of gene-body mCA in control mice (Fig. 2). These findings suggest that gene-body mCA is critical for the binding of MeCP2 within the transcribed regions of genes. Consistent with this observation, we have observed that long genes containing a high density of gene-body mCA are up-regulated in Dnmt3a cKO mice (12), thus phenocopying the misregulation of gene expression observed in mice lacking MeCP2. We conclude that mCA within gene bodies recruits MeCP2, which in turn functions to suppress the transcription of long genes. We note, however, that binding of MeCP2 across the genome was not completely abolished by disruption of mCA in the brain (Fig. 2), suggesting that MeCP2 likely has mCG-dependent as well as methylation-independent modes of binding in addition to its interaction with mCA.

MeCP2-Mediated Gene Repression Is Proportional to the Total Number of MeCP2 Binding Sites Within the Body of a Gene. Previously we have observed that both gene-body mCA density and gene length are correlated with gene repression by MeCP2, with long genes containing a high density of mCA showing the highest degree of up-regulation in the MeCP2 KO (12). In addition, we observe a correlation between fold change in gene expression in the MeCP2 KO compared with wild type (WT) and the density of MeCP2 ChIP signal within long genes (Fig. S2). These findings led us to

consider the possibility that the total number of MeCP2 binding sites across the body of a gene might best determine the extent of repression exerted by MeCP2. In considering this possibility, we included both mCA and mCG in the analysis, reasoning that even though mCG density does not correlate with changes in gene expression, it remains the case that mCG binds MeCP2 with high affinity, and thus the number of mCG and mCA sequences within a gene is likely the most accurate estimate of the number of MeCP2 binding sites within a gene. Thus, we calculated the total number of MeCP2 binding sites across the body of genes, summing the partial methylation frequency at each CG and CA and examining the degree to which this value correlates with gene repression relative to gene length or methylation density alone. Consistent with previous findings, we observed that gene length and gene-body mCA density, but not mCG density, are correlated with the gene misregulation in MeCP2 KO mice (Spearman r: 0.12 for gene length, 0.12 for mCA density, and −0.007 for mCG density). However, the total number of mCA and mCG sites present across the body of genes is slightly more correlated with gene misregulation than either gene length or mCA density alone (Spearman r: 0.14 for total mCA, 0.14 for total mCG, and 0.14 for total mCA and mCG). Notably, the correlation between gene misregulation and total MeCP2 binding sites was stronger for very long genes (genes > 100 kb, Spearman r: 0.28 for total mCA, 0.25 for total mCG, and 0.27 for total mCA and mCG; genes > 400 kb, Spearman r: 0.52 for total mCA, 0.53 for total mCG, and 0.56 for total mCA and mCG), suggesting more robust detection of these effects for genes with many mC sites. Visualization of the change in gene expression in the MeCP2 KO compared with WT as a function of the total number of mCA and mCG sites per gene showed that the repression of genes by MeCP2 appears to be continuous and proportional to the total number of mCA and mCG sites within the gene, with no clear minimum threshold number of sites required for the effect (Fig. 3A).

Given that gene length and the total number of mCG and mCA sites across the body of a gene are highly correlated (Spearman r: 0.96), we next sought to determine whether the total number of methylation sites is significantly correlated with the degree of gene up-regulation in the absence of MeCP2 under conditions where the correlation between gene length and gene misregulation is excluded. By binning all genes in the genome by total mCA and mCG counts per gene, we assessed the extent of length-dependent gene up-regulation of a population of genes that fall within a restricted range of total mCA and mCG counts per gene. In this way, a relationship between MeCP2-mediated gene repression (i.e., genome-wide up-regulation) and gene length can be effectively isolated away from an effect attributable to the total number of sites per gene, despite the normally strong correlation between the total number of mCA and mCG sites per gene and gene length. This analysis failed to reveal an association between gene length and the degree of gene up-regulation in the absence of MeCP2 when examining a set of genes that have a similar number of total mCA and mCG sites within the body of genes (Fig. 3B). In contrast, examination of a population of genes in which the variation in gene length was restricted revealed that the degree of gene up-regulation correlated with the total number of mCA and mCG sites across the body of a gene (Fig. 3C), suggesting that the total number of mCA and mCG sites across the body of a gene best predicts gene up-regulation in the absence of MeCP2. These results were robust to the particular set of genes that was selected, as similar results were observed when analyzing gene populations over a range of restricted-length windows or restricted total mCA and mCG windows (Fig. S3). In addition, our findings were confirmed using partial correlation analysis, which demonstrated that

[Fig. 2 graphic: log2 MeCP2 ChIP/Input plotted against gene-body mCA/CA for control and Dnmt3a cKO.]

Fig. 2. Disruption of Dnmt3a in the brain results in an mCA-associated depletion of MeCP2. MeCP2 ChIP-seq analysis of the cerebral cortex from Dnmt3a cKO (Nestin-Cre; Dnmt3a flx/flx, red) and littermate controls (Dnmt3a flx/flx, gray). The mean log2 fold change of MeCP2 ChIP coverage relative to input coverage in gene bodies was calculated for genes binned according to gene-body mCA/CA levels (200 genes per bin, 40 gene steps). Methylation data (from ref. 8) of the cerebral cortex were used for this analysis.

[Fig. 3 graphic, panels A–C: log2 mRNA fold change (MeCP2 KO/WT) plotted against log10 mean total mCA and mCG per gene (A and C, Bottom) or log10 mean gene length (B, Bottom), with density distributions of log10 total mCA and mCG per gene (B, Top) and log10 gene length (C, Top).]

Fig. 3. The total number of methylcytosines per gene, independent of gene length, is predictive of gene repression by MeCP2. (A) Mean log2 fold change in the MeCP2 KO cortex compared with WT plotted for genes according to the log10 total number of mCA and mCG sites per gene. (B) Distribution of gene-body log10 total mCA and mCG per gene (Top), with the area highlighted in gray representing the population of genes analyzed in the Bottom plot. Mean log2 fold change was plotted for genes according to gene length (Bottom) for genes that fall within the range of total mCA and mCG sites per gene indicated above. The area in gray (Bottom) indicates the maximum predicted change in gene expression that could possibly be associated with the variation in the total mCA and mCG sites per gene given the distribution of total mCA and mCG sites in the genes selected for analysis. (C) Distribution of log10 gene length (Top), with the area highlighted in gray representing the population of genes analyzed in the Bottom plot. Mean log2 fold change plotted for genes according to the log10 total number of mCA and mCG per gene (Bottom) for genes that fall within the indicated range of gene length. The area in gray (Bottom) indicates the maximum predicted change in gene expression for genes that could possibly be associated with the variation in gene length given the selected range of gene lengths indicated above. In A and the Bottom plots of B and C, mean log2 fold change in gene expression was calculated for 500 gene bins, moving one gene between each point (500 genes per bin, one gene step). Analyses were performed on bisulfite-sequencing (8) and RNA-sequencing (12) data generated in cerebral cortex tissue.

the total number of mCA and mCG marks across the body of a gene contributes to the correlation with the degree of gene up-regulation in the absence of MeCP2, even when gene length is excluded as a parameter (Spearman r, controlling for gene length = 0.12, P = 1.93 × 10^−32). By contrast, if in this analysis we control for the total number of mCA and mCG sites within genes, the positive correlation between gene length and the up-regulation of gene expression in the absence of MeCP2 is no longer observed (Spearman r, controlling for the total number of mCA and mCG sites per gene = −0.07, P = 2.45 × 10^−15). Taken together, these analyses suggest that the total number of MeCP2 binding sites within a gene is an important determinant of the extent of MeCP2-mediated repression for that gene. Thus, whereas many shorter genes likely experience little repression by MeCP2 because they have an insufficient number of MeCP2 binding sites within their transcribed regions, long genes with a high density of gene-body mCA are likely the most repressed by MeCP2 because they contain the greatest number of total mCA and mCG marks per gene.

mCA Is Enriched in Genes That Are Repressed, but Not Activated by MeCP2. Given that genes whose expression is increased when MeCP2 function is disrupted contain high levels of gene-body mCA and are significantly longer than the average gene (12), we considered the possibility that the group of genes that is down-regulated in the absence of MeCP2 might also have a specific methylation and/or chromatin signature that defines this set of genes and explains how the presence of MeCP2 in the cell activates their expression. To address this possibility, we compared

[Fig. 4 graphic: heatmap of −log10(P value), scale 0 to 40, for mCA and mCG in promoters and gene bodies of MeCP2-repressed and MeCP2-activated genes, with columns for “meta” and “single” gene lists in CTX, HC, and CB.]

Fig. 4. Analysis of mCA density for MeCP2-repressed and MeCP2-activated genes. Heatmap summary of the −log10 P value of mCA/CA (green sidebar) or mCG/CG (black sidebar) for genes identified as misregulated in MeCP2 mutant mice compared with expression-matched control genes in individual brain regions (“single” gene list) or through metaanalysis of multiple studies (“meta” gene list). Meta gene lists of MeCP2-activated and MeCP2-repressed genes were generated from reanalysis of eight microarray gene expression studies (12) (SI Experimental Procedures). Median −log10 P value was calculated (paired, one-tailed t test) for MeCP2-activated (n = 536) or MeCP2-repressed (n = 466) genes compared with 1,000 bootstrapped-resampled, expression-matched control gene lists for each respective gene list. DNA methylation data from whole genome bisulfite sequencing generated in the cortex (8), hippocampus (9), and the cerebellum (12) were analyzed.
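The idea of comparing a gene list against resampled, expression-matched control lists can be sketched as below. The matching rule (sampling each control from a window of k non-target genes adjacent in expression rank), the function names, and the toy data are assumptions made for illustration. The sketch also swaps the paired, one-tailed t test of the legend for a simpler empirical summary (the fraction of control lists whose mean methylation falls below that of the target list), so it should not be read as the authors' procedure.

```python
import bisect
import random


def matched_control_lists(targets, expression, n_lists=1000, k=10, seed=0):
    """Draw n_lists control gene lists, each matched gene-by-gene to `targets`.

    For every target gene, one control is sampled from a window of k
    non-target genes adjacent to it in expression rank (assumed rule).
    """
    rng = random.Random(seed)
    target_set = set(targets)
    pool = sorted((expr, gene) for gene, expr in expression.items()
                  if gene not in target_set)
    pool_expr = [expr for expr, _ in pool]
    lists = []
    for _ in range(n_lists):
        controls = []
        for gene in targets:
            i = bisect.bisect_left(pool_expr, expression[gene])
            lo = max(0, i - k // 2)
            hi = min(len(pool), lo + k)
            lo = max(0, hi - k)  # keep k candidates near the top of the pool
            controls.append(pool[rng.randrange(lo, hi)][1])
        lists.append(controls)
    return lists


def mean(values):
    return sum(values) / len(values)


def fraction_of_controls_below(targets, control_lists, density):
    """Fraction of control lists whose mean density is below the target mean."""
    target_mean = mean([density[g] for g in targets])
    below = sum(mean([density[g] for g in c]) < target_mean
                for c in control_lists)
    return below / len(control_lists)


# Toy data (made up): 50 genes with expression 0..49; the three target genes
# carry a higher gene-body mCA/CA than every other gene.
expression = {f"g{i}": float(i) for i in range(50)}
targets = ["g10", "g20", "g30"]
mca = {g: (0.08 if g in targets else 0.02) for g in expression}
control_lists = matched_control_lists(targets, expression, n_lists=20, k=4, seed=1)
print(fraction_of_controls_below(targets, control_lists, mca))  # 1.0
```

Fixing the seed keeps the resampling reproducible; with real data one would use many more lists (the legend cites 1,000) and a proper statistical test.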

the features (e.g., mCA and mCG content, histone acetylation, gene length) of genes whose expression is down-regulated in the absence of MeCP2 across studies of several brain regions with those of genes whose expression does not change when MeCP2 function is disrupted in these studies. This analysis revealed that with respect to mCA and mCG density, the extent of MeCP2 binding, the presence of histone acetylation marks, and average gene length, these genes are largely indistinguishable. In particular, DNA methylation analysis from the cortex (8), cerebellum (12), or hippocampus (9) revealed that neither mCA nor mCG is enriched in the promoters or gene bodies of genes whose expression is consistently decreased in the absence of MeCP2 across multiple studies of MeCP2 mutants (Fig. 4). Furthermore, lists of MeCP2-activated genes identified by analysis of gene expression changes for individual brain regions showed little enrichment for mCA or mCG in promoters or gene bodies (Fig. 4).

Discussion

In this study, we explored determinants of MeCP2-mediated gene regulation. Consistent with MeCP2 functioning through gene-body DNA methylation, we find that mCA density and MeCP2 occupancy within the transcribed region of a gene are correlated with the up-regulation of gene expression in the absence of MeCP2. Furthermore, we find that the number of mCA and mCG MeCP2 binding sites within the body of a gene is a better predictor of the repressive effects of MeCP2 than the density of mCA in the surrounding genomic territory in which a gene resides. Thus, whereas the broad region around MeCP2-repressed genes is enriched in mCA, the level of gene-body mCA together with mCG appears to be a major determinant of transcriptional repression by MeCP2. In addition, DNA methylation is not notably enriched at promoter regions of MeCP2-repressed genes relative to expression-matched control genes (Fig. 4), further supporting a role for gene-body–mediated repression by MeCP2. These findings raise the possibility that MeCP2 represses gene transcription by operating within genes rather than affecting larger domains of chromatin or specific regulatory elements. However, future studies exploring the nature of Dnmt3a-mediated DNA methylation in maturing neurons will be critical to understand how high levels of DNA methylation accumulate in and around MeCP2-repressed genes.

Recently, we demonstrated that loss of mCA or disruption of MeCP2 function in the brain can lead to the up-regulation of long genes that have a high density of mCA within their transcribed region (12). In addition to this correlation with gene repression by MeCP2, we show here that loss of mCA results in a modest reduction in MeCP2 occupancy in genes, with the greatest reduction in MeCP2 occurring in gene bodies that normally contain a high density of mCA. These findings suggest that the binding of MeCP2 to mCA sites contributes to MeCP2-dependent transcriptional repression. However, we note that mCA density is not the sole determinant of DNA binding or gene repression by MeCP2, as the binding of MeCP2 to chromatin is not completely disrupted by erasure of the mCA mark.

In this study, we present evidence that it is not the length of a gene per se, or the density of mCA irrespective of gene length, but rather the total number of MeCP2 molecules bound to mCA and mCG sequences in the gene that predicts the extent of gene silencing by MeCP2. In a recent study, we had examined mCA density and gene length independently, observing that genes that are below a minimum mCA density or a minimum length do not show length-associated or mCA-associated de-repression in the MeCP2 KO, respectively (12). Whereas these findings suggested that there is a threshold mCA density and gene length required in order for MeCP2 to repress genes, reexamination of the gene sets analyzed in our previous study indicates that the genes that are below the thresholds used contain low levels of total mC (due to their short length or low mCA density), and as a result, they would not be expected to be measurably affected in the MeCP2 mutant. Thus, our previous findings are consistent with a model in which the level of repression exerted on a gene by MeCP2 is proportional to the total number of MeCP2 binding sites in the gene.

Whereas MeCP2 binds to mCA and mCG marks with high affinity as assessed by in vitro and in vivo binding studies (9, 11, 12), it is notable that the density of gene-body mCG does not appear to be substantially enriched in MeCP2-repressed genes compared with sets of genes whose expression is unaffected or decreased in the absence of MeCP2 (Fig. 4). Compared with mCA, the density of mCG within gene bodies does not vary substantially across the genome. Thus, lack of gene-body mCG enrichment in MeCP2-repressed genes may reflect the fact that CG dinucleotides are generally highly methylated in the majority of gene bodies. We note that this lack of increased mCG density within MeCP2-repressed

Kinde et al. PNAS Early Edition | 5of6 Downloaded by guest on September 27, 2021 genes does not exclude the possibility that binding of MeCP2 to mechanism we describe here remains to be determined. In addi- mCG within these genes contributes to gene repression. Indeed, our tion, a large number of genes are down-regulated when MeCP2 analysis showing that the total number of mC sites in genes predicts function is disrupted, raising the possibility that MeCP2 is directly gene repression by MeCP2 supports a role for both mCG and mCA activating these genes. Our analyses failed to detect a robust en- in this repressive mechanism (Fig. 3). richment in MeCP2 binding and/or mCA content when these Whereas our study points to binding of MeCP2 in gene bodies genes were compared with sets of genes whose expression is un- as an important site of gene regulation, the molecular mechanism affected when MeCP2 function is disrupted. This raises the pos- by which this process occurs remains to be defined. Our findings sibility that MeCP2 may not activate genes by a direct mode of are consistent with a model in which each MeCP2 molecule bound action that requires mCA. One of the hallmark features of within a gene contributes to a cumulative repressive effect on MeCP2-deficent mouse and human neurons is decreased dendritic transcription elongation. For example, MeCP2 molecules along branching, soma, and nuclear size (25, 26). It has recently been the gene body might recruit the NCoR corepressor complex, demonstrated that mammalian cells globally scale transcription in thereby promoting a restrictive local chromatin structure that a cell-volume–dependent manner to preserve transcript concen- impedes or blocks the progress of RNA polymerase II. If each tration (27). 
Genes down-regulated in the absence of MeCP2 may instance of this MeCP2 binding and repression along the gene reflect a global reduction in transcription in the context of reduced leads to a slight increase in the rate of aborted transcription during cellular volume. Alternatively, genes may be targeted for gene the elongation phase of transcription for that gene, this would activation by MeCP2 by a yet-to-be-appreciated mechanism. Fu- result in the subtle down-regulation of genes containing many MeCP2 binding sites that we observe. This model is consistent ture studies will help to define the full complement of mechanisms with previous observations that interaction with NCoR is critical used by MeCP2 for its critical role in neuronal gene regulation. for the function of MeCP2 (5), and our finding that the MeCP2 Experimental Procedures R306C missense mutation, which disrupts the MeCP2–NCoR in- teraction, leads to length-associated up-regulation of gene ex- All animal experiments were performed using procedures approved by the pression in mouse brain (12). Future studies examining precisely Harvard Medical Area Institutional Animal Care and Use Committee. Anal- how transcription of long genes is affected in the MeCP2 KO and yses of gene expression, DNA methylation, and ChIP-seq were performed dissecting the role of NCoR in this process will allow us to test this through reanalysis of published datasets and through generation of MeCP2 model for gene-body–mediated regulation by MeCP2. ChIP-seq data from the cortex of the Dnmt3a conditional KO mice. SI Ex- perimental Procedures The present study describes one mode of gene repression by provides additional details. MeCP2, but other potential mechanisms of gene regulation medi- ated by this enigmatic protein likely remain to be uncovered. For ACKNOWLEDGMENTS. We thank A. Bird, G. Mandel, M. Coenraads, and members of the M.E.G. and H.W.G. 
laboratories for discussions and critical example, recent evidence suggests that MeCP2 recruits NCoR to reading of the manuscript. This work was supported by the Rett Syndrome specific regulatory elements in the genome to deacetylate the FOXO Research Trust and NIH Grant 5R01NS048276-12 (to M.E.G.) and NIH Grant and alter gene expression (24). The degree T32GM007753 and a Howard Hughes Medical Institute Gilliam Fellowship to which this mechanism intersects with the gene-body–mediated (to B.K.).
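The cumulative model proposed in the Discussion, in which each bound MeCP2 molecule slightly raises the chance of aborted elongation, can be made concrete with a small numerical sketch. It assumes, purely for illustration, that every bound site aborts an elongating polymerase independently with the same small probability, and that the number of bound sites equals gene length times methylated-site density. The function names, the per-site probability, and the example genes are hypothetical and are not taken from the study.

```python
import math


def bound_sites(gene_length_kb: int, mc_sites_per_kb: int) -> int:
    """Hypothetical number of MeCP2-bound sites: gene length x mC density."""
    return gene_length_kb * mc_sites_per_kb


def completion_fraction(n_sites: int, abort_prob: float) -> float:
    """Fraction of polymerases reaching the gene end if every bound site
    independently aborts elongation with probability abort_prob."""
    return (1.0 - abort_prob) ** n_sites


def log2_derepression(n_sites: int, abort_prob: float) -> float:
    """Predicted log2 fold change (MeCP2 KO / wild type), assuming that
    loss of MeCP2 removes the abort effect at every site."""
    return -math.log2(completion_fraction(n_sites, abort_prob))


if __name__ == "__main__":
    abort_prob = 1e-5  # assumed value, chosen only to give a subtle effect
    example_genes = {
        # name: (length in kb, mC sites per kb); made-up numbers
        "short_dense": (20, 50),
        "long_sparse": (100, 10),
        "long_dense": (500, 50),
    }
    for name, (length_kb, density) in example_genes.items():
        n = bound_sites(length_kb, density)
        fc = log2_derepression(n, abort_prob)
        print(f"{name}: sites={n}, predicted log2FC={fc:.4f}")
```

Under these assumptions the predicted log2 fold change is exactly linear in the number of bound sites, so a short, densely methylated gene and a long, sparsely methylated gene with the same site count are predicted to respond identically. That is one way to read the statement that total binding sites, rather than length or density alone, predict repression.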

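The comparison against bootstrapped-resampled, expression-matched control gene lists described in the Fig. 4 legend can be sketched roughly as follows. This is not the authors' pipeline: the matching rule (a random non-target gene from a window of similar expression rank), the function names, and the synthetic data are assumptions. The sketch stops at the paired t statistic; converting it to a one-tailed P value would need a t distribution, which the Python standard library does not provide.

```python
import math
import random
from bisect import bisect_left
from statistics import mean, median, stdev


def matched_controls(targets, background, expression, window=10, rng=None):
    """For each target gene, draw one non-target gene whose expression rank
    is within `window` positions of the target (drawn with replacement)."""
    rng = rng or random.Random()
    target_set = set(targets)
    pool = sorted((g for g in background if g not in target_set),
                  key=expression.get)
    levels = [expression[g] for g in pool]
    controls = []
    for gene in targets:
        i = bisect_left(levels, expression[gene])
        lo, hi = max(0, i - window), min(len(pool), i + window)
        controls.append(rng.choice(pool[lo:hi]))  # IndexError if pool is empty
    return controls


def paired_t(x, y):
    """Paired t statistic for x versus y (positive when x tends to exceed y)."""
    diffs = [a - b for a, b in zip(x, y)]
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / se


def resampled_t_statistics(targets, background, expression, feature,
                           n_resamples=1000, seed=0):
    """Paired t statistic of a feature (e.g. mCA density) in the target genes
    versus each of n_resamples expression-matched control lists."""
    rng = random.Random(seed)
    target_values = [feature[g] for g in targets]
    stats = []
    for _ in range(n_resamples):
        controls = matched_controls(targets, background, expression, rng=rng)
        stats.append(paired_t(target_values, [feature[g] for g in controls]))
    return stats


if __name__ == "__main__":
    rng = random.Random(42)
    genes = [f"gene{i}" for i in range(2000)]            # synthetic gene set
    expression = {g: rng.lognormvariate(0, 1) for g in genes}
    feature = {g: rng.gauss(0.02, 0.005) for g in genes}  # fake mCA density
    targets = rng.sample(genes, 100)
    stats = resampled_t_statistics(targets, genes, expression, feature,
                                   n_resamples=200)
    print(f"median paired t over resamples: {median(stats):.3f}")
```

Summarizing with the median over resamples mirrors the legend's use of a median –log10 P value. With the synthetic data above, no enrichment is built in.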
1. Chahrour M, Zoghbi HY (2007) The story of Rett syndrome: From clinic to neurobiology. Neuron 56(3):422–437.
2. Chen RZ, Akbarian S, Tudor M, Jaenisch R (2001) Deficiency of methyl-CpG binding protein-2 in CNS neurons results in a Rett-like phenotype in mice. Nat Genet 27(3):327–331.
3. Guy J, Hendrich B, Holmes M, Martin JE, Bird A (2001) A mouse Mecp2-null mutation causes neurological symptoms that mimic Rett syndrome. Nat Genet 27(3):322–326.
4. Skene PJ, et al. (2010) Neuronal MeCP2 is expressed at near histone-octamer levels and globally alters the chromatin state. Mol Cell 37(4):457–468.
5. Lyst MJ, et al. (2013) Rett syndrome mutations abolish the interaction of MeCP2 with the NCoR/SMRT co-repressor. Nat Neurosci 16(7):898–902.
6. Lewis JD, et al. (1992) Purification, sequence, and cellular localization of a novel chromosomal protein that binds to methylated DNA. Cell 69(6):905–914.
7. Meehan RR, Lewis JD, Bird AP (1992) Characterization of MeCP2, a vertebrate DNA binding protein with affinity for methylated DNA. Nucleic Acids Res 20(19):5085–5092.
8. Lister R, et al. (2013) Global epigenomic reconfiguration during mammalian brain development. Science 341(6146):1237905.
9. Guo JU, et al. (2014) Distribution, recognition and regulation of non-CpG methylation in the adult mammalian brain. Nat Neurosci 17(2):215–222.
10. Xie W, et al. (2012) Base-resolution analyses of sequence and parent-of-origin dependent DNA methylation in the mouse genome. Cell 148(4):816–831.
11. Chen L, et al. (2015) MeCP2 binds to non-CG methylated DNA as neurons mature, influencing transcription and the timing of onset for Rett syndrome. Proc Natl Acad Sci USA 112(17):5509–5514.
12. Gabel HW, et al. (2015) Disruption of DNA-methylation-dependent long gene repression in Rett syndrome. Nature 522(7554):89–93.
13. Nguyen S, Meletis K, Fu D, Jhaveri S, Jaenisch R (2007) Ablation of de novo DNA methyltransferase Dnmt3a in the nervous system leads to neuromuscular defects and shortened lifespan. Dev Dyn 236(6):1663–1676.
14. Tatton-Brown K, et al.; Childhood Overgrowth Consortium (2014) Mutations in the DNA methyltransferase gene DNMT3A cause an overgrowth syndrome with intellectual disability. Nat Genet 46(4):385–388.
15. Nan X, et al. (1998) Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex. Nature 393(6683):386–389.
16. Guy J, Cheval H, Selfridge J, Bird A (2011) The role of MeCP2 in the brain. Annu Rev Cell Dev Biol 27:631–652.
17. Baker SA, et al. (2013) An AT-hook domain in MeCP2 determines the clinical course of Rett syndrome and related disorders. Cell 152(5):984–996.
18. Cohen S, et al. (2011) Genome-wide activity-dependent MeCP2 phosphorylation regulates nervous system development and function. Neuron 72(1):72–85.
19. Chahrour M, et al. (2008) MeCP2, a key contributor to neurological disease, activates and represses transcription. Science 320(5880):1224–1229.
20. Ben-Shachar S, Chahrour M, Thaller C, Shaw CA, Zoghbi HY (2009) Mouse models of MeCP2 disorders share gene expression changes in the cerebellum and hypothalamus. Hum Mol Genet 18(13):2431–2442.
21. Samaco RC, et al. (2012) Crh and Oprm1 mediate anxiety-related behavior and social approach in a mouse model of MECP2 duplication syndrome. Nat Genet 44(2):206–211.
22. Zhao YT, Goffin D, Johnson BS, Zhou Z (2013) Loss of MeCP2 function is associated with distinct gene expression changes in the striatum. Neurobiol Dis 59:257–266.
23. Sugino K, et al. (2014) Cell-type-specific repression by methyl-CpG-binding protein 2 is biased toward long genes. J Neurosci 34(38):12877–12883.
24. Nott A, et al. (2016) Histone deacetylase 3 associates with MeCP2 to regulate FOXO and social behavior. Nat Neurosci 19(11):1497–1505.
25. Yazdani M, et al. (2012) Disease modeling using embryonic stem cells: MeCP2 regulates nuclear size and RNA synthesis in neurons. Stem Cells 30(10):2128–2139.
26. Li Y, et al. (2013) Global transcriptional and translational repression in human-embryonic-stem-cell-derived Rett syndrome neurons. Cell Stem Cell 13(4):446–458.
27. Padovan-Merhar O, et al. (2015) Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol Cell 58(2):339–352.
28. Neph S, et al. (2012) BEDOPS: High-performance genomic feature operations. Bioinformatics 28(14):1919–1920.
29. Quinlan AR, Hall IM (2010) BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842.
30. Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15(12):550.

www.pnas.org/cgi/doi/10.1073/pnas.1618737114