Supporting Information

Leung et al. 10.1073/pnas.1322273111 SI Materials and Methods Hydroxymethyl DNA immunoprecipitation-quantitative PCR Mouse Embryonic Stell Cell Culture and Treatments. Mouse embry- (hMeDIP-qPCR) was conducted to validate the findings of genome- onic stem cells (mESCs) were passaged every 48 h in DMEM wide 5hmC datasets. The hydroxymethylated DNA kit (Diagenode) supplemented with 15% FBS (90 ml of FBS/600 ml of total media) was used according to the manufacturer’s protocol. Captured DNA (HyClone), 20 mM Hepes, 0.1 mM nonessential amino acids, 0.1 was analyzed by qPCR with primers targeting Etn/Mus subfamily of mM 2-mercaptoethanol, 100 units/mL penicillin, 0.05 mM strepto- ERVs. Primer sequences can be found in Table S3. mycin, leukemia inhibitory factor (1,000 units/mL), and 2 mM glutamine. Trypsinized cells were plated on tissue culture grade Combined Bisulfite Restriction Analysis. Genomic DNA from in- Setdb1 Dnmt3a/3b polystyrene plates treated with 0.2% gelatin from porcine skin dependent harvests were isolated from WT, KO, (Sigma) for at least 15 min at room temperature before use. WT, and DKO mESCs. DNA was subjected to bisulfite conversion Cells were maintained at 37 °C and 5% CO . with the EZ DNA Methylation-Gold kit (Zymo Research), as per 2 ’ Setdb1 conditional knockout (CKO) mESCs carried one null and manufacturer s protocol. Specific primers (Table S3) were used to one floxed allele of Setdb1, which could be deleted upon CRE- amplify the region of interest, which contains a restriction site for mediated excision. Treatment with 4-hydroxytamoxifen (4-OHT) either Taq I or Tai I endonuclease, only when CpG is methylated. induced the activation of ligand binding domain of estrogen re- PCR amplicons were digested overnight to ensure complete di- ceptor and CRE recombinase fusion , leading to conditional gestion. The products were resolved on a 2% agarose gel (2 g of deletion of the remaining allele (1). For 4-OHT treatment, ES cells agarose/100 ml of 1X Tris-Borate-EDTA buffer). Bands were were cultured in ES medium with 800 nM 4-OHT (Sigma) for 4 d, quantified with AlphaImager software. The raw signals for both di- and further cultured without 4-OHT for 2 d. Depletion of Setdb1 gested and undigested bands are normalized and presented as was validated by Western blotting with a Setdb1-specific antibody a proportion of WT methylation (%). Each experiment was repeated (Millipore). at least twice, with error bars reflecting the SD between replicates. Data Analysis. Processing and alignment of MethylC-Seq read sequences. MethylC-Sequencing and Bisulfite Sanger Sequencing. Genomic DNA Dnmt3a/3b was extracted from mESCs and was spiked in with unmethylated WT and DKO methylomes were mapped with a few lambda DNA (Promega). The DNA was fragmented by sonication. modifications as described by Lister et al. (6). Briefly, reads in Purified DNA fragments were end repaired. After A-tailing reaction, FastQ format files from the Illumina pipeline were trimmed and fragments were ligated to paired-end cytosine-methylated adapters all cytosine bases were replaced by thymine. Using Bowtie provided by Illumina. Size-selected adapter-ligated DNA was treated (v0.12.7) (7) these reads were aligned to two NCBI BUILD 37/ with sodium bisulfite using the EZ DNA Methylation-Gold kit MM9 reference sequences, one with thymines replacing cyto- (Zymo Research). The resulting DNA molecules were enriched by sines(C) and the complementary sequence with adenines re- placing guanines. We analyzed the two replicates for wild type PCR, purified, and sequenced following standard protocols from Dnmt3a−/− Dnmt3b−/− Illumina. Libraries were sequenced to a minimum of 7× coverage of and for (DKO) mESCs independently of the mouse genome. Bisulfite-treated genomic DNA (gDNA) har- each other to ensure reproducibility and finally pooled the re- vested from an independent biological replicate was also amplified spective WT and the KO mESC replicates. For the WT and Setdb1 KO mESC methylomes, which were produced later than twice by seminested PCR as previously described (2). Amplified Dnmt3a/3b products were sequenced by Sanger sequencing. Primer sequences the WT and DKO methylomes, the MethylC-Seq reads can be found on supplemental experimental procedures. were mapped using BSMAP (http://www.biomedcentral.com/1471- 2105/10/232). The Lambda virus genome was added throughout the ChIP-Sequencing Library Generation and Sequencing. mESCs were procedure as negative control to estimate the bisulfite sequencing processed according a ChIP protocol as previously described (3). error rate. We analyzed the two replicates for wild-type and for Dnmt3a−/− Dnmt3b−/− ChIP-sequencing (ChIP-seq) libraries were prepared and sequenced (DKO) mES cells independently of each using the Illumina instrument as per manufacturer instructions. Anti- other to ensure reproducibility and finally pooled the respective H3K9me3 (Abcam 8898) and anti-Tet1 (Millipore 09-872) anti- WT and the knockout mESC replicates. After the alignment, du- bodies were used. plicate reads were removed using the Picard software (http://picard. sourceforge.net) and all output was converted into BAM format RNA-Seq Library Generation and Sequencing. Total RNA from wild- using samtools (8). Alignments were merged in a strand specific type and Dnmt 3a/3b double knockout (DKO) mESCs were way, giving rise to distributions of bases at each position using the extracted using TRIzol (Invitrogen) according to manufacturer pileup option of samtools. Individual bases having a PHRED instructions. Ribosomal RNA was removed using the Ribom- quality score < 20 were filtered out. For each position x in the Minus Eukaryote kit (Invitrogen). The mRNA libraries were reference genome with a C the number of reads containing mC prepared for strand-specific sequencing as described previously (4). (methylated C) were counted and set in 10 relation to the total number of reads containing mC and a C at position ×. The context 5-Hydoxylmethylcytosine Detection: Selective Capture of 5- of the C (CG, CHG or CHH) at a given position x was determined Hydroxymethyl-Cytosine from Genomic DNA. We selectively cap- based on the pileup consensus call for positions x+1andx+2. tured 5-hydroxymethyl-cytosine (5hmC) for genome-wide analyses Methylation status of genomic regions. Each was divided as previously described (5). In summary, genomic DNA was frag- up into bins of 100 bp and for each bin total numbers of sequenced mented by sonication and size selected (∼300 bp). The DNA was mC and C were determined, for each context of the cytosine (CG, treated with β-glucosyltransferase (β-GT), which transfers an azide- CHG or CHH) and for Watson and Crick strand separately. In glucose to 5hmC. The purified products are then biotinylated by a similar way to the method presented by Lister et al. (6), a bi- click chemistry. This allows the labeled 5hmC containing DNA nomial test was used to distinguish whether the ratio of meth- fragments to be captured by using streptavidin beads. The DNA is ylated C to all C (mC/all C) in each bin was significantly different purified and sequenced following standard protocols from Illumina. from the error rate. As an overall error rate, we estimated the

Leung et al. www.pnas.org/cgi/content/short/1322273111 1of10 following numbers: WT, +strand: 0.0055; WT, −strand 0.0057; these windows in such a 5-kb region show consistent DNA meth- DKO, +strand: 0.0053; DKO, −strand: 0.0052. Finally, a false ylation change directions (either WT > KO or WT < KO). discovery rate cutoff of 1% was applied to call the mC ratio of a bin statistically signficantly different from the error rate. Analysis of 5hmC Data. The 5hmC data were mapped using Bowtie Segmentation and characterization of the CG-methylome in Dnmt3a/3b against the mm9 genome. The reads per kilobases per million of DKO. For each 100-bp bin of the CG-methylome in DKO, a bi- reads (RPKM) were calculated to represent the enrichment levels nomial test was carried out to distinguish between the overall of 5hmC at each genomic region by normalizing the total number average background level of methylation in the DKO cell line and of reads for both WT and Setdb1 KO cells. occurrences of enriched residual levels of methylation. As a conservative measure of methylation level we used a threshold of ChIP-Seq Data Analysis. ChIP-seq reads for IPs using antibodies 10%, which was higher than the actual measured level of 2.7%. A against H3K9me3 and respective input reads in FastQ format files − value of 1 was assigned to bins that showed a P value of 10 7 and from the Illumina pipeline were aligned to the NCBI BUILD 37/ lower, a value of 0 was assigned to those bins with higher P values. MM9 reference sequences using Bowtie software (v0.12.7) (7, 8). Bins that were not spanned by any reads were ignored in the The genome was divided up into bins of 100 bp and for each bin further segmentation process. The binary data were subjected to the number of reads in IP and input spanning the respective bin segmentation using the R package RHmm (http://cran.rproject.org/ was counted. RPKM values were calculated for IP and input web/packages/RHmm/index.html). For each chromosome sepa- independently Finally, RPKM values were computed by sub- rately, a two-state hidden Markov model (HMM) was estab- tracting RPKMinput from RPKMH3K9me3 for each bin. Regions of lished, corresponding to loci of enriched residual methylation enriched tag densities were identified using ChromaBlocks (3). and of loci with depleted methylation. Using RHmm’s “discrete” distribution option to model the emission probabilities, param- RNA-Seq Data Analysis. RNA-seq reads in FastQ format files from the eters were estimated using the Baum–Welch algorithm. Using Illumina pipeline were aligned to the NCBI BUILD 37/MM9 ref- the Viterbi algorithm, state labels of enriched and depleted erence sequences using TopHat software (v1.4.1) (9). expres- methylation were assigned. Bins of enriched state that did not sion was summarized as RPKM values using Cufflinks (9) software have at least four other bins with the neighboring three bins up- based on a gene transfer format file combining Gene Symbol and stream and three bins downstream were combined with neighbor- RefSeq definitions as retrieved from UCSC browser website. ing segments of the depleted state. Larger segments of the enriched state were then aggregated by combining consecutive bins with the Visualization of ChIP-Seq and mC/All C Ratio Data in Given Loci. To same state label. These segments of the enriched state were called display ChIP-seq data and mC ratios in given loci in form of a heat enriched residual methylation loci (ERML). These ERML were map, data were extracted for each locus including 5 kb upstream overlapped with genomic features of the mouse genome National of the start and 5 kb downstream of the end of the locus. To Center for Biotechnology Information (NCBI) BUILD 37/MM9 standardize the different lengths of the loci, the data for each and H3K9me3 data using the IRanges package in R and the re- locus were divided up into 100 bins of identical size and the spective annotation for Reference Sequence (RefSeq) transcripts respective values within one bin were averaged. These values were and repeat elements (RepeatMasker) were retrieved from Uni- color coded and displayed using TreeView (10) with the order versity of California Santa Cruz (UCSC). To produce shuffled of the depending, for example, on gene expression or ERML, for each chromosome independently, positions of ERML k-means clustering (using Cluster 3.0) (11) giving rise to heat were shuffled within those bins that were overlapped by at least one maps. Similarly, to create intensity profiles in exons and introns read, maintaining the number and the size distribution of original of all RefSeq genes, for each exon and intron, respectively, ERML found for the respective chromosome. 100-bp bin data were extracted from 5 kb upstream of the start to 5 kb downstream of the end, taking into account the orientation Determine Hypo- and Hypermethylated Regions in Setdb1 KO mESCs. of the respective gene. Within an exon or an intron, data were The numbers of methylated CGs and unmethylated CGs were de- divided up into 100 bins of identical size and the respective termined for 200-bp windows for both WT and setdb1 KO cells. We values within one bin were averaged. Finally, for any of the 200 only considered those windows that had at least five combined to- bin positions, within exon/intron or outside, values were aver- tally CG counts (methylated + unmethylated) in both WT and KO aged over all exons and introns, respectively. For the creation of cells. To identify windows that showed significant differences of averaged intensity profiles around transcription start sites (TSSs) DNA methylation levels between WT and KO, the numbers of of all RefSeq genes, 100-bp bin data for 5 kb up and 5 kb methylated CGs and unmethylated CGs in WT and KO were downstream of the TSSs were extracted and for any of the 100- subjected to Fisher’s exact test to calculate the P values, which bin positions, values were averaged over all genes. considered both DNA methylation differences and the sequencing coverages for that window. Windows that showed P ≤ 0.01 were Analysis. Gene Ontology analysis was carried out considered as candidate regions that show altered DNA methylation using DAVID Bioinformatics Resources (12). For the GO upon ablation of Setdb1. To further eliminate random noises, we analysis of differentially expressed genes, the list of background identified those windows that had at least one more such candidate genes was reduced to those genes that showed RPKM > 0inat window within a 5-kb region nearby, and at least three-quarters of least one condition, WT or DKO.

1. Matsui T, et al. (2010) Proviral silencing in embryonic stem cells requires the histone 7. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient methyltransferase ESET. Nature 464(7290):927–931. alignment of short DNA sequences to the . Genome Biol 10(3):R25. 2. Dong KB, et al. (2008) DNA methylation in ES cells requires the lysine methyltransferase 8. Li H, et al.; 1000 Genome Project Data Processing Subgroup (2009) The Sequence G9a but not its catalytic activity. EMBO J 27(20):2691–2701. Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079. 3. Hawkins RD, et al. (2010) Distinct epigenomic landscapes of pluripotent and lineage- 9. Trapnell C, et al. (2010) Transcript assembly and quantification by RNA-Seq reveals committed human cells. Cell Stem Cell 6(5):479–491. unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 4. Parkhomchuk D, et al. (2009) Transcriptome analysis by strand-specific sequencing of 28(5):511–515. complementary DNA. Nucleic Acids Res 37(18):e123. 10. Saldanha AJ (2004) Java Treeview—extensible visualization of microarray data. 5. Li Y, Song CX, He C, Jin P (2012) Selective capture of 5-hydroxymethylcytosine from Bioinformatics 20(17):3246–3248. genomic DNA. J Vis Exp (68). 11. de Hoon MJ, Imoto S, Nolan J, Miyano S (2004) Open source clustering software. 6. Lister R, et al. (2009) Human DNA methylomes at base resolution show widespread Bioinformatics 20(9):1453–1454. epigenomic differences. Nature 462(7271):315–322. 12. Huang W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1):44–57.

Leung et al. www.pnas.org/cgi/content/short/1322273111 2of10 ABChromosome 1 genome−wide density mCG/allCG Density 5101520 0 0 0.2 0.4 0.6 0.8 1 mCG/all CG 800 genome−wide density mCHG/allCHG 0.2 0.4 0.6 0.8 1 0 Density 0.02 200 400 600 0 0 0.02 0.04 0.06 0.08 0.1 genome−wide density mCHH/allCHH 1500 Density 0.005 0.01 0.015 500 1000 mCHG/all CHG 0

0 0 0.02 0.04 0.06 0.08 0.1 0.02 C 8.2%

0.01 0.015 Control overlapping mCHH/all CHH

0.005 with K9me3 (n=543) 0 91.8% Control not overlapping 0Mb 50Mb 100Mb 150Mb 200Mb with K9me3 (n=5572)

D 1 Mb E chr7: 49,500,000 50,500,000 LTR IAPLTR1a_Mm 1 non-LTR IAPEz-int WT mCG 1 MURVY-int DKO mCG IAPLTR1_Mm

IAPLTR2_Mm 1 ERML RLTR6-int PGCs mCG IAPLTR2b 1 ETnERV-int

Blastocyst mCG MMETn-int by ERML with H3K9me3 (%) RefSeq Genes 0204060

Proportion of sequence covered 0204060 Loci overlapping with ERML with H3K9me3(%)

F G LTR IAPLTR3-int LTR IAPLTR1a_Mm IAPEz-int non-LTR RLTR6-int non-LTR

60 IAPEY3-int

IAPEy-int IAPLTR4_I MMERVK10C-int IAPLTR2_Mm IAPLTR1_MmIAPLTR2a IAP-d-int MURVY-int IAPLTR2b RLTR44-int MER106B IAPLTR3 ETnERV3-int IAPLTR4 RLTR6_Mm (CTCGG)n ETnERV2-intETnERV-int tRNA-Pro-CCG MMERGLN-int IAPEY2_LTR MMETn-int RLTR27 RLTR45-int (CAATG)n LTR75_1 RMER16-int (CCCGAA)n MER83B-int covered by(%) Control Proportion of sequence Proportion of sequence tRNA-Ala-GCY

covered by all ERML (%) tRNA-Pro-CCY tRNA-Ser-TCG 02040 0204060

0204060 0204060 Loci overlapping with all ERML (%) Loci overlapping with Control (%)

Fig. S1. MethlyC-seq in wild-type and Dnmt DKO mESCs (relating to Fig. 1). (A) Distribution of mC/all C ratios throughout (bin size 10 kb) for CG, CHG, and CHH (where H is A, T, or C) context in WT (blue dots) and DKO (orange dots). Dark blue and red lines represent smoothed average values for WT and DKO, respectively. Black dotted lines represent the estimated error rate. (B) Genome-wide distribution of mC/all C in the CG, CHG, and CHH contexts for all 10-kb bins of WT mESC (continuous line) and of DKO mESC (dotted line). Vertical orange lines represent the estimated error rate. (C) Pie chart demonstrating the overlap between regions enriched for H3K9me3 in both WT and DKO mESCs and shuffled control regions as described in SI Materials and Methods.(D) UCSC genome browser screen captures illustrating the DNA methylation profiles for WT, DKO mESCs, primordial germ cells (PGCs) and blastocysts across large regions on chromosome 7. DNA methylation at enriched residual methylation loci (ERML) in DKO mESCs, PGCs, and blastocysts appear higher than neighboring regions. (E) Scatterplot showing for each repeat subfamily, the percentage of respective occurrences in the genome overlapping with ERML marked by H3K9me3 versus the percentage of base pairs of each repeat subfamily overlapping with ERML marked by H3K9me3 (n = 2,451). (F) Scatterplot showing for each repeat subfamily, the percentage of respective occurrences in the genome overlapping with all ERMLs versus the proportion of the total sequence in base pairs of each repeat subfamily overlapping with all ERML (n = 6,115). (G) Scatterplot showing for each repeat subfamily the percentage of respective occurrences in the genome overlapping with all control regions versus the proportion of the total sequence in base pairs of each repeat subfamily overlapping with all control regions (n = 2,451).

Leung et al. www.pnas.org/cgi/content/short/1322273111 3of10 A

WT Setdb1 KO Setdb1 -actin

B H3K9me3 peaks ERML Control (n=12702) (n=2451) (n=2451) *** *** 1.0 2.0

0.5 1.0

0.0 0.0 H3K9me3 enrichment (Input subtracted RPKM)

-0.5 -1.0 WT Setdb1 WT Setdb1WT Setdb1

KO KO KO

C 5kb chr 10: 82,860,000 82,871,000

SINE Repetitive LINE elements LTR

10

2 WT K9me3 10

2 Setdb1 KO K9me3 10 WT K9me3 2 Karimi et al. 10 (2011) Setdb1 KO K9me3 2

Fig. S2. H3K9me3 profile of Setdb1 KO mESCs (relating to Fig. 2). (A) Western blot confirming efficient depletion of Setdb1 upon 4-hydroxytamoxifen (4-OHT) treatment. (B) Boxplot showing median of H3K9me3 enrichment (input subtracted RPKM) in WT and Setdb1 KO mESCs for H3K9me3 peaks identified in WT mESCs (Left), ERML marked with H3K9me3 and control regions (Right). (C) UCSC genome browser screen capture illustrates that the H3K9me3 ChIP-seq patterns in WT and Setdb1 KO mESCs generated in this study are consistent with previously published datasets (1).

1. Karimi MM, et al. (2011) DNA methylation and SETDB1/H3K9me3 regulate predominantly distinct sets of genes, retroelements, and chimeric transcripts in mESCs. Cell Stem Cell 8(6): 676–687.

Leung et al. www.pnas.org/cgi/content/short/1322273111 4of10 A 7.0 C D H3K9me3 wildtype WT Setdb1 KO WT Setdb1 KO 6.0 Setdb1 KO 15.9% 15.4% 5.0 25.5% 43.1% 4.0

Density 3.0 SINE Other 2.0 LINE LTR 1.0 Repeats hypermethylated in Hypomethylated genes (n=104)

0.0 Hypermethylated genes (n=138) Setdb1 KO cells (n=1511) 0.0 0.2 0.4 0.6 0.8 1.0 -3.0 0.0 3.0 mCG ratio Input subtracted RPKM B 80 E MLV LTR CpG: 21 40 % of WT mCG in WT (%) mCG in WT

0 LTR 80 non-LTR 100% WT KO (%)

40 Setdb1

mCG in 0

Repetitive element classes KO 76% F Setdb1 K9me3 peaks with reduced ERML mCG in Setdb1 KO

709 231/ 2163 288

K9me3 peaks with reduced Control mCG in Setdb1 KO

921 19/ 2432 19

Fig. S3. DNA methylation profile of Setdb1 KO mESCs (relating to Fig. 2). (A) Densityplot shows the distribution of mCG in WT (black) and Setdb1 KO (red) mESCs. A modest increase of fully methylated regions is found in Setdb1-deleted cells. (B) Barplot of average mCG levels of mappable LTR (red) and non-LTR (blue) repeat families in WT (Upper) and Setdb1 KO (Lower) mESCs. Subsets of repetitive element families show mean hypomethylation, whereas others exhibit mean hypermethylation. (C) Pie chart demonstrating the distribution of repetitive element classes among significantly hypermethylated elements (n = 1,511) in Setdb1 KO mESCs. (D) Heat maps demonstrating the enrichment of H3K9me3 of WT and Setdb1 KO mESCs at genes that become hypomethylated (Left)or hypermethylated (Right) in the absence of Setdb1. (E) Sanger sequencing of bisulfite converted genomic DNA from an independent harvest validates a re- duction of mCG at murine leukemia virus (MLV) 5′ LTR upon Setdb1 deletion. Black, white, and gray ovals represents methylated CpG, unmethylated CpGs, and mutated CpG, respectively. Each horizontal line represents one sequenced molecule. The mean methylation level relative to WT (%) is shown on the Right.(F)Venn diagrams showing overlap between significantly hypomethylated regions in Setdb1 KO mESCs and ERML (Upper) and control shuffled regions (Lower).

Leung et al. www.pnas.org/cgi/content/short/1322273111 5of10 A ERML (n=2451) Control (n=2451) C

***

60 2.5 2.0 ETn/MusD 50

2.0 40

1.5 30

1.5 20 Enrichment (% of IP) 5hmC Enrichment (RPKM) 10 1.0 1.0 0 WT Setdb1 KO WT Setdb1 KO IgG 5hmC WT Setdb1 WT Setdb1

KO KO

B 50

40

30 KO n=169 20

10 Setdb1 in 0 Elements gaining 5hmC L1 Etn IAP GLN RLTR6 LTRIS2 Unkown RLTR45 RLTR44 RLTR10 RLTR1B RMER16 RMER13B MERVK10C Other ERV-I Other ERV-II Other ERV-III Repetitive element sub-family

Fig. S4. Description of 5hmC enrichment at ERVs in Setdb1-depleted mESCs (relating to Fig. 3) (A) Boxplot showing the 5hmC enrichment (RPKM) at ERML (n = 2,451) and control regions (n = 2,451) in WT and Setdb1 KO mESCs (***P < 0.001 as comparing WT versus Setdb1 KO with the Wilcoxon test). (B) Barplot showing the distribution of various repetitive element subfamilies among the regions exhibiting the greatest 5hmC increase upon Setdb1 deletion (n = 169). Details on the filtering threshold are described in SI Materials and Methods. The same thresholds applied to shuffled 5hmC data yielded fewer than five regions. (C) hMeDIP followed by qPCR analyses validate an increase in 5hmC enrichment at early mouse transposon (ETn)/Mus musculus type D retrovirus (MusD) LTR regions in Setdb1 KO compared with WT mESCs. IgG IP is included to show the background levels of immunoprecipitation. Error bars represent technical replicates.

Leung et al. www.pnas.org/cgi/content/short/1322273111 6of10 5kb A chr 11: 51,249,000 51,259,000

SINE IAP Ez LINE LTR 1 WT mCG 1 DKO mCG ERML 10

2 WT K9me3 10 2 Setdb1 KO K9me3 1 WT mCG 1 Setdb1 KO mCG 5 mCG (p-value) -5 10 2 WT 5hmC 10 2 Setdb1 KO 5hmC Col23a1 4 WT Tet1 4 Setdb1 KO Tet1 1 Mappability B 2kb 20kb chr 3: 145,308,500 145,315,500 chr 2: 38,190,000 38,245,000

Cyr61 Lhx2

Tet1 Tet1 Tet1 (Wu et al. 2011) Tet1 (Wu et al. 2011)

Fig. S5. Description of 5hmC and Tet1 enrichment at endogenous retroviruses (ERVs) in Setdb1 depleted mESCs (relating to Fig. 3) (A) Screen capture of a region on chromosome 11. 5hmC and Tet1 profiles are shown as RPKM values between the scales of 2–10 and 0–4, respectively. The example illustrates 5hmC increase (marked by gray shading) at the flanks of an intracisternal A-particle subfamily Ez (IAP-Ez) repeat. Mappability or uniqueness of reference genome from the ENCODE project for 50-bp segments is shown with the score between 0 and 1. (B) Screen capture shows similar patterns of Tet1 enrichment at known targets as shown by ChIP-seq in WT mESCs, generated in this study compared with previously published dataset (1).

1. Wu H, et al. (2011) Dual functions of Tet1 in transcriptional regulation in mouse embryonic stem cells. Nature 473(7347):389–393.

Leung et al. www.pnas.org/cgi/content/short/1322273111 7of10 A Dazl promoter CpGs: 25

% of WT

100% WT KO 77.3% Setdb1

B 5kb chr 1: 168,161,000 168,173,000

Mael

1 WT mCG 1 DKO mCG ERML

10 2 WT K9me3 10 Setdb1 KO K9me3 2 1 WT mCG 1 Setdb1 KO mCG 5 Δ mCG (p-value) -5 10 2 WT 5hmC 10 2 Setdb1 KO 5hmC 40 1 WT RNA-seq 40 1 Setdb1 KO RNA-seq

C 10kb chr 12: 110,755,000 110,790,000

Meg3/Mir1906-1

1 WTmCG 1 DKO mCG ERML 10 2 WT K9me3 10 2 Setdb1 KO K9me3 1 WT mCG 1 Setdb1 KO mCG 5 Δ mCG (p-value) -5 10 2 WT 5hmC 10 2 Setdb1 KO 5hmC 40 1 WT RNA-seq 40 1 Setdb1 KO RNA-seq

Fig. S6. Setdb1 depletion also resulted in hypomethylation of single copy genes (relating to Fig. 4) (A) Sanger sequencing of bisulfite converted genomic DNA showing a modest reduction of mCG at the promoter of the Dazl gene in Setdb1 KO mESCs. (B) UCSC genome browser screen capture showing a reduction of mCG (red box) at the promoter of Mael, a testes-specific gene, concomitant with transcriptional up-regulation. (C) Screen capture demonstrating hypo- methylation of differentially methylated region at Meg3, and imprinted gene, in Setdb1 KO mESCs. This decrease is coupled with an increase in 5hmC (red box).

Leung et al. www.pnas.org/cgi/content/short/1322273111 8of10 A me C gDNA isolation CTAGGACGTGGTCTCGT CTAGGACGTGGTCTCGT 5kb Bisulfite conversion of gDNA TTAGGACGTGGTCTTGT TTAGGATGTGGTTTCGT chr 8: 108,495,000 108,505,000

Dpep3 PCR amplification TTAGGACGTGGTCTTGT TTAGGATGTGGTTTCGT 1 WT mCG 1 DKO mCG CpG specific restriction digestion TTAGGA CGTGGTCTTGT TTAGGATGTGGTTTCGT 1 WT mCG 1 WT Setdb1 KO WT DKO Setdb1 KO mCG 5 Tai I enzyme: + - + - + - + - Resolve and quantify on agarose gel Δ mCG (p-value) -5

Dpep3 PCR product digest 10kb B chr 17: 50,420,000 50,440,000 1.00 1.00 Dpep3 1.0 Dazl 1 WT mCG 0.8 1 DKO mCG 1 0.6 0.48 WT mCG 1 0.4 Setdb1 KO mCG 5 0.08 0.2 Δ mCG (p-value) -5 Proportion of WT mCG Proportion of WT 0.0

1.00 1.00 Dazl 5kb 1.0 0.84 chr 1: 168,165,000 168,173,000 0.8 Mael 0.6 1 WT mCG 1 0.4 DKO mCG 1 0.2 WT mCG 0.03 1 Setdb1 KO mCG Proportion of WT mCG Proportion of WT 0.0 5 Δ mCG (p-value) 1.00 1.00 Mael -5 1.0 0.87 0.8 5kb 0.6 chr X: 35,474,000 35,484,000

0.4 Rhox13 0.20 1 0.2 WT mCG 1

Proportion of WT mCG Proportion of WT DKO mCG 0.0 1 WT mCG 1 1.00 1.00 Rhox13 Setdb1 KO mCG 1.0 5 Δ mCG (p-value) 0.8 -5 0.52 0.6

0.4 10kb chr 3: 95,360,000 95,385,000 0.2 0.00 Hormad1 Proportion of WT mCG Proportion of WT 0.0 1 WT mCG 1.000.82 1.00 Hormad1 1 1.0 DKO mCG 1 WT mCG 0.8 1 Setdb1 KO mCG 0.6 5 Δ mCG (p-value) 0.4 -5

0.2 0.00 5kb Proportion of WT mCG Proportion of WT 0.0 chr 7: 25,453,000 25,465,000

1.00 1.00 Tex101 Tex101 1.0 1 0.75 WT mCG 0.8 1 DKO mCG 1 0.6 WT mCG 1 0.4 Setdb1 KO mCG 5 0.2 Δ mCG (p-value) 0.00 -5 Proportion of WT mCG Proportion of WT 0.0 Setdb1 WT Setdb1 KO Dnmt3a/3b WT Dnmt3a/3b DKO

Fig. S7. Validation of single copy genes hypomethylation in Setdb1 KO mESCs by combined bisulfite restriction analysis (COBRA). (A) Summary of the pro- cedures in COBRA. Independent harvests of gDNA were bisulfite converted and amplified by PCR. The products would contain a recognition sequence (highlighted in green), which would only be preserved if a give CpG was methylated. The digested products were resolved on agarose gels and quantified.The example image is the digestion of Dpep3 PCR products by the Tai I endonuclease. (B) Quantitation of methylation of single copy developmental or germ-line genes. The raw signals were normalized and presented as a proportion of WT methylation. Error bars reflect SD between replicate experiments. (C) Genome browser screen captures showing the DNA methylation of corresponding regions integrated in B. Gray shading indicates region targeted by PCR primers.

Leung et al. www.pnas.org/cgi/content/short/1322273111 9of10 5kb 1.001.07 1.00 MageA2 chr X: 151,462,000 151,470,000 1.2 1.0 Tex101 1 0.8 WT mCG 1 DKO mCG 0.6 1 WT mCG 0.4 1 Setdb1 KO mCG 0.2 0.00 5

Proportion of WT mCG Proportion of WT Δ mCG (p-value) 0.0 -5 Setdb1 WT Setdb1 KO Dnmt3a/3b WT Dnmt3a/3b DKO

Fig. S8. Validation of no DNA hypomethylation in Setdb1 KO mESCs at negative control gene by COBRA. The MageA2 gene promoter, which is not marked by K9me3 or shown to be hypomethylated by methylC-seq in Setdb1 KO cells, is targeted by COBRA as a negative control region. Error bars reflect SD between replicate experiments.

Table S1. Top 20 mappable repeat subfamilies overlapping with ERML

Table S1

Twenty repeat subfamilies with the most significant overlap with ERML are listed. For each subfamily, the percentage of repeat incidences and the percentage of base pairs (bp) overlapping with ERML are shown. In addition, the percentage of ERML incidences overlapping with elements of each repeat subfamily is included.

Table S2. All mappable hypomethylated repeats in Setdb1 KO mESC

Table S2

The chromosomal location and repeat subclass and subfamily of all mappable repeats that become hypomethylated in Setdb1 KO mESCs (n = 2,395).

Table S3. Sequences and applications of all primers used in this study

Table S3

Leung et al. www.pnas.org/cgi/content/short/1322273111 10 of 10