CELLULAR REPROGRAMMING Volume 16, Number 3, 2014 Original Article ª Mary Ann Liebert, Inc. DOI: 10.1089/cell.2014.0002

Methylated DNA Immunoprecipitation and High-Throughput (MeDIP-seq) Using Low Amounts of Genomic DNA

Ming-Tao Zhao,1 Jeffrey J. Whyte,1 Garrett M. Hopkins,2 Mark D. Kirk,2 and Randall S. Prather1


DNA modifications, such as and hydroxymethylation, are pivotal players in modulating gene ex- pression, , X-chromosome inactivation, and silencing repetitive sequences during embryonic development. Aberrant DNA modifications lead to embryonic and postnatal abnormalities and serious human diseases, such as . Comprehensive -wide DNA methylation and hydroxymethylation studies provide a way to thoroughly understand normal development and to identify potential epigenetic in human diseases. Here we established a working protocol for methylated DNA immunoprecipitation combined with next-generation sequencing [methylated DNA immunoprecipitation (MeDIP)-seq] for low starting amounts of genomic DNA. By using spike-in control DNA sets with standard , 5-methylcytosine (5mC), and 5- hydroxymethylcytosine (5hmC), we demonstrate the preferential binding of antibodies to 5mC and 5hmC, respectively. MeDIP-PCRs successfully targeted highly methylated genomic loci with starting genomic DNA as low as 1 ng. The enrichment efficiency declined for constant spiked-in controls but increased for endogenous methylated regions. A MeDIP-seq library was constructed starting with 1 ng of DNA, with the majority of fragments between 250 bp and 600 bp. The MeDIP-seq reads showed higher quality than the Input control. However, after being preprocessed by Cutadapt, MeDIP (97.53%) and Input (94.98%) reads showed comparable alignment rates. SeqMonk visualization tools indicated MeDIP-seq reads were less uniformly distributed across the genome than Input reads. Several commonly known unmethylated and methylated genomic loci showed consistent methylation patterns in the MeDIP-seq data. Thus, we provide proof-of-principle that MeDIP-seq technology is feasible to profile genome-wide DNA methylation in minute DNA samples, such as oocytes, early embryos, and human biopsies.

Introduction CpG islands or non-CpG islands may cause embryonic re- tardation and serious diseases such as cancer. For example, NA methylation is one of the well-studied epigenetic aberrant DNA methylation of CpG islands in tumor Dmarks that can be inherited during cell division. DNA suppressor genes is associated with silencing of their func- methylation is dynamically regulated during embryonic de- tions, which is considered as a ‘‘CpG island methylator velopment, and aberrant methylation changes are correlated phenotype’’ in many types of cancer (Baylin and Jones, with tumor development and other human diseases (Baylin 2011). Therefore, genome-wide DNA methylation analysis and Jones, 2011; Smith and Meissner, 2013). In mammals, opens a new avenue for thoroughly understanding normal DNA methylation is frequently present in the context of CpG, development and epigenetic origin of human diseases. albeit less frequent found in CpHpG (H = A, T, C). The ma- DNA methylation is established and maintained by DNA jority of the mammalian genome is composed of vast oceans methyltransferases (DNMT1, DNMT3a, and DNMT3b) that where sparse but highly methylated CpG dinucleotides (non- are essential for embryonic and postnatal development in CpG islands) are distributed and isolated CpG islands with a mammals(Lietal.,1992;Okanoetal.,1999).However, high density of CG content that are resistant to DNA DNA methylation information is lost during standard mo- methylation (Deaton and Bird, 2011). Epimutations in either lecular biology manipulations, such as molecular

1Division of Animal Sciences and 2Division of Biological Sciences, University of Missouri, Columbia, MO, 65211.

1 2 ZHAO ET AL. in bacteria and PCRs, due to lack of maintenance of DNA Materials and Methods methyltransferases. Therefore, several approaches have mESC culture, embryoid body production, been proposed to preserve DNA methylation information and neural differentiation and simultaneously transform it into quantitative and mea- surable signals (Laird, 2010). Of them, the most prevailing mESCs with a transgene for enhanced green fluorescent techniques include methylated DNA immunoprecipitation protein (eGFP) (B5 cell line) were cultured as adherent (MeDIP), bisulfite sequencing (BS-seq), and reduced rep- colonies in the presence of leukemia inhibitory factor (LIF; resentation bisulfite sequencing (RRBS). Combined with Chemicon), as previously described (Meyer et al., 2004). high-throughput, next-generation sequencing, these tech- All chemicals are from Sigma (St. Louis, MO, USA) unless niques have provided comprehensive genome-wide infor- indicated otherwise. Briefly, undifferentiated ESCs were mation regarding DNA methylation. In terms of the merit thawed and grown in gelatin-treated T25 tissue culture and bias of these methods, BS-seq and RRBS can generate a flasks in ESC growth medium plus LIF (ESGM; LIF at base-resolution DNA methylome, whereas MeDIP-seq can 1000 U/mL), including Dulbecco’s modified Eagle medium only generate relative enrichment of specific regions across (DMEM; Invitrogen), 10% newborn calf serum, 10% fetal the genome. Nevertheless, bisulfite treatment inevitably bovine serum (FBS), nucleosides, and b-mercaptoethanol causes substantial DNA degradation and thus requires mi- (1 mM). Typically, to maintain the undifferentiated state crogram levels of DNA Input. In addition, the bisulfite during proliferation, ESC cultures were refreshed every conversion efficiency is also a concern, even though many other day with ESGM plus fresh LIF and passaged from one commercial kits claim more than 99% conversion. Special to two T25 flasks. For future use, adherent ESC cultures enzymes, such as -tolerant DNA polymerase, are were dissociated with 0.25% trypsin; the trypsin was in- usually employed for high-throughput library preparation in activated with ESGM followed by a slow freeze to -80C BS-seq. and transferred to liquid nitrogen for long-term storage. Affinity-enrichment methods such as MeDIP were ini- Otherwise, neural differentiation was achieved by using the tially combined with microarray technology to compare the 4- /4+ protocol that includes exposure to retinoic acid for genome-wide DNA methylation among samples (Weber neural induction (Bain et al., 1995). Briefly, the ESCs were et al., 2005). Subsequently, MeDIP was combined with a cultured in ESC induction medium (ESIM; ESGM minus next-generation sequencing platform to investigate the LIF and b-mercaptoethanol) in uncoated Petri dishes for 4 DNA methylome in precious samples, such as bone marrow days and grown as suspended embryoid bodies (EBs). After cells, mammalian oocytes, primordial germ cells, and 4 days in ESIM, the majority of ESCs formed floating preimplantation embryos. The MeDIP-seq protocol has spheres called EBs that had started neural differentiation.

been optimized to reach as low as 50 ng of DNA for starting On day 4 of neural induction, the EBs were transferred into materials (Taiwo et al., 2012). Borgel et al. performed ESIM plus 500 nM all-trans retinoic acid (Sigma) and cul- MeDIP followed by whole-genome amplification and mi- tured for an additional 4 days to complete the neural in- croarray hybridization on early-stage mouse embryos duction phase. Finally, the EBs were collected, snap frozen starting with 150 ng of DNA (Borgel et al., 2010). The in liquid nitrogen, and subsequently stored at -80C until sixth DNA modification 5-hydroxymethylcytosine (5hmC) genomic DNA extraction. can also be detected genome-wide by using a hydro- xymethylated DNA immunoprecipitation (hMeDIP) pro- Methylated DNA immunoprecipitation cedure with a 5hmC-specific antibody (Ficz et al., 2011). Comparison between MeDIP-seq and hMeDIP-seq data Genomic DNA was extracted by using an AllPrep DNA/ helps better understand the distribution of 5-methylcytosine RNA Mini Kit (Qiagen) following the manufacturer’s (5mC) and 5hmC signals on the same genomic loci across instructions. The DNA concentration was measured with the whole genome. Recently, the MeDIP-seq method a NanoDrop spectrophotometer (Thermo Scientific). The has been applied to detect other DNA modifications, such as DNA was then randomly sheared into fragments with an 5-formylcytosine and 5-carboxylcytosine (Shen et al., average size of 400 bp by an M220 focused-ultrasonicator 2013), thus illustrating the powerful advantages of MeDIP- (Covaris). The MeDIP procedure was modified according to Seq technology on genome-wide DNA modification a standard protocol with low starting amounts of DNA studies. (Borgel et al., 2012). To prevent DNA loss during the The traditional MeDIP-seq requires microgram levels of MeDIP procedure, special low DNA LoBind tubes (Ep- starting DNA that may not be applied to minute DNA pendorf ) were used. Initially sonicated DNA (1 lg, 100 ng, samples, such as mammalian oocytes and preimplantation 10 ng, and 1 ng) was diluted in 450 lL of TE buffer (10 mM embryos. Here we took advantage of next-generation se- Tris-HCl, 1 mM EDTA, pH 8.0) and then denatured at 95C quencing technology to profile genome-wide DNA methyl- for 10 min. A portion of sonicated DNA was reserved as ation by using 1 ng of genomic DNA from mouse embryonic Input that did not undergo immunoprecipitation. Denatured stem cells (mESCs). After index adapter ligation, Input DNA was placed on ice for 10 min and subsequently incu- (without immunoprecipitation) and MeDIP samples were bated with 5mC monoclonal antibody (Eurogentec, catalog amplified by PCR for an Illumina HiSeq library preparation. no. BI-MECY-0100) or 5hmC polyclonal antibody (Active Millions of reads were generated and then aligned to the Motif, catalog no. 39769) for 2 h at 4C with overhead mouse reference genome. The MeDIP-seq reads showed shaking. high alignment rate (97.53%) and were more piled up (in- After linker ligation, standard cytosine, 5mC, and 5hmC dicating regions of DNA methylation) than Input reads DNA sets (Zymo Research, catalog no. D5405) were spiked across the whole genome. into genomic DNA as controls as per the manufacturer’s MeDIP-SEQ USING MINUTE AMOUNTS OF DNA 3 protocol (Ito et al., 2010; Wu et al., 2011). According to the Ultra DNA Library Prep Kit for Illumina (NEB, catalog no. manufacturer, virtually 100% of the DNA controls are con- E7370S). Briefly, sonicated DNA (1 ng) was end repaired verted to 5mC or 5hmC. DNA controls were isolated with into dA-tailed fragments. Then the NEBNext indexed adap- Dynabeads Protein G (Invitrogen, catalog no. 10003D) by tor was ligated into sonicated DNA fragments. Adaptor-li- prewashing with 800 lL of phosphate-buffered saline (PBS)/ gated DNA was further cleaned up by AMPure XP beads 0.1% bovine serum albumin (BSA) three times by using a (Beckman Coulter, Inc., catalog no. A63881) and then sub- magnetic rack and added to the immunoprecipitation sam- jected to the MeDIP procedure as described above. The Input ples. The Dynabeads were incubated with immunoprecipi- DNA was linked to different index primers. After immuno- tation samples for 2 h at 4C with overhead shaking. The precipitation, adaptor-ligated DNA was extracted by phenol/ Dynabeads bound by 5mC or 5hmC antibodies were then chloroform and precipitated by ethanol. The antibody- trapped by a magnetic rack and washed three times by using enriched DNA was subsequently amplified by using an in- 1 · immunoprecipitation buffer [10 mM sodium phosphate dex primer and universal PCR primer provided in NEB (pH 7.0), 0.14 M NaCl, and 0.05% Triton X-100] at room Multiplex Oligos for Illumina (NEB, catalog no. E7500). For temperature with overhead shaking. The protein–DNA Input sample, 18 cycles were run, whereas 20 cycles were complex [DNA/Dynabeads/immunoglobulin G (IgG)] was performed for MeDIP samples. The PCR products were then subsequently digested with proteinase K (New England purified by AMPure XP beads and eluted with TE buffer. The Biolabs, catalog no. P8102S) for 2 h at 50C with shaking. library concentration was measured by a Qubit Fluorometric The immunoprecipitated DNA was extracted by phenol/ Quantitation system (Life Technologies), and size distribu- chloroform and precipitated by ethanol. The DNA pellet was tion was analyzed by a Fragment Analyzer Automated then resuspended in EB buffer [10 mM Tris-HCl (pH 8.5)]. CE System (Advanced Analytical Technologies). The qual- ified libraries were then uploaded onto an Illumina HiSeq Real-time quantitative PCR 2000 platform at the DNA Core (University of Missouri- Columbia) for ultradeep sequencing with single reads of 100 End-point PCR and real-time quantitative (q) PCR using bases. Input control and MeDIP-seq data were submitted to immunoprecipitation and Input DNA were performed by the NCBI Sequence Read Archive (SRA) and are available using primers previously published (Mohn et al., 2009). under accession number SAMN02687582. Intracisternal A particle (IAP) was selected as a positive control (hypermethylated) and the promoters of Gapdh and b-actin were used as negative controls (hypomethylated) for Reads alignment and data analysis mESC DNA. Spike-in control primers are listed in Table 1. Raw single-read FASTQ data were first analyzed with Real-time qPCR was performed by using iQ SYBR Green FastQC Version 0.10.1 (www.bioinformatics.babraham.ac.uk/ Supermix in an iCycler IQ Single-Color Real Time PCR projects/fastqc/) for general read quality assessment. Then the Detection System (Bio-Rad). Melting curves were generated FASTQ reads were preprocessed with Cutadapt (http://code following real- time PCR to assess the specificity of the .google.com/p/cutadapt/) to remove any adaptor and primer . The copy numbers of initial template were an- contamination. The trimmed data were then aligned to the alyzed by a relative standard curve method. Gradient dilu- mouse GRCm38 genome assembly via Bowtie2 short-read tions (1/10·) of Input DNA were used to create standard aligner (Langmead and Salzberg, 2012). (http://bowtie-bio curves. The qPCR data were obtained from three indepen- .sourceforge.net/bowtie2/faq.shtml). Peak detection, visuali- dent biological and two technical replicates and analyzed zation, and analysis of the aligned data were performed using statistically by one-way analysis of variance (ANOVA). SeqMonk (www.bioinformatics.babraham.ac.uk/projects/ seqmonk/). ‘‘Probes’’ were defined in SeqMonk as genomic Illumina library preparation and high-throughput loci where quantitative measurements of read counts could be sequencing evaluated. The promoter region of each annotated gene was The MeDIP protocol involves denaturation and generates defined as being located between - 1,000 bp and + 1,000 bp single-stranded DNA, thus Illumina (San Diego, CA) adap- flanking the start site (TSS). tors must be added before immunoprecipitation. Illumina library preparation was performed by using the NEBNext Results The specificity of 5mC and 5hmC antibodies Table Primer Information for e q 1. M DIP- PCR To test the specificity of 5mC antibody, we spiked stan- dard DNA (25 pg) sets containing 100% unmethylated cy- Names Sequences (bp) tosine (5C), 5mC, or 5hmC into mESC genomic DNA before immunoprecipitation. The standard DNA has three Control1 Forward: 190 sets of linear double-stranded (ds) DNA (897 bp) that have TTCAGGCCTGGTATGAGTCAGCAA Reverse: the same sequence. All the in standard DNA sets ACGGTCGAGCTTCTTCACCAGAAT are either 100% unmodified (5C), methylated (5mC), or Control2 Forward: 145 hydroxymethylated (5hmC). The antibodies against 5mC CACTGGTGCAACGGAAATTGCTCA (Eurogentec) and 5hmC (Active Motif ) were selected based Reverse: on their performance in other laboratories (Ficz et al., 2011; TGCCACCTGACGTCTAAGAAACCA Weber et al., 2005; Wu et al., 2011). End-point PCR was MeDIP, methylated DNA immunoprecipitation; qPCR, quantita- performed after immunoprecipitation by using antibodies tive PCR. against 5mC and 5hmC, respectively. 4 ZHAO ET AL.

As shown in Figure 1A, 5mC was relatively enriched in the 5hmC antibody exhibited more than 100 times enrich- MeDIP by using 5mC antibody, whereas 5hmC was highly ment versus 5C and 5mC (Fig. 1C). Collectively, these data pulled down in hMeDIP by using 5hmC antibody compared indicate the high specificity of 5mC and 5hmC antibodies to other DNA modifications. Two independent regions of for immunoprecipitation. control DNA sets were selected and consistent data were obtained. Subsequently, real-time qPCR was performed to MeDIP-PCRs on representative genomic loci quantitate the relative enrichment of 5C, 5mC, and 5hmC Next we tested our MeDIP procedure on representative versus Input after the MeDIP and hMeDIP procedure, re- genomic loci. We used 100 ng, 10 ng, and 1 ng of mESCs or spectively. For MeDIP, the 5mC antibody showed prefer- mEB DNA (data for mEB are not shown, but were similar to ential enrichment over 5C and 5hmC (Fig. 1B). Likewise, the mESCs) as starting material. For unmethylated controls, such as the promoters of Gapdh and b-actin, we did not observe bands after (h)MeDIP-PCR (Fig. 2A). Similarly, for intermediate methylated controls, such as differentially methylated region (DMR) in Igf2/H19, we did not see any band either, probably because of moderate methylation level and starting DNA (Fig. 2A). In fact, the DMR of Igf2/H19 has lower than 50% the methylation level in mESCs due to the global hypomethylation state of these genes (Hayashi et al., 2012; Leitch et al., 2013). In contrast, for a highly methylated region IAP, we observed abundant PCR product, even in 1 ng DNA samples for MeDIP, despite smaller amounts of PCR product in hMeDIP samples (Fig. 2A). That is reasonable because 5hmC only accounts for 5–10% of total 5mC in mESCs (Pastor et al., 2013). In addition, for constant amounts of spike-in controls, the enrichment effi- ciency was reduced with the decrease of starting DNA amounts (100 ng to 1 ng) for both MeDIP and hMeDIP (Fig. 2B). In contrast, for the endogenous highly methylated re- gion (IAP), enrichment efficiency versus Input seemed to be higher in low DNA amount (1 ng; Fig. 2C). Therefore, the MeDIP procedure by using 1 ng of DNA is practical for discrete genomic regions and might also be applied to genome-wide DNA methylation studies.

The quality of sequencing library by using 1 ng of DNA To take the advantage of Illumina high-throughput se- quencing HiSeq 2000 platform, index adaptors were added before immunoprecipitation of 1 ng of DNA for library preparation. After PCR amplification by using universal and index primers and clean up, the size distribution was ana- lyzed. The majority of MeDIP-seq library fragments were between 250 bp and 600 bp, which was ideal for the down- stream high-throughput sequencing (Fig, 3A). The MeDIP procedure removed adaptor contamination and shifted the average size of fragments when compared to the Input li- brary without any enrichment. The Input fragments were FIG. 1. Specificity of 5mC and 5hmC antibodies. (A) (h)MeDIP-PCRs by using control DNA sets were then visu- distributed with a wider range of sizes, and primer/adapter alized in a 1.5% gel. Lanes 1–3, MeDIP-PCRs by sequences were obvious with peaks of 127 bp and 140 bp using 5mC antibody (Eurogentec) and genomic DNA with (Fig. 3B). Note that the MeDIP library required two more spike-in controls. Lanes 4–6, hMeDIP-PCRs by using 5hmC cycles of PCR than Input because the immunoprecipitation antibody (Active Motif ) and genomic DNA with spike-in procedure caused about 90% loss of starting DNA. controls. Samples: lanes 1 and 4, mESC DNA + cytosine control; lanes 2 and 5, mESC DNA + 5mC control; lanes 3 and 6, mESC DNA + 5hmC control; lane 7, input control; lane Genome-wide DNA methylation profile by MeDIP-seq 8, no template control. (B) Relative enrichment versus input in starting with 1 ng of DNA control DNA sets through MeDIP-qPCR with 5mC antibody 6 The raw Illumina 100-bp single-read data (FASTQ) were dilution of 1:10 . The 5mC antibody showed a preference to bind 5mC versus cytosine and 5hmC. (C) Relative enrichment analyzed with FastQC (v. 0.10.1) for quality assessment. versus input in control DNA sets through hMeDIP-qPCR with Overall, MeDIP reads had higher-quality scores than Input 5hmC antibody dilution of 1:107. The 5hmC antibody showed reads. Before any processing, MeDIP read-quality scores higher binding efficiency to 5hmC rather than cytosine and declined at about 90 bases per read (Fig. 4C), whereas Input 5mC. Error bars indicate standard deviation. reads dramatically lost quality after 60 bases per read (Fig. MeDIP-SEQ USING MINUTE AMOUNTS OF DNA 5

ming by Cutadapt to remove any adapter and primers, the preprocessed MeDIP and Input reads showed comparable quality scores across all bases within 100 bp (Fig. 4B, D). As shown in Table 2, MeDIP reads had a 97.53% overall alignment rate, of which 64.66% reads were aligned exactly one time and 32.86% reads aligned more than one time. The mapped Input reads showed 94.98% alignment rate, of which 69.45% reads were aligned exactly one time and 25.53% were aligned more than one time. The comparable overall alignment rates (97.53% versus 94.98%) between Input and MeDIP indicate the high quality of these trimmed single reads. By using SeqMonk built-in visualization tools, we ob- served that MeDIP-seq reads were discrete and piled up with gaps in between at selective genomic loci (Fig. 5A, bottom panel). In contrast, the Input reads were continuously and randomly distributed across the genome (Fig. 5A, upper panel), implying the successful enrichment of methylated DNA by the 5mC antibody. Occasionally, MeDIP reads were packed at selective locations, resulting in significant ‘‘probe’’ value bars. These probe value bars were usually located at highly methylated regions, such as gene bodies of Tet1 and Tet2 (Table 3). In contrast, probe values were seldom seen in hypomethylated regions, such as the promoter regions of housekeeping genes (Ywhag) and pluripotency genes (Na- nog, Sox2, Pou5f1, and Stat3) that were highly expressed in mESCs, although there was substantial methylation within the gene bodies (Ywhag and Stat3) (Table 2). For example, there were no reads enriched in the promoter of Nanog in the MeDIP sample, but a substantial number of reads were lo-

cated in the Input (Fig. 5C). Lineage-specific genes were methylated either in the gene body (Meis1) or in the promoter (Hoxb1) to prevent their precocious expression in undifferentiated ESCs (Jaenisch and Young, 2008). For imprinted genes, the H19 promoter was highly methylated (Table 4), corresponding to the allele- specific methylation in the Igf2/H19 locus (Bartolomei and Ferguson-Smith, 2011). The enrichment was reflected by the probe value bars and piled reads seen in the promoter of H19 gene (Fig. 5B). Likewise, the Igf2r gene body and promoter region were both moderately methylated, which may con- tribute to its imprinted expression solely from maternal al- FIG. 2. MeDIP/hmeDIP enrichment efficiency in exoge- lele (Wutz et al., 1997). (Supplementary Data of the entire nous and endogenous DNA by using a low amount of starting data set are available at http://animalsciences.missouri.edu/ DNA. (A) Lanes 1–3, MeDIP-PCRs by using 100 ng (lane 1), faculty/prather/ and www.liebertonline.com/cell/.) Together, 10 ng (lane 2), and 1 ng (lane 3) of mESC DNA with 10 pg of these data support the idea that MeDIP-seq starting with 1 ng spike-in 5mC control DNA, respectively; lanes 4–6, hMe- of DNA can generate a reliable genome-wide DNA methyl- DIP-PCRs by using 100 ng (lane 4), 10 ng (lane 5), and 1 ng ation profile. (lane 6) of mESC DNA with 10 pg of spike-in 5hmC control DNA; lane 7, input control (1 ng of mESC DNA with 10 pg of spike-in standard DNA set); lane 8, no template control. (B) Discussion The relative enrichment efficiency of spike-in 5mC and 5hmC controls was reduced with the decreasing amounts of In this study, we have established a working MeDIP starting sample DNA. Note that the spike-in 5mC and 5hmC protocol by using 1 ng of DNA as the starting material to controls were constant (10 pg). (C) For an endogenously profile genome-wide DNA methylation. End-point PCR methylated region (IAP), MeDIP/hmeDIP efficiency in- products followed by MeDIP can be visualized at highly creased with the decrease of starting DNA. methylated endogenous regions (e.g., IAP). The enrichment efficiency was increased for endogenous loci but was re- 4A). The read-quality difference may be due to the disparate duced for constant spike-in controls with the decreasing library quality between the Input and MeDIP samples, be- amounts of starting DNA. Piled-up peaks were observed cause the immunoprecipitation procedure could inevitably discretely across the genome in our MeDIP sample, whereas remove possible adapter and primer contamination and si- Input reads were continuous and randomly distributed. multaneously enrich 200- to 500-bp fragments. After trim- Known methylation patterns were also recapitulated in our 6 ZHAO ET AL.

FIG. 3. Fragmentation analysis of input and MeDIP-seq library by using 1 ng of DNA. The majority of MeDIP library fragments were between 250 and 600 bp after 5mC antibody enrichment (A). However, the input library had a longer range of DNA fragments (200–1,000 bp). A substantial percent of input fragments was below 300 bp, along with primer and adapter contamination (127-bp and 140-bp peaks, B) which was significantly lower in the MeDIP counterpart (A). Two internal markers were 35 bp and 10,380 bp.

MeDIP-seq data. The ability to accurately detect global generate single- resolution, sodium bisulfite may DNA methylation patterns in minute samples will be of also destroy the integrity of DNA and introduce possible tremendous value when dealing with samples that contain mutations. In addition, bisulfite conversion is usually less very small amounts of DNA, such as oocytes, early em- efficient or variable for cytosines at non-CpG positions bryos, and human biopsies. (Warnecke et al., 2002), which may cause a biased methyl- Affinity-based methods (MeDIP/hMeDIP) coupled with ation profile at non-CpG sites. In contrast, MeDIP-seq neither next-generation sequencing can be used to profile genome- introduces any mutations nor requires uracil-tolerant DNA wide DNA methylation patterns with the resolution of several polymerase. In terms of low amounts of starting DNA, hundred base pairs and at a competitive cost. To achieve MeDIP-seq requires as little as 1 ng of DNA, but this amount comprehensive genomic coverage and a viable compromise is insufficient for BS-seq because bisulfite conversion will between sequencing breadth and depth, MeDIP-seq requires destroy the majority of DNA. The convenient one-step im- a range of 30–60 million reads per sample, whereas whole- munoprecipitation in the MeDIP-seq procedure shows the genome BS-seq should sequence over a billion reads per merits of profiling genome-wide 5mC, 5hmC, 5-formyl- sample (Bock et al., 2010). Although BS-seq and RRBS can cytosine, and 5-carboxylcytosine in parallel with low MeDIP-SEQ USING MINUTE AMOUNTS OF DNA 7

FIG. 4. Quality assessment of MeDIP-seq and input reads by FastQC. The input read quality dropped dramatically after 60 bp per read (A), whereas MeDIP reads kept acceptable quality until 90 bp per read (C). After preprocessing by Cutadapt,

input read quality was significantly improved at the expense of a loss of about 36% of the original reads (B). The trimming also caused around 13% loss of reads in MeDIP sample (D). The x axis shows the position in read (bp), whereas the y axis indicates Quality Score (Q).

amounts of starting DNA as long as the specificity of anti- approximately 100–200 cells, i.e., only a few blastocyst-stage bodies is verified (Shen et al., 2013). However, bisulfite con- embryos. We are not aware of any previously published version-based methods (Tet-assistant BS-seq and oxidative MeDIP-seq studies beginning with such a low amount of BS-seq) require additional treatments, such as labeling and DNA, although 1 ng of DNA could be applied to the RRBS oxidation, that will lead to further degradation and significant method and tagmentation-based whole-genome bisulfite se- loss of DNA (Booth et al., 2012; Ficz et al., 2011; Yu et al., quencing (Adey and Shendure, 2012; Schillebeeckx et al., 2012). The double degradation by oxidation and bisulfite 2013). The specialized nature of performing MeDIP-seq on treatment (oxBS-Seq) makes it impossible for 1 ng of starting low quantities of starting material and the difficulty in stan- DNA to be sufficient because typically there will be less than dardization may explain the lack of commercially available 0.01 ng of DNA left after the oxBS procedure. kits. Further optimization of this MeDIP-seq protocol may Recent BS-seq studies on genome-wide DNA methylation generate a DNA methylome from single cells. in oocytes and early embryos required several hundreds of Spike-in controls are usually employed for quality control of blastocysts and thousands of oocytes, and they were not able to MeDIP-seq efficiency. Although the spike-in homologous apply this technology to single germ cells (Kobayashi et al., fragments are different from genomic DNA, they can at least 2012; Smallwood et al., 2011). In contrast, we only used 1 ng provide indirect information about immunoprecipitation effi- of DNA for MeDIP-seq, which lowers the starting DNA to ciency (Taiwo et al., 2012). We added 10 pg of control DNA to

Table 2. Input and MeDIP Reads and Overall Alignment Rates Run by Bowtie2 Raw reads Trimmed reads Overall Align Align (million) (million) alignment (%) 1 time (%) >1 time (%) MeDIP 99.33M 86.36M 97.53% 64.66% 32.86% Input 78.19M 49.87M 94.98% 69.45% 25.53%

MeDIP, methylated DNA immunoprecipitation. 8 ZHAO ET AL.

FIG. 5. Visualization of MeDIP-seq data by SeqMonk in selective genomic loci. (A) A representative chromosome view (1 million bp) of read distribution in MeDIP (lower panel) and input data (upper panel). In the promoter of H19, reads and probe value bars were observed in the MeDIP sample (B). In contrast, no read was seen in the promoter of Nanog in the MeDIP sample, although there were piles of reads in the input (C).

100 ng, 10 ng, and 1 ng of genomic DNA samples and found control immunoprecipitation efficiency decrease was also the immunoprecipitation efficiency to drop slightly with the observed by using 5hmC antibody (Fig. 2B). In contrast, the decrease of starting DNA (Fig. 2B). This occurred because the endogenous methylation region appeared to be more enriched absolute control of DNA loss is increased along with the de- in low, rather than high, concentrations of starting DNA (Fig. crease of companion genomic DNA from 100 ng to 1 ng in the 2C). The 5mC or 5hmC antibodies bind preferentially to presence of the same amount of antibodies. A similar trend for highly methylated/hydroxymethylated fragments when low amounts (1 ng) of single-stranded DNA are present in the so- Table 3. Probe Values in the Gene Bodies lution. The MeDIP procedure would fail to identify the of Commonly Known Genes in Mouse ESCs Table 4. Probe Values in the Promoters Gene Probe Input MeDIP Enrichment of Commonly Known Genes in Mouse ESCs name no. probe value probe value fold Gene Probe Input MeDIP Enrichment Tet1 6 178.02 521.00 2.93 name no probe value probe value fold Tet2 5 144.29 362.00 2.51 Ywhag 1 44.97 75 1.67 H19 1 18.74 103 5.50 Stat3 2 39.35 216 5.49 Hoxb1 1 61.84 187.00 3.02 Hoxb3 2 153.66 236.00 1.54 Igf2r 1 33.73 109.00 3.23 Meis1 7 176.14 725 4.12 Igf2r 5 217.37 645.00 2.97 The higher enrichment fold, the more methylation in defined probes. Note that the following promoters (Nanog, Pou5f1, Ywhag, The higher enrichment fold, the more methylation in defined probes. Sox2, and Stat3) were unmethylated and thus not listed in the table. ESCs, embryonic stem cells; MeDIP, methylated DNA immuno- ESCs, embryonic stem cells; MeDIP, methylated DNA immuno- precipitation. precipitation. MeDIP-SEQ USING MINUTE AMOUNTS OF DNA 9 hypomethylated (CpG poor) compared to hypermethylated Bain, G., Kitchens, D., Yao, M., Huettner, J.E., and Gottlieb, (CpG rich) regions. We tried to evaluate the relative enrich- D.I. (1995). Embryonic stem cells express neuronal properties ment versus unmethylated Gapdh or b-actin promoters in 1 ng in vitro. Dev. Biol. 168, 342–357. of starting DNA but failed. Rare amounts of Gapdh or b-actin Bartolomei, M.S., and Ferguson-Smith, A.C. (2011). Mamma- fragments were pulled down after immunoprecipitation due to lian genomic imprinting. Cold Spring Harb. Perspect. Biol. 3. their hypomethylation; therefore, not enough DNA was en- Baylin, S.B., and Jones, P.A. (2011). A decade of exploring the riched for the following real-time qPCRs. cancer - biological and translational implications. Potential biases, such as copy number variations, GC Nat. Rev. Cancer 11, 726–734. content, and CpG density bias, are thought to exist in Bock, C., Tomazou, E.M., Brinkman, A.B., Muller, F., Simmer, MeDIP-seq data (Laird, 2010). The ratio of genomic DNA F., Gu, H., Jager, N., Gnirke, A., Stunnenberg, H.G., and versus 5mC/5hmC antibody and specificity of primary Meissner, A. (2010). Quantitative comparison of genome- wide DNA methylation mapping technologies. Nat. Bio- antibodies may also introduce variations among MeDIP- technol. 28, 1106–1114. seq data in individual laboratories. In addition, whole- Booth, M.J., Branco, M.R., Ficz, G., Oxley, D., Krueger, F., genome amplification mediated by universal and index Reik, W., and Balasubramanian, S. (2012). Quantitative se- primers during high-throughput sequencing library prepa- quencing of 5-methylcytosine and 5-hydroxymethylcytosine ration may contribute to MeDIP enrichment bias, espe- at single-base resolution. Science 336, 934–937. cially at certain CpG-rich targets (Borgel et al., 2012). Borgel, J., Guibert, S., Li, Y., Chiba, H., Schubeler, D., Sasaki, Therefore, it is necessary to validate MeDIP-seq data by H., Forne, T., and Weber, M. (2010). Targets and dynamics of other DNA methylation detection approaches, such as bi- promoter DNA methylation during early mouse development. sulfite sequencing. Nat. Genet. 42, 1093–1100. Because low amounts of genomic DNA (1 ng) are not Borgel, J., Guibert, S., and Weber, M. (2012). Methylated DNA enough for BS-seq, it is convenient to assess the MeDIP-seq immunoprecipitation (MeDIP) from low amounts of cells. data by examining commonly known methylated and un- Methods Mol. Biol. 925, 149–158. methylated genomic loci. The CpG islands in housekeeping Deaton, A.M., and Bird, A. (2011). CpG islands and the regu- gene promoters are generally unmethylated (Jones, 2012). lation of transcription. Genes Dev. 25, 1010–1022. In mESCs, pluripotency genes are usually highly expressed Ficz, G., Branco, M.R., Seisenberger, S., Santos, F., Krueger, F., and their promoters are hypomethylated. However, the lin- Hore, T.A., Marques, C.J., Andrews, S., and Reik, W. (2011). eage-specific gene promoters are methylated to prevent Dynamic regulation of 5-hydroxymethylcytosine in mouse ES transcriptional initiation and elongation (Fouse et al., 2008; cells and during differentiation. Nature 473, 398–402. Imamura et al., 2006). In addition, the DMRs of imprinted Fouse, S.D., Shen, Y., Pellegrini, M., Cole, S., Meissner, A., Van Neste, L., Jaenisch, R., and Fan, G. (2008). Promoter control regions at imprinted genes are allele-specific methylated: One allele is methylated whereas the other is CpG methylation contributes to ES cell gene regulation in parallel with Oct4/Nanog, PcG complex, and histone H3 K4/ unmethylated to facilitate parent-of-origin expression (Bar- K27 trimethylation. Cell Stem Cell 2, 160–169. tolomei and Ferguson-Smith, 2011). In this study, a handful Hayashi, K., Ogushi, S., Kurimoto, K., Shimamoto, S., Ohta, H., of selected methylated loci in gene bodies and promoters and Saitou, M. (2012). Offspring from oocytes derived from were relatively enriched in the MeDIP-seq data (Tables 3 in vitro primordial germ cell-like cells in mice. Science 338, and 4). The promoter regions of pluripotency genes and 971–975. housekeeping genes are off-targets by the MeDIP-seq pro- Imamura, M., Miura, K., Iwabuchi, K., Ichisaka, T., Nakagawa, cedure because of their low methylation level. In contrast, M., Lee, J., Kanatsu-Shinohara, M., Shinohara, T., and Ya- upstream regions of H19 and Igf2r where the differentially manaka, S. (2006). Transcriptional repression and DNA hy- methylated regions reside were more than three times en- permethylation of a small set of ES cell marker genes in male riched. Therefore, at least these commonly known genes germline stem cells. BMC Dev. Biol. 6, 34. were verified in our MeDIP-seq data by using 1 ng of DNA, Ito, S., D’Alessio, A.C., Taranova, O.V., Hong, K., Sowers, further illustrating the reliability of the MeDIP-seq protocol. L.C., and Zhang, Y. (2010). Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass Acknowledgments specification. Nature 466, 1129–1133. Jaenisch, R., and Young, R. (2008). Stem cells, the molecular We would like to thank Drs. Mingyi Zhou and Nathan circuitry of pluripotency and nuclear reprogramming. Cell Bivens at University of Missouri DNA core facility for 132, 567–582. MeDIP-seq library construction and Illumina deep se- Jones, P.A. (2012). Functions of DNA methylation: islands, st quencing, and acknowledge funding from Food for the 21 start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484– Century at the University of Missouri, and USDA-NRSP8. 492. Kobayashi, H., Sakurai, T., Imai, M., Takahashi, N., Fukuda, Author Disclosure Statement A., Yayoi, O., Sato, S., Nakabayashi, K., Hata, K., Sotomaru, Y., Suzuki, Y., and Kono, T. (2012). Contribution of intra- The authors declare that no conflicting financial interests genic DNA methylation in mouse gametic DNA methylomes exist. to establish oocyte-specific heritable marks. PLoS Genet. 8, e1002440. References Laird, P.W. (2010). Principles and challenges of genomewide Adey, A., and Shendure, J. (2012). Ultra-low-input, tagmentation- DNA methylation analysis. Nat. Rev. Genet. 11, 191–203. based whole-genome bisulfite sequencing. Genome Res. 22, Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read 1139–1143. alignment with Bowtie 2. Nat. Methods 9, 357–359. 10 ZHAO ET AL.

Leitch, H.G., McEwen, K.R., Turp, A., Encheva, V., Carroll, T., Taiwo, O., Wilson, G.A., Morris, T., Seisenberger, S., Reik, W., Grabole, N., Mansfield, W., Nashun, B., Knezovich, J.G., Pearce, D., Beck, S., and Butcher, L.M. (2012). Methylome Smith, A., Surani, M.A., and Hajkova, P. (2013). Naive analysis using MeDIP-seq with low DNA concentrations. pluripotency is associated with global DNA hypomethylation. Nat. Protoc. 7, 617–636. Nat. Struct. Mol. Biol. 20, 311–316. Warnecke, P.M., Stirzaker, C., Song, J., Grunau, C., Melki, J.R., Li, E., Bestor, T.H., and Jaenisch, R. (1992). Targeted and Clark, S.J. (2002). Identification and resolution of arti- of the DNA methyltransferase gene results in embryonic le- facts in bisulfite sequencing. Methods 27, 101–107. thality. Cell 69, 915–926. Weber, M., Davies, J.J., Wittig, D., Oakeley, E.J., Haase, M., Meyer, J.S., Katz, M.L., Maruniak, J.A., and Kirk, M.D. (2004). Lam, W.L., and Schubeler, D. (2005). Chromosome-wide and Neural differentiation of mouse embryonic stem cells in vitro promoter-specific analyses identify sites of differential DNA and after transplantation into eyes of mutant mice with rapid methylation in normal and transformed human cells. Nat. retinal degeneration. Brain Res. 1014, 131–144. Genet. 37, 853–862. Mohn, F., Weber, M., Schubeler, D., and Roloff, T.C. (2009). Wu, H., D’Alessio, A.C., Ito, S., Wang, Z., Cui, K., Zhao, K., Methylated DNA immunoprecipitation (MeDIP). Methods Sun, Y.E., and Zhang, Y. (2011). Genome-wide analysis of 5- Mol. Biol. 507, 55–64. hydroxymethylcytosine distribution reveals its dual function Okano,M.,Bell,D.W.,Haber,D.A.,andLi,E.(1999).DNAme- in transcriptional regulation in mouse embryonic stem cells. thyltransferases Dnmt3a and Dnmt3b are essential for de novo Genes Dev. 25, 679–684. methylation and mammalian development. Cell 99, 247–257. Wutz, A., Smrzka, O.W., Schweifer, N., Schellander, K., Pastor, W.A., Aravind, L., and Rao, A. (2013). TETonic shift: Wagner, E.F., and Barlow, D.P. (1997). Imprinted expression biological roles of TET proteins in DNA demethylation and of the Igf2r gene depends on an intronic CpG island. Nature transcription. Nat. Rev. Mol. Cell Biol. 14, 341–356. 389, 745–749. Schillebeeckx, M., Schrade, A., Lobs, A.K., Pihlajoki, M., Wilson, Yu, M., Hon, G.C., Szulwach, K.E., Song, C.X., Zhang, L., D.B., and Mitra, R.D. (2013). Laser capture microdissection- Kim, A., Li, X., Dai, Q., Shen, Y., Park, B., Min, J.H., Jin, P., reduced representation bisulfite sequencing (LCM-RRBS) maps Ren, B., and He, C. (2012). Base-resolution analysis of 5- changes in DNA methylation associated with gonadectomy- hydroxymethylcytosine in the mammalian genome. Cell 149, induced adrenocortical neoplasia in the mouse. Nucleic Acids 1368–1380. Res. 41, e116. Shen, L., Wu, H., Diep, D., Yamaguchi, S., D’Alessio, A.C., Fung, H.L., Zhang, K., and Zhang, Y. (2013). Genome-wide Address correspondence to: analysis reveals TET- and TDG-dependent 5-methylcytosine Randall S. Prather oxidation dynamics. Cell 153, 692–706. Division of Animal Sciences Smallwood, S.A., Tomizawa, S., Krueger, F., Ruf, N., Carli, N., University of Missouri Segonds-Pichon, A., Sato, S., Hata, K., Andrews, S.R., and E125D ASRC Kelsey, G. (2011). Dynamic CpG island methylation landscape in 920 East Campus Drive oocytes and preimplantation embryos. Nat. Genet. 43, 811–814. Columbia, MO 65211 Smith, Z.D., and Meissner, A. (2013). DNA methylation: Roles in mammalian development. Nat. Rev. Genet. 14, 204–220. E-mail: [email protected]