Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

A SILAC-based DNA interaction screen that identifies candidate binding to functional

DNA elements

Gerhard Mittler1,2,4, Falk Butter3 and Matthias Mann3

1 Center for Experimental Bioinformatics (CEBI), University of Southern Denmark,

Campusvej 55, DK-5230 Odense M, Denmark

2 Current address: Dept. Cellular and Molecular Immunology, Proteomics,

Max-Planck Institute of Immunobiology, Stübeweg 51, D-79108 Freiburg, Germany

3 Dept. Proteomics and Signal Transduction, Max-Planck-Institute for Biochemistry, Am

Klopferspitz 18, D-82152 Martinsried, Germany

4 BIOSS – Center of Biological Signalling Studies, Albert-Ludwigs-University Freiburg,

Albertstrassse 19, D-79104 Freiburg, Germany

Running title: SILAC for DNA-binding proteins

Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Abstract

Determining the underlying logic that governs the networks of expression in higher eukaryotes is an important task in the post-genome era. Sequence-specific transcription factors (TFs) that can read the genetic regulatory information and proteins that interpret the information provided by CpG methylation are crucial components of the system that controls the transcription of protein-coding by RNA polymerase II. We have previously described Stable Isotope Labeling by

Amino acids in Cell culture (SILAC) for the quantitative comparison of proteomes and the determination of protein-protein interactions. Here, we report a generic and scalable strategy to uncover such DNA protein interactions by SILAC that uses a fast and simple one-step affinity capture of transcription factors from crude nuclear extracts. Employing mutated or non-methylated control oligonucleotides, specific TFs binding to their wild-type or methyl-CpG bait are distinguished from the vast excess of co-purifying background proteins by their peptide isotope ratios that are determined by mass spectrometry. Our proof of principle screen identifies several proteins that have not been previously reported to be present on the fully methylated CpG island upstream of the human metastasis associated 2 gene promoter. The approach is robust, sensitive and specific and offers the potential for high-throughput determination of TF binding profiles.

[Supplemental material will be available online at http://www.genome.org]

2 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Introduction

The interactions between transcription factors (TFs) and their DNA binding sites are an integral part of gene regulatory networks and represent the key interface between the proteome and genome of an organism. These sequence-specific factors exert their effects through dynamic interactions with a plethora of protein complexes that modify and remodel chromatin, change the subnuclear localization of target genes and regulate the promoter recruitment, activity and processivity of the transcriptional machinery

(reviewed in Kadonaga 2004; Remenyi et al. 2004). Besides sequence-specific binding, a certain class of TFs interacts with so called CpG islands that consist of clustered arrays of the dinucleotide sequence CG in a methylation (5-methyl cytosine) dependent manner (Ohlsson and Kanduri 2002). These CpG islands are found in the proximal promoter regions of almost half of the genes in the (Ohlsson and

Kanduri 2002) and can be methylated in a tissue-specific manner or upon transformation to malignancy (Robertson 2005).

Thus the determination and characterization of TF binding sites throughout the whole human genome is pivotal to our understanding of how genes are differentially expressed. While much progress has been made in the high-throughput identification of potential binding sites for a given protein by both the microarray chip-based readout of chromatin immunoprecipitation (ChIP) assays (ChIP-chip) and protein binding microarrays (Mukherjee et al. 2004; Warren et al. 2006), a scalable complementary technique that – in an unbiased way – reveals proteins binding in a sequence-specific manner to a given site is presently not available. Traditional methods for the unbiased identification of sequence-specific nucleic acid binding proteins employ a combination of several steps of classical chromatography followed by a final affinity purification step that uses their cognate recognition sequence as a ligand (Kadonaga 2004).

3 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

The classical approach is laborious and requires monitoring the purification process by functional assays (EMSA, DNA footprinting, in vitro transcription) and is thus impractical on a proteomic scale. Routine high-throughput identification of sequence-specific DNA binding factors is mainly hampered by their low abundance, the degeneration of their binding sites and the competition by unspecific binding of positively charged nuclear proteins to the negatively charged phosphate backbone of DNA.

In contrast, computer predictions of DNA sequence binding specificities are fast and simple but have certain limitations (reviewed in Bulyk 2003).

First, they are based on experimental data derived from the published literature and may therefore not be sufficiently comprehensive and sensitive or may be subject to sampling biases. Second, they do not take into account the context dependency of TF binding and the effects of interactions between positions in the binding sequence. Third, they are relatively poor predictors of quantitative binding to variant DNA motifs (Tompa et al. 2005; Udalova et al. 2002). Lastly, they cannot predict which isoforms or which polypeptides of a TF protein family are binding to a given element (Saccani et al. 2003).

Recent breakthroughs in quantitative protein mass spectrometry (reviewed in Ong and

Mann 2005) are providing us with the tools that will enable us to tackle many of the obstacles mentioned above. In a tour de force analyzing 53 ion exchange chromatography fractions of a tryptic digest by mass spectrometry in combination with their Isotope Coded Affinity Tag (ICAT) technology Aebersold and colleagues demonstrated that it is indeed possible to identify a sequence-specific factor by quantitative proteomics (Himeda et al. 2004).

We have previously described Stable Isotope Labeling by Amino acids in Cell culture

(SILAC) for the quantitative encoding of proteomes (Ong et al. 2002). Following a one-step affinity purification from SILAC encoded extracts, specific binders can be directly distinguished by the isotope ratios of their tryptic peptides as determined

4 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

by mass spectrometry provided that a specificity control can be designed (Schulze et al.

2005; Schulze and Mann 2004). Here we use immobilized oligonucleotides harboring functional DNA elements for the TFs AP2 and ESRRA/ERR-alpha in their wild-type and control (point mutated) form to prove that we are able to uncover such interactions by

SILAC. Investigating a CpG island of the human metastasis associated 2 (MTA2) gene, we further demonstrate that a similar approach can be used to identify proteins preferentially binding to methylated CpG sites. Apart from ZBTB33/Kaiso, which was known to bind the methylated MTA2 CpG island, our screen resulted in the identification of several known as well as unknown methyl-CpG-dependent binding partners. The DNA protein interaction screen proved sensitive, robust, fast and specific and could in principle become a standard technique to investigate functional cis-elements on DNA.

5 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Results

A generic strategy and proof of principle for the determination of DNA-protein interactions

Modern mass spectrometric technology has become very powerful at determining the components of complex protein mixtures (reviewed in Cox and Mann 2007; Cravatt et al.

2007). We have previously used this technology to determine peptide-protein interactions in the following way. Two cell populations are metabolically encoded through a stable isotope variant of an essential amino acid. In this approach, which we have termed SILAC, the peptides originating from the two cell populations can be distinguished because they have different mass. Pull-downs with a bait from the lysine-d4 encoded cell lysate and with a control from the non-encoded cell lysates are mixed and analyzed together. In this way, background proteins binding non-specifically to the beads used in the pull-downs will be present in a one to one ratio in the two SILAC forms. Binders specific to the bait will have a quantitative ratio between the two forms.

Here we develop the same principle into an assay for DNA-protein interactions

(Figure 1).

Because the pull-down experiment is performed from nuclear extracts and transcription factors are typically of low abundance, we decided to grow the cells in suspension and to encode them with 2H4-lysine (Lysine-d4) as the SILAC amino acid, because deuterated amino acids are more economical than 13C or 15N substituted amino acids. We routinely prepared nuclear extracts from spinner cultures of unlabeled and 2H4-lysine labeled

HeLa S3 cells. Known ratio-mixing experiments demonstrated virtual identity of different extracts in terms of relative protein abundance as determined by measuring

6 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

the isotope ratios of lysine containing peptides from the 100 most abundant proteins by nanoLC-MS (data not shown).

As a proof of principle we decided to investigate proteins binding to the DNA sequence

GCCCGGGGC, a known binding site for the transcription factor AP2 that interacts as a homodimer with this palindromic DNA sequence motif (Egener et al. 2005). Indeed,

SELEX experiments have determined that it is an optimal binding sequence (Mohibullah et al. 1999). We designed the DNA bait in the following way (Figure 2A). A double stranded DNA segment harboring the above mentioned optimal AP2 binding site was biotinylated at the 5´-end of the sense strand for coupling to streptavidin beads. The sequence length of 40 bp was chosen as a compromise between minimizing non-specific binding (requiring short sequences) and given „native context‟ to the binding site (requiring long sequences). This sequence length is suitable for automated oligodeoxynucleotide synthesis and sufficient for metazoan DNA binding factors, which usually target 5 to 12 base pairs (Wingender et al. 2001). Furthermore, we designed the bait to contain a restriction site (6 bp) and a spacer to the 5´-end to enable elution of bound proteins by restriction endonuclease digestion, which is very efficient

(supplementary figure S1).

We also synthesized a control bait, designed to abolish binding of specific but not of non-specific binders to the sequence. As shown in Figure 2A, this was accomplished with two point mutations introduced in the AP2 binding site (GTACGGGGC) which have previously been shown to abrogate binding of AP2 in a band-shift assay (Mohibullah et al. 1999). The SDS-PAGE pattern of proteins binding to the wild-type and control bait, respectively, appeared essentially identical (Figure 2B). Therefore, it was impossible to identify any wild-type-specific polypeptide band just based on visually comparing the protein composition. This demonstrates the necessity of a quantitative proteomics approach to uncover such proteins.

7 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Lysine-d4 encoded nuclear extract from 2.5 x 108 cells was then incubated with the oligonucleotide containing the AP2 binding site whereas the point mutated control DNA was exposed to the non labeled extract. After mild washing, beads from the two experiments were combined and the baits and bound proteins were liberated by restriction enzyme digest and the eluted material was separated by one dimensional

SDS-PAGE. The entire gel lane was cut into six pieces, which were trypsin digested and analyzed by liquid chromatography tandem mass spectrometry as described before

(Ong et al. 2004). More than 250 proteins were identified in this experiment but quantitative analysis of the SILAC peptide pairs showed that almost all of them were present in approximately equal amounts (Figure 2D). This means that they bound equally to both baits or to the beads. As expected we identified TFAP2A (AP2-alpha) as a specific interaction partner (ratio 6.75 between bait and control). Interestingly, in addition to the expected binding partner, the Pur proteins PURA (ratio 4.22) and PURB

(ratio 6.73), which can either function as transcriptional activators or repressors, were also found to interact specifically with the wild-type bait (Figure 2C and supplementary table S1). The AP2 family consists of five members, TFAP2A (alpha), TFAP2B (beta),

TFAP2C (gamma), TFAP2D (delta) and TFAP2E (epsilon) (Tummala et al. 2003).

Alignment of the peptides identifying AP2 showed that we had sequenced peptides common to all five proteins as well as unique peptides proving the presence of TFAP2A and TFAP2C (supplementary table S1 and figure S2). In the case of PURA and PURB, which are closely related proteins encoded by two different genes, we sequenced peptides specific for each, so we conclude that both were present. An antibody specific for PURA was obtained and confirmed presence of this protein (Figure 2C). Both PURA and PURB can bind as homo- or heterodimers in a rather sequence-independent manner to double-stranded DNA that is characterized by a high degree of polypurine- polypyrimidine asymmetry (Bergemann et al. 1992).

8 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Pur proteins have been shown to recognize the structure formed by GGN repeats

(Knapp et al. 2006). Such a repeat is present on the antisense strand (GGC GGT) and overlaps with the AP2 consensus site. The first GGN repeat is disrupted by two nucleotide changes (to TAC) in the AP2 mutant site (Figure 2A), which may explain the reduced binding of PURA and PURB to the mutant DNA and emphasizes our ability to identify structure-dependent associations in the screen. Figure 2D shows the ratios for all proteins identified in this experiment. As can be seen in the figure there is a gap between the non-specific and specific binders, which have ratios between 1.0 and 2.0 and the specific proteins which have a ratio of greater than 3.0 in this experiment.

However, this represents an empirically derived threshold because the AP2 proteins were known binders and the PURA protein was shown to bind to our column. As discussed elsewhere (MacCoss et al. 2003; McClatchy et al. 2007; Ong et al. 2003), in general the level of significance of a measured ratio depends on the experiment and within an experiment depends on the signal-to-noise of peptides and the peptide ratio variance of the protein being quantified. Therefore, in order to objectively measure the degree of specificity of binding to a functional DNA element the statistical significance of the relative protein changes has to be determined. Hence, in the following experiments that mainly deal with screening for unknown interaction partners we employed our newly developed MaxQuant software (Cox and Mann 2007; Cox and Mann submitted), which calculates the significance of a ratio being separated from the main distribution as a function of protein intensity.

Determination of TFs binding to bioinformatically predicted DNA cis-elements

Many bioinformatics and functional genomics experiments result in the prediction of functional DNA sequences, which may recruit specific effector proteins. In principle, our approach should be well suited to determine such candidate proteins.

9 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

In order to test this with a specific example, we synthesized a bait for a putative DNA binding factor from a recent experiment from Mootha et al (Mootha et al. 2004). In that experiment, the transcriptional co-activator PPARGC1A (PGC1-alpha) was overexpressed, leading to the differential expression of mRNAs measured by a microarray experiment. Regulated mRNAs were identified and corresponding genes assessed for possible binding motifs for transcription factors. Of the three motifs found we picked one, which had been hypothesized to be a binding site for the orphan nuclear (NR) ESRRA (ERR-alpha). That site had indirectly been confirmed by a luciferase reporter assay upon cotranfection of ESRRA and PPARGC1A (PGC1-alpha) expressing plasmids.

After synthesis of the DNA sequence - a 26 bp sequence from the promoter region of the human ESRRA gene that contains a predicted autoregulatory binding motif for ESRRA itself (Mootha et al. 2004) - and a single point mutant control in the middle of the motif

(see Figure 3A) we performed the SILAC binding experiment as before but analyzed the data by the MaxQuant software. In order to keep the number of false positive specific binders as small as possible, stringent criteria were applied. Therefore, we only accept proteins that have been quantified by at least two peptides showing a ratio that is more than three standard deviations (p<0.0012) apart from the center of the distribution of all proteins. Out of a total of 703 identified proteins, we sequenced 18 unique peptides of

ESRRA and quantified its highly significant ratio (p=1.09x10-49) between bait and control as roughly nine-times (9.02), which correlates well with data obtained by immunoblot analysis (Figure 3A). The MS, MS/MS and MS3 spectra of one of the peptides employed for quantification are shown in Figure 3B to D. This confirms in vitro binding of the protein to this bioinformatically determined motif. Although expected, this confirmation is important because it is well known that the sequence context of a putative transcription factor binding site can influence the ability of the protein to actually bind

10 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

to it (Hoglund and Kohlbacher 2004). Notably, ESRRA does not interact with consensus binding sites present on the P450 promoter (Johnston et al. 1997).

Another NR, NR2F6 (v-erb A related gene 2) also bound to the same DNA sequence with a significant ratio (p=3.56x10-23) of 4.93 (Table 1). We detected and quantified (ratio

4.15) one peptide mapping to the NR protein NR2C1 ( 2) but excluded it from our list due to our stringent criteria. ESRRA and NR2F6 bind to the super-family of hormone response elements (HRE) on DNA that usually consists of two direct or inverted repeats of the hexamer motif (half-site) TGACCT separated by spacer nucleotides of variable length. Promiscuous (but sequence context dependent) binding of orphan NRs as monomers to NR half-sites or extended half sites

(e.g. imperfect direct repeats) is well known (Giguere et al. 1995; Wilson et al. 1993), so it is not surprising that we found specific binding of two receptors to this motif (see discussion). We also quantified eight proteins that are significantly (p<0.0012) but only moderately specific (ratios between 1.5 and 2.4) for the wild-type promoter element

(supplementary table S2).

Feasibility of scale-up

We next investigated if the method could be streamlined as required to use it in large- scale experiments. In the work described above, protein eluates were separated by

SDS-PAGE, excised and in-gel digested. This serves to simplify the complex protein mixture and to lower the dynamic range requirements of the experiment. However, the gel step also results in an increase of analysis time as several bands are analyzed in succession and makes the procedure less automatable as in-gel protein digestion requires a more complicated protocol than in-solution digest.

We therefore repeated the experiment with the ESRRA motif using the in-solution digest instead of the gel. In this experiment, a total of 197 proteins were identified,

11 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

including ESRRA sequenced with nine peptides. The protein ratio was 7.44, which is highly significant (p=4.27x10-27) and very similar to the in-gel experiment (Table 1). We conclude that despite the great complexity of the proteins bound to DNA targets a single in-solution analysis and therefore automation is feasible. Furthermore, in this experiment, only lysine was labeled, whereas labeling both arginine and lysine results in better quantification as virtually all peptides can be quantified (Schulze et al. 2005).

Furthermore, we did not yet use „exclusion lists‟ which are lists of peptide masses of the most abundant background proteins, in order to exclude them from sequencing. This allows preferential sequencing of low abundance peptides, but was not possible because of software limitations on our mass spectrometer. If both steps are included, in-solution digests should perform similarly to the more laborious in-gel procedure, as we have already shown in the case of peptide pull-downs.

Identification of binding partners to modified DNA

DNA modification can be important determinants for protein binding, too. For example, damaged DNA recruits DNA repair enzymes and DNA methylation, in addition to its roles in epigenetics (Klose and Bird 2006), can directly influence transcription factor binding to a given DNA sequence (Perini et al. 2005). In particular, it is well known that cancerous cells frequently hypermethylate promoter regions of tumor suppressor genes leading to their down regulation (reviewed in Robertson 2005). Conversely, MeCpG can also be a precondition for transcription factor binding and transcriptional activation (Ego et al. 2005). In order to study methylation dependent DNA binding of transcription factors, we selected the CpG island -623 to -601 upstream of the transcription start site of the Metastasis Associated Gene 2 (MTA2). A 41 bp bait containing this island in the fully methylated and non-methylated form were synthesized as bait and control, respectively (Figure 4A).

12 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

We chose this region because it is an endogenously methylated site and because one binding protein, ZBTB33 (previously known as Kaiso) had already been determined in vitro and by chromatin immunoprecipitation (ChIP) (Yoon et al. 2003).

As shown in Figure 4B and Table 2, this experiment resulted in the identification and quantification of ZBTB33 with a ratio of 11.26 (p=2.04x10-8) using two peptides. Of 328 quantified proteins seven had significant fold changes between experiment and control, showing that they bind more tightly to the methylated compared to the non-methylated

CpG island (Table 2).

Apart from ZBTB33, we found several putative and not previously described methyl-CpG-specific binding partners of the MTA2 CpG island, namely the SRA-YDG domain containing protein UHRF1 (formerly known as ICBP90), the replication clamp protein PCNA, the PCNA interacting proteins DNA methyltransferase 1 (DNMT1) and

KIAA0101/p15PAF, the ubiquitin specific protease USP7, as well as the hypothetical protein ACTL8 (actin-like 8/FLJ32777) (Table 2 and supplementary figure S3).

Antibodies available for ZBTB33, UHRF1, PCNA and USP7 further confirmed specific binding of these factors to the methylated probe (Figure 4C). Interestingly, we also identified another known DNMT1 interaction partner, the histone H3 K9 methyltransferase EHMT2/G9a (Esteve et al. 2006) with two peptides and quantified one peptide with a ratio of 3.40 but excluded it due to our stringent criteria (see above and supplementary methods). ZBTB33 and UHRF1 have been shown to directly interact with methyl-CpG sites (Bostick et al. 2007; Daniel et al. 2002; Unoki et al. 2004), so we assume direct binding of these factors to the 5-methyl cytosine modified bait. The other proteins do not posses any known methyl-CpG binding domain but some of them

(PCNA, DNMT1) have been identified as interaction partners of NP95, the mouse orthologue of UHRF1 (Sharif et al. 2007). Therefore, their specific binding to the methylated bait is most likely indirect (see discussion and supplementary figure S4).

13 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Discussion

The ability to rapidly determine the protein binding partners of a DNA sequence of interest is becoming increasingly important given the current pace of genome sequencing, which accompanies the discovery of highly conserved non-coding sequences as well as novel potential transcription factor binding sites in mammals

(Bejerano et al. 2004; Cawley et al. 2004; Harbison et al. 2004; Xie et al. 2005). In this study we have outlined a strategy for unbiased identification of sequence and modification-specific DNA binding proteins using SILAC-based quantitative proteomics.

General aspects of the screen (Specificity and sensitivity of the approach)

The screen uses custom-made oligonucleotides with a biotin-linker and takes advantage of differential metabolic labeling of cell populations using SILAC. This allows the distinction of proteins by mass spectrometry. In order to identify specific candidate binding partners of a DNA element of interest it is crucial to design appropriate control baits, in which the binding of specific interaction partners is severely diminished. In the case of modification dependent DNA protein interactions the control bait will simply contain the identical DNA sequence in the unmodified form. For sequence context dependent interactions the design of suitable controls can be achieved in several ways.

If data correlating the effect of mutations with the activity of a putative cis-element are not available, two other possibilities can be pursued. First, phylogenetic footprinting analysis of the cis-element by multiple species alignment of synthenic regions (reviewed in Wasserman and Sandelin 2004) will often reveal highly conserved nucleotides, which can be mutated in the control. Second, the use of the complete reverse sequence of the cis-element or part of it can be a suitable control (Travis et al. 1993),

14 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

because it does not change the overall nucleotide composition of the double-helix and conserves the GC/AT content of each DNA strand.

The fact that the vast majority of binding partners identified in our pilot screens either reproduce and confirm published data or can be placed in a functional biological context, argues for a low false positive detection rate of the technique, especially when the robust statistics of the MaxQuant software is applied (Cox and Mann submitted; Graumann et al. 2007). However, as is the case for every scalable screening method we cannot exclude false negative results. This is exemplified by our experiment we performed to detect MeCpG-dependent binders (Figure 4, Table 2), which failed to identify the indirect interaction partner EHMT2/G9a (part of an UHRF1 complex) due to sensitivity issues in combination with our stringent statistical filtering (see results). Furthermore, when working with a single cell line and with unstimulated cells we cannot find proteins that are not expressed in that cell line or need to be modified in a signal dependent manner in order to bind DNA. In contrast to most previous work from our laboratory, the DNA protein interaction screen involves the preparation of nuclear extracts from the labeled and unlabeled cell population, respectively, which represents a second sample manipulation step after cell lyses. Therefore, standardized and reproducible conditions in cell growth and nuclear extract preparation are crucial for the success of the assay system.

In comparison to the EMSA technique, our solid phase approach is much less subjected to non-physiological buffer conditions (ionic strength and pH), which occur during gel electrophoresis and can drastically disturb the equilibrium between free DNA and protein-DNA complexes (Sidorova et al. 2005). Thus, interactions revealed by our

SILAC-based screen in vitro will most likely also have the potential to occur in vivo, where they are of course influenced by additional levels of regulation (chromatin structure and histone modifications).

15 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Importantly, ChIP-chip studies have suggested that in vitro affinity of transcription factors for specific DNA sequences is often recapitulated in the relative occupancy of these regions in vivo (Harbison et al. 2004; Horak et al. 2002).

The energetics of protein-DNA interactions are dominated by interactions with the sugar- phosphate backbone of DNA; hence about two thirds of all contacts are not sequence-specific (Hoglund and Kohlbacher 2004). Despite very large affinities for their target (dissociation constants between 10-8 to 10-12 M) the binding specificity of DNA binding proteins can therefore be often rather modest (as low as a factor 10). Apart from the ability to monitor all-or-nothing interactions (Schulze and Mann 2004), the SILAC methodology excels in its potential to accurately quantify relative differences in the range from 2 to 10 (Ong et al. 2003) and is hence well suited for our protein DNA interaction screen (see Table 1 and 2).

The amount of TFAP2A molecules per cell has been estimated to be around 200 000

(Egener et al. 2005), which means that we have used approximately 500 pmol of

TFAP2A as input in our pilot experiment (in the presence of 25 pmol of bait). Although drastically decreased (ten times) compared to related studies (Himeda et al. 2004;

Yaneva and Tempst 2003), sample consumption is still significant and needs to be further optimized in the future. This will be achieved in several ways. The ratio of total protein input versus bait can be reduced by factor ten, which will not lead to a substantial decrease in the total amount of proteins bound to the bait. More importantly the relative abundance of the bound factors will change only slightly or not at all. Because our LC tandem MS proteomic platform using the LTQ-FT is not so much limited by sample amount as by sample complexity we do not expect any negative effect on our analysis

(de Godoy et al. 2006). Double-labeling of the extracts with heavy lysine and arginine derivatives will roughly double the number of peptides that can be quantified and hence increase sensitivity. Moreover, several improvements in mass spectrometric analysis

16 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

currently in development, such as the peptide exclusion lists mentioned above, should dramatically improve the sensitivity of peptide mixture analysis.

Determination of sequence-specific DNA protein interactions

Apart from validating the methodology, the results obtained with both the bait bearing a binding site for AP2 and the bait designed to confirm a direct interaction with ESRRA are functionally interesting. Our data clearly demonstrate that the point mutations in the

AP2 site drastically reduce binding of PURA and PURB to the bait. Because the interaction of Pur proteins with DNA is mainly based on DNA structure (e.g. the propensity to form cruciform DNA) (Wortman et al. 2005), our observation suggests the ability to identify structure-dependent associations in our screen. At the same time, this example also demonstrates that we can monitor the specific binding of several candidate factors to our DNA columns. This is further corroborated by the observations obtained with the bait harboring an ESRRA binding motif present at the ESRRA gene promoter.

Apart from confirming specific ESRRA binding we find that the NR NR2F6 is also able to bind this sequence element in vitro, and might therefore play a role in regulating ESRRA . Indeed, it has been shown that NR2F6 can bind the same motif present upstream of the renin and luteinizing gene in vivo (Liu et al.

2003; Zhang and Dufau 2000). In both cases NR2F6 acts as a transcriptional repressor competing with binding of other NRs to the same motif in cis.

Thus the DNA protein interaction screen is a powerful tool to conduct completely unbiased experiments that can follow up bioinformatic cis-element predictions. Unlike

EMSA our screen is directly able to answer the question which proteins can potentially bind a sequence of interest. Therefore, changes in protein occupancy that are influenced by single nucleotide polymorphisms, disease related nucleotide exchanges or alterations in relative protein expression levels may be easily monitored.

17 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Determination of DNA modification dependent protein interactions

The experiment performed with the MTA2 promoter CpG island probe is to our knowledge the first example of a quantitative proteomics approach designed to identify factors that bind to DNA in a modification dependent manner. Besides two known methyl-CpG interacting proteins, namely ZBTB33 and UHRF1, we identified two factors

(PCNA, DNMT1) that are part of a recently described UHRF1 complex (Sharif et al.

2007) as well as several proteins (USP7, KIAA0101, ACTL8) that have so far not been directly implicated in the biology of DNA methylation (Figure 4 and Table 2). However, we have not found any protein belonging to the MBD (Methyl-CpG Binding Domain) family (e.g. MECP2, MBD1 to MBD4), which have been discovered by means of binding to plasmid DNA derived methylated DNA substrates in vitro (Boyes and Bird 1991).

MBD3 interacts with DNA in a methylation-independent manner and MBD4 is mainly linked to DNA repair processes (reviewed in Klose and Bird 2006). The absence of

MECP2, which is able to bind even a single MeCpG site can simply be explained by the fact that this protein is not expressed in HeLa cells (Ng et al. 1999). A series of experiments have recently challenged the view that MBD proteins bind to methylated

CpGs in a sequence- and context-independent manner (Ballestar et al. 2003; Fraga et al. 2003; Klose et al. 2005; Le Guezennec et al. 2006). MECP2 for instance, requires a run of four or more A/T base pairs adjacent to the methyl-CpG for efficient DNA binding in vitro and in cells (Klose et al. 2005). Although unlikely, the lack of comprehensive binding data for MBD family members makes it impossible for us to exclude that our assay system might be biased against detection of these proteins due to sensitivity issues or experimental conditions.

However, our results confirm (Figure 4 and Table 2) the specific binding of ZBTB33 to the methylated CpG island of the MTA2 gene in vitro (Yoon et al. 2003). Strikingly, the absence of ZBTB33 does not detectably alter the expression of MTA2 and other

18 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

putative ZBTB33 target genes suggesting a redundant function (Prokhortchouk et al.

2006). UHRF1, being part of a multiprotein complex containing PCNA and DNMT1

(supplementary figure S4), might represent such a redundancy factor. Hence, we conclude that our screening system can uncover both direct and indirect protein DNA interactions that are specific for modified DNA.

Perspectives in the context of genome-wide mapping of DNA protein interactions

The Tempst and the Aebersold groups have described related approaches to identify transcription factors by affinity purification procedures combined with mass spectrometry

(Himeda et al. 2004; Yaneva and Tempst 2003). The study of Yaneva and Tempst requires prefractionation of nuclear extracts on a phosphocellulose column. In contrast, the quantitative proteomics approach by Himeda et al. used the ICAT (Isotope Coded

Affinity Tag) technique (Gygi et al. 1999), to identify proteins binding to a functionally important enhancer element. Both methodologies have limitations that prevent their use in high throughput investigations. Fractionation of nuclear extracts requires the parallel monitoring of the binding activities by EMSA. The ICAT approach requires chemical derivatization and subsequent clean up steps by strong cation exchange chromatography before LC-MS analysis, increasing the complexity of analysis (see introduction).

The strategy presented here has the potential to quantify almost all peptides present in a sample, because metabolic labeling of cells with ´heavy´ derivatives of lysine and arginine leads to complete encoding of tryptic peptides representing the total cellular proteome (Schulze et al. 2005). The screen is relatively simple to perform given high sensitivity mass spectrometric equipment associated with advanced software.

Streamlining the procedure should thus allow to perform it in large scale formats. In general it can be employed to detect the following type of interactions:

19 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

(1) direct binding; (2) indirect binding; (3) partner binding, in which a protein exhibits a low affinity interaction with DNA but specificity is achieved via binding to a nearby factor;

(4) modification dependent binding regulated by signaling.

Therefore, the DNA protein interaction screen can serve as a follow-up experiment in approaches based on bioinformatics prediction of functional elements or discovery of disease related nucleotide changes. In addition it complements rather than competes with established genome-wide location analysis (“ChIP on chip” and “ChIP-seq” experiments). It is completely unbiased with respect to both the bait and the potential preys present in nuclear extracts and does not require any a priori knowledge about

DNA binding specificities of the proteins involved. Whereas the DNA protein interaction screen reveals possible binders for a particular DNA sequence, the orthogonal

ChIP-chip experiment delivers binding sites for a given transcription factor. Currently, technical limitations in nucleotide resolution (in the range of several hundred base pairs) require extensive bioinformatic modeling of ChIP-chip data in order to define and locate the probable TF binding sites (Cawley et al. 2004; Harbison et al. 2004; Wei et al. 2006).

Accordingly, there has been relatively little overlap between two sets of proposed TP53

() binding sites in related studies (reviewed in Holstege and Clevers 2006). Protein arrays and protein binding microarrays (PPMs) have so far been only successfully used in yeast (Hall et al. 2004; Mukherjee et al. 2004). Protein arrays cannot reveal cis-elements that are bound in a cooperative manner by two or more proteins. Like the protein array studies, PPM experiments require purification and subsequent labeling of proteins that cannot be performed at large scale in higher organisms. In conclusion, the

SILAC based DNA protein interaction screen will have an important and unique role in delivering highly validated candidate proteins representing potential in vivo binding partners of functional DNA elements.

20 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Methods

Antibodies

The TFAP2A (AP2-alpha) and PCNA antibodies were purchased from Santa Cruz. The antibodies against PURA and ZBTB33 were a generous gift from Dr. E. M. Johnson

(Mount Sinai School of Medicine, New York, NY) and Dr. A. B. Reynolds (Vanderbilt

University, Nashville, TN), respectively. The ESRRA/ERR-alpha antibody was from

Novus Biologicals (Littletown, CO), the UHRF1/ICBP90 antibody from BD Biosciences

(Franklin Lake, NJ) and the USP7 antibody (ab4080) was purchased from

Abcam (Cambridge, UK). The RNA polymerase II antibody directed against the CTD was derived from hybridomas (WG16.1).

DNA affinity chromatography and quantitative proteomics

A detailed description can be found in the supplementary material. Briefly, HeLa-S3 cells were metabolically labeled with 2H4-lysine as the SILAC amino acid in RPMI medium followed by isolation and high salt extraction of nuclei (Dignam et al. 1983). Affinity purifications were performed with biotinylated double stranded oligonucleotides

(wild-type and control baits) that have been immobilized on streptavidin magnetic beads

(Dynal MyOne, Invitrogen) at their maximum binding capacity of ~200 pmol/mg. Routine analyses were carried out with nuclear extracts corresponding to 2.5x108 cells for both the 2H4-lysine labeled and the unlabeled state in the presence of 0.4 ml beads, respectively. Protein-DNA complexes were combined, eluted by restriction enzyme cleavage (PstI, NEB) and subjected to GeLC-MS analysis.

21 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Acknowledgments

We thank Dr. E. M. Johnson for anti PURA and Dr. A. B. Reynolds for anti ZBTB33 antibodies. CEBI (Center of Experimental Bioinformatics) is supported by a generous fund of the Danish National Research Foundation (Grundforskningfond). We are grateful to members of CEBI and the Department of Proteomics and Signal Transduction at the

Max-Planck-Institute for Biochemistry for constructive comments and discussion. We are thankful to Dr. Juergen Cox for developing the MaxQuant software and help in its application.

22 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Figure Legends

Figure 1. Principle of the DNA-protein interaction screen. Proteomes are metabolically labeled with 2H4-lysine (Lysine-d4) to allow discrimination based on differences in peptide mass (4 Da). Biotinylated (Bio) DNA molecules bearing functional elements (e.g. potential transcription factor binding sites or methylated CpGs) are synthesized and immobilized on streptavidin magnetic beads. Control columns are designed to lack the functional element of interest (e.g. point mutations in cis-elements or non-methylated CpGs). Nuclear extracts from unlabeled (Lysine-d0) and labeled cells are prepared and subjected to DNA affinity chromatography, followed by the release of

DNA-protein complexes by restriction enzyme digestion. After in-gel-digestion with trypsin, differentially labeled forms of lysine containing tryptic peptides are detected by mass spectrometry (MS). Peptides originating from proteins specifically recognizing functional elements will have a larger peak intensity of the lysine-d4 form. Unspecific interaction partners will have 1:1 ratios between both isotopic forms.

Figure 2. Properties of the DNA-protein interaction screen exemplified by a functional element harboring a binding site for the transcription factor AP2. (A) The sequence of the DNA baits used for the experiment. The AP2 binding site is shown in bold letters, whereas the point mutations designed to disrupt DNA binding are highlighted in grey. (B to D) SILAC is essential for the identification of specific binders.

(B) Specific interaction partners cannot be identified by 1D SDS-PAGE and silver staining of the eluted material from the AP2 wild-type (WT) and mutant (MT) columns.

(C) Western Blot analysis of selected specific interaction partners recapitulates results from the SILAC analysis. Equal amounts (10 per cent of total) of input (IN) and flow- through (FT) as well as eluted (Elution) material (50 per cent of total) from the

23 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

AP2 wild-type (WT) and mutant (MT) columns, respectively, were analyzed by immunobloting with antibodies directed against TFAP2A/AP2-alpha, RNA polymerase II

(pol II) and PURA. RNA polymerase II serves as a control for equalized total protein amounts and unspecific binding. (D) SILAC analysis discriminates specific from unspecific binders. Proteins containing peptides with lysine-d4 to lysine-d0 ratios of greater than 3:1 are considered to be specific.

Figure 3. The orphan ESRRA (ERR-alpha) exhibits specific binding to its bioinformatically predicted binding sequence. (A) Western Blot analysis

(same conditions as in Figure 2C) confirms the specific interaction of ESRRA as revealed by the SILAC experiment (B to D and Table 1). Sequences of the DNA baits employed are depicted. The predicted ESRRA binding site derived from an autoregulatory motif of the ESRRA gene (Mootha et al. 2004) is shown in bold letters whereas the point mutation for the control column is highlighted in grey. (B)

Representative peptide mass spectrum demonstrating the specific binding of ESRRA.

The MS spectrum of a quadruply charged labeled (monoisotopic peak marked with filled circle) and unlabeled (monoisotopic peak marked with open circle) tryptic ESRRA peptide acquired in the ICR cell of the LTQ-FT (0.3 ppm mass accuracy after recalibration) is shown. (C) MS/MS (MS2) fragmentation spectrum that identifies the peptide shown in (B). The precursor ion (m/z 684.81) was fragmented in the linear ion trap (LTQ part) of the mass spectrometer to obtain sequence information. (D)

MS/MS/MS (MS3) spectrum of the y15(2+) ion present in the MS2 experiment (C) further confirming the identification of the peptide shown in (B).

24 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Figure 4. Methyl-CpG specific binding partners can be discriminated by SILAC analysis. (A) Scheme of the CpG island in the upstream region of the human MTA2 gene that is located at position -623 to -601 relative to the transcription start site (arrowhead).

The cytosine residues highlighted in grey were either fully methylated (MeCpG) or not methylated (CpG) in the SILAC experiment. The structure of 5-methyl cytosine is shown.

(B) Representative peptide mass spectrum demonstrating the preferential binding of the methyl-CpG binding protein ZBTB33/Kaiso, which was known to interact with the fully methylated MTA2 CpG island. The MS spectrum of a labeled (monoisotopic peak marked with filled circle) and unlabeled (monoisotopic peak marked with open circle) tryptic ZBTB33 peptide acquired is shown. (C) Western Blot analysis (same conditions as in Figure 2C) of selected specific interaction partners recapitulates results from the

SILAC analysis (Table 2).

Table 1. Comparison of in-gel digestion versus in-solution digestion based SILAC analysis of proteins interacting specifically with the DNA column bearing a functional

ESRRA/ERR-alpha site (as shown in Figure 3). The Uniprot accession number, the normalized ratio, the ratio variability (in %) as well as the number of identified unique peptides and the number of quantified peptides for specific binders are shown.

Table 2. Proteins binding preferentially to the DNA column bearing the fully methylated CpG island of the human MTA2 gene. The Uniprot accession number, the number of identified unique peptides, the normalized ratio, the ratio variability (in %) and the number of quantified peptides for specific binders are shown. Biological functions are listed, if known. PCNA interaction motifs (Warbrick 2000) are present in KIAA0101

(QKGIGEFFRL aa 62-71) and DNMT1 (RQTTITSHFAK aa 163-173).

25 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

UHRF1 possesses ubiquitin E3 ligase activity in vitro and USP7, an ubiquitin protease stabilizes E3 ligases by preventing their auto-ubiquitination (for details see supplementary figure S4).

26 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

References

Ballestar, E., Paz, M.F., Valle, L., Wei, S., Fraga, M.F., Espada, J., Cigudosa, J.C., Huang, T.H., and Esteller, M. 2003. Methyl-CpG binding proteins identify novel sites of epigenetic inactivation in human cancer. Embo J. 22: 6335-6345. Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S., and Haussler, D. 2004. Ultraconserved elements in the human genome. Science 304: 1321-1325. Bergemann, A.D., Ma, Z.W., and Johnson, E.M. 1992. Sequence of cDNA comprising the human pur gene and sequence-specific single-stranded-DNA-binding properties of the encoded protein. Mol. Cell. Biol. 12: 5673-5682. Bostick, M., Kim, J.K., Esteve, P.O., Clark, A., Pradhan, S., and Jacobsen, S.E. 2007. UHRF1 plays a role in maintaining DNA methylation in mammalian cells. Science 317: 1760-1764. Boyes, J. and Bird, A. 1991. DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell 64: 1123-1134. Bulyk, M.L. 2003. Computational prediction of transcription-factor binding site locations. Genome Biol. 5: 201. Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J. et al. 2004. Unbiased mapping of transcription factor binding sites along human 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116: 499-509. Cox, J. and Mann, M. 2007. Is proteomics the new genomics? Cell 130: 395-398. Cox, J. and Mann, M. submitted. High peptide identification rates and proteome-wide quantitation via novel computational strategies. Cravatt, B.F., Simon, G.M., and Yates, J.R., 3rd. 2007. The biological impact of mass- spectrometry-based proteomics. Nature 450: 991-1000. Daniel, J.M., Spring, C.M., Crawford, H.C., Reynolds, A.B., and Baig, A. 2002. The p120(ctn)-binding partner Kaiso is a bi-modal DNA-binding protein that recognizes both a sequence-specific consensus and methylated CpG dinucleotides. Nucleic Acids Res. 30: 2911-2919. de Godoy, L.M., Olsen, J.V., de Souza, G.A., Li, G., Mortensen, P., and Mann, M. 2006. Status of complete proteome analysis by mass spectrometry: SILAC labeled yeast as a model system. Genome Biol. 7: R50. Dignam, J.D., Lebovitz, R.M., and Roeder, R.G. 1983. Accurate transcription initiation by RNA polymerase II in a soluble extract from isolated mammalian nuclei. Nucleic Acids Res. 11: 1475-1489. Egener, T., Roulet, E., Zehnder, M., Bucher, P., and Mermod, N. 2005. Proof of concept for microarray-based detection of DNA-binding oncogenes in cell extracts. Nucleic Acids Res. 33: e79; doi:10.1093/nar/gni1079. Ego, T., Tanaka, Y., and Shimotohno, K. 2005. Interaction of HTLV-1 Tax and methyl- CpG-binding domain 2 positively regulates the gene expression from the hypermethylated LTR. Oncogene 24: 1914-1923. Esteve, P.O., Chin, H.G., Smallwood, A., Feehery, G.R., Gangisetty, O., Karpf, A.R., Carey, M.F., and Pradhan, S. 2006. Direct interaction between DNMT1 and G9a coordinates DNA and histone methylation during replication. Genes Dev. 20: 3089-3103. Fraga, M.F., Ballestar, E., Montoya, G., Taysavang, P., Wade, P.A., and Esteller, M. 2003. The affinity of different MBD proteins for a specific methylated locus depends on their intrinsic binding properties. Nucleic Acids Res. 31: 1765-1774.

27 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Giguere, V., McBroom, L.D., and Flock, G. 1995. Determinants of target gene specificity for ROR alpha 1: monomeric DNA binding by an orphan nuclear receptor. Mol. Cell. Biol. 15: 2517-2526. Graumann, J., Hubner, N.C., Kim, J.B., Ko, K., Moser, M., Kumar, C., Cox, J., Schoeler, H., and Mann, M. 2007. SILAC-labeling and proteome quantitation of mouse embryonic stem cells to a depth of 5111 proteins. Mol. Cell. Proteomics 7: 672- 683. Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., and Aebersold, R. 1999. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17: 994-999. Hall, D.A., Zhu, H., Zhu, X., Royce, T., Gerstein, M., and Snyder, M. 2004. Regulation of gene expression by a metabolic enzyme. Science 306: 482-484. Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N.J., Macisaac, K.D., Danford, T.W., Hannett, N.M., Tagne, J.B., Reynolds, D.B., Yoo, J. et al. 2004. Transcriptional regulatory code of a eukaryotic genome. Nature 431: 99-104. Himeda, C.L., Ranish, J.A., Angello, J.C., Maire, P., Aebersold, R., and Hauschka, S.D. 2004. Quantitative proteomic identification of as the trex-binding factor in the muscle creatine kinase enhancer. Mol. Cell. Biol. 24: 2132-2143. Hoglund, A. and Kohlbacher, O. 2004. From sequence to structure and back again: approaches for predicting protein-DNA binding. Proteome Sci. 2: 3. Holstege, F.C. and Clevers, H. 2006. Transcription factor target practice. Cell 124: 21- 23. Horak, C.E., Mahajan, M.C., Luscombe, N.M., Gerstein, M., Weissman, S.M., and Snyder, M. 2002. GATA-1 binding sites mapped in the beta-globin locus by using mammalian chIp-chip analysis. Proc. Natl. Acad. Sci. U.S.A. 99: 2924-2929. Johnston, S.D., Liu, X., Zuo, F., Eisenbraun, T.L., Wiley, S.R., Kraus, R.J., and Mertz, J.E. 1997. Estrogen-related receptor alpha 1 functionally binds as a monomer to extended half-site sequences including ones contained within estrogen-response elements. Mol. Endocrinol. 11: 342-352. Kadonaga, J.T. 2004. Regulation of RNA polymerase II transcription by sequence- specific DNA binding factors. Cell 116: 247-257. Klose, R.J. and Bird, A.P. 2006. Genomic DNA methylation: the mark and its mediators. Trends Biochem. Sci. 31: 89-97. Klose, R.J., Sarraf, S.A., Schmiedeberg, L., McDermott, S.M., Stancheva, I., and Bird, A.P. 2005. DNA binding selectivity of MeCP2 due to a requirement for A/T sequences adjacent to methyl-CpG. Mol. Cell 19: 667-678. Knapp, A.M., Ramsey, J.E., Wang, S.X., Godburn, K.E., Strauch, A.R., and Kelm, R.J., Jr. 2006. Nucleoprotein interactions governing cell type-dependent repression of the mouse smooth muscle alpha-actin promoter by single-stranded DNA-binding proteins Pur alpha and Pur beta. J. Biol. Chem. 281: 7907-7918. Le Guezennec, X., Vermeulen, M., Brinkman, A.B., Hoeijmakers, W.A., Cohen, A., Lasonder, E., and Stunnenberg, H.G. 2006. MBD2/NuRD and MBD3/NuRD, two distinct complexes with different biochemical and functional properties. Mol. Cell. Biol. 26: 843-851. Liu, X., Huang, X., and Sigmund, C.D. 2003. Identification of a nuclear orphan receptor (Ear2) as a negative regulator of renin gene transcription. Circ. Res. 92: 1033- 1040. MacCoss, M.J., Wu, C.C., Liu, H., Sadygov, R., and Yates, J.R., 3rd. 2003. A correlation algorithm for the automated quantitative analysis of shotgun proteomics data. Anal. Chem. 75: 6912-6921.

28 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

McClatchy, D.B., Liao, L., Park, S.K., Venable, J.D., and Yates, J.R. 2007. Quantification of the synaptosomal proteome of the rat cerebellum during post-natal development. Genome Res. 17: 1378-1388. Mohibullah, N., Donner, A., Ippolito, J.A., and Williams, T. 1999. SELEX and missing phosphate contact analyses reveal flexibility within the AP-2[alpha] protein: DNA binding complex. Nucleic Acids Res. 27: 2760-2769. Mootha, V.K., Handschin, C., Arlow, D., Xie, X., St Pierre, J., Sihag, S., Yang, W., Altshuler, D., Puigserver, P., Patterson, N. et al. 2004. Erralpha and Gabpa/b specify PGC-1alpha-dependent oxidative phosphorylation gene expression that is altered in diabetic muscle. Proc. Natl. Acad. Sci. U.S.A. 101: 6570-6575. Mukherjee, S., Berger, M.F., Jona, G., Wang, X.S., Muzzey, D., Snyder, M., Young, R.A., and Bulyk, M.L. 2004. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet. 36: 1331-1339. Ng, H.H., Zhang, Y., Hendrich, B., Johnson, C.A., Turner, B.M., Erdjument-Bromage, H., Tempst, P., Reinberg, D., and Bird, A. 1999. MBD2 is a transcriptional repressor belonging to the MeCP1 histone deacetylase complex. Nat. Genet. 23: 58-61. Ohlsson, R. and Kanduri, C. 2002. New twists on the epigenetics of CpG islands. Genome Res. 12: 525-526. Ong, S.E., Blagoev, B., Kratchmarova, I., Kristensen, D.B., Steen, H., Pandey, A., and Mann, M. 2002. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1: 376-386. Ong, S.E., Kratchmarova, I., and Mann, M. 2003. Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC). J. Proteome Res. 2: 173-181. Ong, S.E. and Mann, M. 2005. Mass spectrometry-based proteomics turns quantitative. Nat. Chem. Biol. 1: 252-262. Ong, S.E., Mittler, G., and Mann, M. 2004. Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 1: 119-126. Perini, G., Diolaiti, D., Porro, A., and Della Valle, G. 2005. In vivo transcriptional regulation of N- target genes is controlled by E-box methylation. Proc. Natl. Acad. Sci. U.S.A. 102: 12117-12122. Prokhortchouk, A., Sansom, O., Selfridge, J., Caballero, I.M., Salozhin, S., Aithozhina, D., Cerchietti, L., Meng, F.G., Augenlicht, L.H., Mariadason, J.M. et al. 2006. Kaiso-deficient mice show resistance to intestinal cancer. Mol. Cell. Biol. 26: 199- 208. Remenyi, A., Scholer, H.R., and Wilmanns, M. 2004. Combinatorial control of gene expression. Nat. Struct. Mol. Biol. 11: 812-815. Robertson, K.D. 2005. DNA methylation and human disease. Nat. Rev. Genet. 6: 597- 610. Saccani, S., Pantano, S., and Natoli, G. 2003. Modulation of NF-kappaB activity by exchange of dimers. Mol. Cell 11: 1563-1574. Schulze, W.X., Deng, L., and Mann, M. 2005. Phosphotyrosine interactome of the ErbB- receptor kinase family. Mol. Syst. Biol. doi:10.1038/msb4100012. Schulze, W.X. and Mann, M. 2004. A novel proteomic screen for peptide-protein interactions. J. Biol. Chem. 279: 10756-10764. Sharif, J., Muto, M., Takebayashi, S., Suetake, I., Iwamatsu, A., Endo, T.A., Shinga, J., Mizutani-Koseki, Y., Toyoda, T., Okamura, K. et al. 2007. The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA. Nature 450: 908-912.

29 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Sidorova, N.Y., Muradymov, S., and Rau, D.C. 2005. Trapping DNA-protein binding reactions with neutral osmolytes for the analysis by gel mobility shift and self- cleavage assays. Nucleic Acids Res. 33: 5145-5155. Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J. et al. 2005. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23: 137-144. Travis, A., Hagman, J., Hwang, L., and Grosschedl, R. 1993. Purification of early-B-cell factor and characterization of its DNA-binding specificity. Mol. Cell. Biol. 13: 3392-3400. Tummala, R., Romano, R.A., Fuchs, E., and Sinha, S. 2003. Molecular cloning and characterization of AP-2 epsilon, a fifth member of the AP-2 family. Gene 321: 93-102. Udalova, I.A., Mott, R., Field, D., and Kwiatkowski, D. 2002. Quantitative prediction of NF-kappa B DNA-protein interactions. Proc. Natl. Acad. Sci. U.S.A. 99: 8167- 8172. Unoki, M., Nishidate, T., and Nakamura, Y. 2004. ICBP90, an -1 target, recruits HDAC1 and binds to methyl-CpG through its SRA domain. Oncogene 23: 7601- 7610. Warbrick, E. 2000. The puzzle of PCNA's many partners. Bioessays 22: 997-1006. Warren, C.L., Kratochvil, N.C., Hauschild, K.E., Foister, S., Brezinski, M.L., Dervan, P.B., Phillips, G.N., Jr., and Ansari, A.Z. 2006. Defining the sequence-recognition profile of DNA-binding molecules. Proc. Natl. Acad. Sci. U.S.A. 103: 867-872. Wasserman, W.W. and Sandelin, A. 2004. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 5: 276-287. Wei, C.L., Wu, Q., Vega, V.B., Chiu, K.P., Ng, P., Zhang, T., Shahab, A., Yong, H.C., Fu, Y., Weng, Z. et al. 2006. A global map of p53 transcription-factor binding sites in the human genome. Cell 124: 207-219. Wilson, T.E., Fahrner, T.J., and Milbrandt, J. 1993. The orphan receptors NGFI-B and establish monomer binding as a third paradigm of nuclear receptor-DNA interaction. Mol. Cell. Biol. 13: 5794-5804. Wingender, E., Chen, X., Fricke, E., Geffers, R., Hehl, R., Liebich, I., Krull, M., Matys, V., Michael, H., Ohnhauser, R. et al. 2001. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29: 281-283. Wortman, M.J., Johnson, E.M., and Bergemann, A.D. 2005. Mechanism of DNA binding and localized strand separation by Pur alpha and comparison with Pur family member, Pur beta. Biochim. Biophys. Acta 1743: 64-78. Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338- 345. Yaneva, M. and Tempst, P. 2003. Affinity capture of specific DNA-binding proteins for mass spectrometric identification. Anal. Chem. 75: 6437-6448. Yoon, H.G., Chan, D.W., Reynolds, A.B., Qin, J., and Wong, J. 2003. N-CoR mediates DNA methylation-dependent repression through a methyl CpG binding protein Kaiso. Mol. Cell 12: 723-734. Zhang, Y. and Dufau, M.L. 2000. Nuclear orphan receptors regulate transcription of the gene for the human luteinizing hormone receptor. J. Biol. Chem. 275: 2763-2770.

30 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Web site references http://msquant.sourceforge.net/ http://www.ebi.ac.uk/IPI/IPIhelp.html http://www.ebi.uniprot.org/index.shtml http://www.ebi.ac.uk/GOA/ http://tranche.proteomecommons.org/

Data availability:

Supplementary data accompanies this paper. Mass spectrometric raw data and

MaxQuant evidence files have been uploaded to Tranche at „proteomicscommons.org‟.

31 Light Heavy Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Lysine-d0 Lysine-d4

Isolation of Isolation of Nuclei Nuclei

Lysine-d0 (Kd0) Lysine-d4 (Kd4) nuclear extract nuclear extract

Bio Bio Affinity purification with Affinity purification with DNA control bait DNA wild type bait

Combine, Elute, SDS PAGE, Digest with trypsin

Identify and Quantify by MS

specific

unspecific Peptide Kd0 Peptide Kd4 Intensity

m/z A AP2-WTDownloaded: from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press CAAAACTGCAGGATCGAACTGACC GCC CGG GGC ACGTGTC AP2-MT: CAAAACTGCAGGATCGAACTGACC GTA CGG GGC ACGTGTC B C

AP2 WTAP2 MT

AP2 WTAP2 MTAP2 WTAP2 MT

pol II

TFAP2A

PURA IN FT Elution

SDS PAGE Western blot analysis D PURB 7:1 TFAP2A 6:1 TFAP2C 5:1 PURA 4:1 3:1 2:1 1:1

1:1

Ratio (Lysine-d4:Lysine-d0) 1:2 1:3 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Estrogen Related Receptor-alpha (ESRRA): A WT: B RLCLVCGDVASGYHYGVASCEACK (4+) CGTAGAACTGCAGGACCCGCAG TGA CCT TGA GTTTT GAC 100.0

MT: 90.0 CGTAGAACTGCAGGACCCGCAG TGA TCT TGA GTTTT GAC 684.81 80.0 e

c 70.0 n a d

n 60.0 u A NE (Kd4)NE (Kd0)ESRRAESRRA WTESRRA MTESRRA WT MT 50.0 v i t a l 40.0 pol II e R e b * 30.0 ESRRA 20.0 683.81 IN FT Elution 10.0

C D 684.0 685.0 686.0 m/z

y 2+ y 15 y 10 10 0.0 MS2 10 0.0 9 MS3 90.0 90.0 80.0 80.0 ^b6 e e

c y 70.0 2+ c 70.0

n 6 b n a 11 2+ a d 60.0 y d n 60.0 2+ 14 2+ n 2+

u y b y u 14 b 10 2+ 16 ^b

A 50.0 ^b 2+ A 50.0

4 6 y 7 - H O 13 b17 2

v 2+ - H O v i i

t b b 2 40.0 9 5 t 40.0

a b y b a l ^b 4 y 8 11 l 13 e 30.0 2+ 6 2+ e ^a y R e b 30.0 8

- NH R e b b 2+ b 3 5 ^b 5 y 2+ 2+ 2+ 18 2+ y2 5 ^b ^b 7 b y b y 2+ ^b 10 12 20.0 8 12 16 19 b 20.0 ^a 9 ^b y 23 7 y 13 y2 3 b 12 10.0 3 b - H O 10.0 9 2 y14

400 600 800 1000 1200 1400 1600 400 600 800 1000 1200 1400 1600 m/z m/z A Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

CpG island MTA2MTA2 gene

-623 to -601 O Me HN

O N R

GGCGCGCGAGTCTTTGGGGCGCG

B C ZBTB33 (Kaiso): FIAELGVPLSQVK (2+) 702.92 100.0

90.0 NE MeCpGCpG MeCpGCpG

80.0 USP7

70.0 e

c UHRF1 n a

d 60.0 n u PCNA A 50.0 v i t a l pol II e

R e b 40.0 ZBTB33 30.0 IN FT Elution 20.0 700.91 10.0

701.0 702.0 703.0 704.0 m/z Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

ESRRA in-gel digestion

Protein Name Uniprot Ratio Unique Quantified Norm. Peptides Peptides

ESRRA P11474 9.02 18 14

NR2F6 P10588 4.93 7 3

ESRRA in-solution digestion

Protein Name Uniprot Ratio Unique Quantified Norm. Peptides Peptides

ESRRA P11474 7.44 9 5

Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Proteins binding preferentially to the methylated MTA2 CpG island

Protein Name Uniprot Ratio Unique Quantified Features Norm. Peptides Peptides

UHRF1 (ICBP90) Q96T88 9.10 17 11 MeCpG binding Ub ligase PhD finger binds Me2 & Me3 H3K9

PCNA P12004 4.43 9 20 binds to DNMT1 and KIAA0101

USP7 Q93009 8.50 8 5 Deubiquitinating enzyme

DNMT1 P26358 7.06 11 3 5-MeC MTase binds to UHRF1 binds to PCNA

ZBTB33 (Kaiso) Q86T24 11.28 3 2 MeCpG binding

ACTL8 (FLJ32777) Q96M75 8.22 4 3 Actin ATPase

KIAA0101 Q15004 17.47 3 3 binds to PCNA

Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Material and Methods

Liquid chromatography mass spectrometry

Proteins eluted from the DNA affinity columns were separated on 4-12 % Bis-Tris SDS gels (NuPAGE, Invitrogen) and stained either with silver or colloidal Coomassie. The gels were sliced into six to 8 equally sized gel pieces and subjected to tryptic in-gel digestion, essentially as described (Shevchenko et al. 2006). Prior to LC-MS analysis tryptic peptide mixtures were desalted using STAGE tips as described previously

(Rappsilber et al. 2003; Rappsilber et al. 2007) with the following modifications. In order to ensure recovery of highly hydrophilic peptides, flow-through fractions from the C18

STAGE tips were further applied to Carbon STAGE tips (made from Empore 3M, activated carbon discs) and the pooled elutions from the C18 and Carbon tips were subsequently analyzed by LC-MS. For the AP2 column experiments nanoscale-LC

(Agilent 1100 nanoflow system) was coupled to a QSTAR-Pulsar (ABI-Sciex) quadrupole time-of-flight hybrid mass spectrometer. Peptides were eluted from an analytical column by a linear gradient running from 5 to 60% (v/v) acetonitrile (in 0.5% acetic acid) with a flow rate of 200 nl/min in 90 minutes and sprayed directly into the orifice of the mass spectrometer. The 15 cm fused silica emitter with an inner diameter of 75 µm (New

Objective, USA and Proxeon Biosystems, Denmark) was packed in-house with reverse- phase ReproSil-Pur C18-AQ 3 µm resin (Dr. Maisch GmbH, Germany). Proteins were identified by information-dependent acquisition of fragmentation spectra of doubly, triply or quadruply charged peptides with a precursor selection window of m/z = 350 to 1400 using pulsed extraction of fragments enhancing at m/z = 600. Tandem mass spectra were acquired for 1.5 s and fragmented peptides were excluded from sequencing for 90 s (Schulze and Mann 2004). Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

The other experiments were analyzed by nanoscale-LC (Agilent 1100 nanoflow system) coupled to a 7-Tesla linear ion-trap Fourier-transform ion cyclotron resonance mass spectrometer (LTQ-FT, Thermo Electron, Germany) equipped with a nanoelectrospray source (Proxeon, Denmark). Peptides were eluted from an analytical column by a linear gradient running from 2 to 50% (v/v) acetonitrile (in 0.5% acetic acid) with a flow rate of

250 nl/min in 90 minutes and sprayed directly into the orifice of the mass spectrometer.

Information dependent acquisition of MS, MS/MS and MS3 spectra was performed essentially as described with minor modifications (Olsen and Mann 2004). Survey full- scan MS spectra (m/z 350 – 1600) were acquired in the Fourier transform ion cyclotron resonance (FT-ICR) cell with resolution R = 25,000 at m/z = 400 with a target value of

5,000,000 ions allowing a maximum fill-time of 2000 ms. The three most intense ions were sequentially isolated for SIM (selected ion monitoring) scans with a 12 Th mass window, R = 50,000 and a target accumulation value of 50,000 (maximum fill-time =

500 ms). The same ions were fragmented in the linear ion trap by CID (collision induced dissociation) at a target value of 10,000 (maximum fill-time = 250 ms). For MS3 the most intense ion (>350 Th) from each MS2 spectrum was further isolated and fragmented.

Target ions selected for MS2 were dynamically excluded for 90 s. Total cycle time was 3 to 3.5 s. The general mass spectrometric conditions were: spray voltage, 2.3 kV; no sheath and auxiliary gas flow; ion transfer tube temperature, 120°C; collision gas pressure, 1.3 mTorr; normalized collision energy using wide-band activation mode; 30% for MS2 and 28% for MS3. Ion selection thresholds were 500 counts for MS2 and

5 counts for MS3. An activation q = 0.25 and activation time of 30 ms was applied in both

MS2 and MS3 acquisitions.

Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Data analysis

MS data acquired on the QSTAR-Pulsar instrument were searched with Mascot 2.0

(Matrix Science, UK) against the human International Protein Index protein database

(IPI, version 3.04, http://www.ebi.ac.uk/IPI/IPIhelp.html), to which we added frequently observed contaminants as described (Ong et al. 2004). The maximum allowed mass deviation MMD (Zubarev and Mann 2007) for monoisotopic precursor ions and MS/MS peaks was restricted to 0.1 Da in each case. Enzyme specificity was set to trypsin (with a maximum of 2 missed cleavages), allowing for cleavage N-terminal to proline and between aspartic acid and proline (Olsen et al. 2004). Carbamidomethyl cysteine was

2 set as fixed and oxidized methionine, N-acetylation and H4-lysine as variable modification. Protein identifications were further analyzed and manually verified by the use of MSQuant (http://msquant.sourceforge.net/).

FT-MS data were analyzed by MaxQuant version 1.0.6.15 (Cox and Mann submitted)

2 and processed with lysine-d4 ( H4-lysine) labeling and standard settings for LTQFT.

Enzyme specificity was set to trypsin, allowing for cleavage N-terminal to proline and between aspartic acid and proline. Carbamidomethyl cysteine was set as fixed and oxidized methionine, N-acetylation, as well as loss of ammonia from N-terminal glutamine as variable modifications. The pre-analyzed MaxQuant data was searched against Human International Protein Index protein sequence database (IPI, version 3.24) supplemented with frequent contaminants and concatenated with reversed copies of all sequences using Mascot (version 2.1.04, Matrix Science) with an MMD (Zubarev and

Mann 2007) of 30 ppm for monoisotopic precursor ions and 0.5 Da for MS/MS peaks.

Identified peptides were further analyzed and re-quantified with MaxQuant software. The required false positive rate was set to 0.05 at the peptide level and the required false discovery rate was set to 0.01 at the protein level with the minimum required peptide length set to 6 amino acids. Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

In addition to the protein false discovery rate threshold, proteins were only considered to be identified with at least two different (at sequence level) peptides, thereof one can be uniquely assigned to the respective protein sequence.

For the interpretation of MS3 spectra, peak lists derived from Xcalibur 1.4 (Thermo

Fisher Scientific) raw files were generated by software developed in-house (DTA Super

Charge; http://msquant.sourceforge.net/), searched with Mascot (version 2.0) against the human IPI database (version 3.04) and analyzed by the MS3 scoring function of

MSQuant as described (Olsen and Mann 2004).

Normalized protein quantification values were only calculated for proteins that could be quantified with at least two MaxQuant-compatible SILAC-pairs associated with them.

The extracted peptide ion current dependent (a measure for the protein abundance) statistical significance of protein ratios was further calculated with MaxQuant

(significance “B”) as described (Cox and Mann submitted; Graumann et al. 2007). This is because for highly abundant proteins the statistical spread of unregulated proteins is much more focused than for low abundant ones, which is due to the fact that intense signals (high signal-to-noise ratio in mass spectrum) can be measured more accurate than less intense ones (low signal-to-noise ratio). Therefore, a protein that shows, for instance, a ratio of two should be very significant when it is highly abundant, while at very low abundance it should only be marginally significant. We defined ratios to represent highly significant “outliers” from the main distribution of all ratios (in one experiment) if they exhibited a distance of more than three standard deviations from the center of the distribution (significance < 0.0012) and considered these proteins to constitute specific interaction partners. Finally, we assessed the accuracy of protein ratio by calculating the coefficient of variability over all redundant quantifiable peptides. Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

DNA affinity chromatography experiments

Single-stranded HPLC-purified oligonucleotides (DNA Technology, Aarhus, Denmark) bearing a spacer sequence (5´CAAAA3` at the sense strand) followed by a PstI restriction site (5´CTGCAG3´) are reconstituted at a concentration of 50 nmol/ml in annealing buffer (20 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 0.1 M KCl). Complementary strands were annealed with the minus strand in slight excess to the biotinylated plus strand to ensure the exclusive immobilization of double stranded DNA molecules to streptavidin beads. Therefore, 0.03 ml of the sense strand oligonucleotide harboring the

5´Biotin-TEG spacer arm are mixed with 0.04 ml of the complementary (non biotinylated) oligonucleotide, heated for 5 min at 90°C and gradually cooled to 65°C (10 min). After an incubation step at 65°C for 5 min, the thermoblock is switched-off and allowed to cool to room temperature (the double-stranded DNA can be stored for several months at -20°C at this step). The biotinylated dsDNA (~250 pmol) is then diluted in buffer DW (20 mM

Tris-HCl, pH 8.0, 2 M NaCl, 0.5 mM EDTA, 0.03% NP-40) to give a final volume of

0.4 ml and incubated (rotary wheel, 3 hrs, room temperature) with 1 mg of Dynal MyOne

(Invitrogen) streptavidin magnetic beads that have been equilibrated with two washes

(0.4 ml each) of TE buffer containing 0.01 % (v/v) NP40 and two washes of buffer DW

(0.75 ml each). After the binding step the beads are washed one time in 0.4 ml TE

(containing 0.02 % NP40), three times in 0.4 ml DW buffer and resuspended in 0.1 ml of buffer DW. At this step the DNA affinity column (~200 pmol/mg) can be stored for several weeks in the refrigerator.

Before the DNA affinity pull-down experiment the beads harboring the immobilized dsDNA are incubated for one hour in blocking buffer (20 mM Hepes, pH 7.9, 0.05 mg/ml

BSA, 0.05 mg/ml glycogen, 0.3 M KCl, 0.02 % NP40, 2.5 mM DTT, 5 mg/ml polyvinylpyrrolidone) at RT on a rotary wheel using 1.3 ml buffer per mg beads. Excess blocking buffer is removed by washing the beads with 1.3 ml (per mg beads) Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

1x restriction endonuclease buffer (New England Biolabs buffer 3) containing 0.02 %

NP40 and two times with 2.67 ml buffer G (20 mM Tris-HCl, pH 7.3, 10 % (v/v) glycerol,

0.1 M KCl, 0.2 mM EDTA, 10 mM potassium glutamate, 0.04 % NP40, 2 mM DTT, 0.4 mM PMSF and 0.005 mg/ml each of aprotinin, leupeptin and pepstatin A) per mg beads.

2 Nuclear extracts from H4-lysine labeled and unlabeled cells that have been stored at

-80°C are cleared for insoluble matter by centrifugation at 15,000xg for 20 min in a benchtop centrifuge at 4°C. Cleared labeled and unlabeled nuclear extracts (~6-10 mg/ml) are adjusted to 10 mM potassium glutamate and quickly diluted with on volume of buffer G containing either 0.2 mg/ml of poly dIdC (for GC-rich DNA elements) or poly dAdT (for AT-rich DNA elements and the methyl-CpG baits). Again, insoluble matter is removed by centrifugation at 15,000xg for 10 min in a benchtop centrifuge at 4°C and the diluted extracts are separately subjected to a pre-clearing step (one hour, rotating wheel, cold lab) employing MyOne streptavidin magnetic beads (washed with TE/0.02%

NP40, buffer DW and equilibrated in buffer G) at a final concentration of 1.5 mg/ml.

The beads are removed with a magnetic separator and the labeled and unlabeled extracts are mixed separately with the “blocked” probe and the control DNA magnetic beads at a final concentration of 0.7 mg/ml, followed by a 3 hr incubation step (cold lab, rotating wheel). Magnetic beads are recovered (separator) and subjected to four washes with buffer G (1.8 ml per mg beads) and one wash (0.9 ml per mg beads) with restriction endonuclease buffer (NEB buffer 3; adjusted to 0.02% (v/v) NP40, 2.5% (v/v) glycerol,

1 mM DTT, 0.2 mM PMSF and 0.005 mg/ml each of aprotinin, leupeptin and pepstatin A). The probe and control beads are immediately combined using restriction endonuclease buffer and suspended at a concentration of 3.75 mg/ml followed by the addition of 0.014 ml of PstI (New England Biolabs) and a one hour incubation at 25°C in an Eppendorf thermomixer (at 1400 rpm). Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

This step is repeated once, the combined elutions are concentrated by ultrafiltration

(Millipore Y3K, 8,000xg, 10°C, benchtop centrifuge) and the concentrated sample is adjusted to 1x LDS sample buffer/15 mM beta-mercaptoethanol (Invitrogen) directly in the ultrafiltration device. Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Figure and Table Legends

Figure S1. Restriction endonuclease cleavage is an efficient method to release the immobilized DNA. A slight excess of biotinylated double stranded DNA oligonucleotides is bound to streptavidin beads. Equal volumes of input and flow-through (FT) were separated by agarose gel electrophoresis and visualized by ethidium bromide staining.

Twenty percent of eluted material (Elution) from the experiment shown in Figure 2 (main text) was analyzed by agarose gel electrophoresis after isolation of DNA by phenol- chloroform extraction.

Figure S2. Identified tryptic peptides demonstrating the binding of TFAP2A (AP2- alpha) and TFAP2C (gamma) to the AP2 site containing DNA (see Figure 1 in main text). Observed tryptic peptides matching to AP2 family members are superimposed on a multiple alignment (ClustalW, http://www.ebi.ac.uk/Tools/clustalw/index.html) of

TFAP2A (AP2A_HUMAN), TFAP2B (AP2B_HUMAN), TFAP2C (AP2C_HUMAN),

TFAP2D (AP2D_HUMAN) and TFAP2E (AP2E_HUMAN). Shared peptides are highlighted in red, whereas TFAP2A- and TFAP2C-specific ones are illustrated in blue and green, respectively. Lysine and arginine residues amino-terminal to the trypsin cleavage site are highlighted in bold italics.

Figure S3. Representative peptide mass spectra demonstrating the preferential binding of UHRF1 (ICBP90), USP7, DNMT1, KIAA0101, ACTL8 (actin-like protein 8) and

PCNA to the DNA column bearing the methylated MTA2 CpG island. Representative MS spectra of labeled (monoisotopic peak marked with filled circle) and unlabeled

(monoisotopic peak marked with open circle) tryptic peptides are shown. Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

The unlabeled peptide isotope cluster represents the relative binding of the proteins to the non-methylated DNA sequence, whereas the labeled peptide isotope cluster represents the relative binding to the methylated CpG island. A peptide mass spectrum derived from a tryptic peptide identifiying the RNA polymerase II POLR2E (p23) subunit is shown as a control for unspecific interactions with the DNA column.

Figure S4. Hypothetical model explaining the methylation-specific interaction of

UHRF1/ICBP90, DNMT1, PCNA, KIAA0101, USP7 and ZBTB33/Kaiso with the MTA2

CpG island. UHRF1 and ZBTB33 bind directly (indicated by arrowheads) to the methylated (indicated by filled circles) CpGs, whereas the other proteins are most likely recruited by protein-protein interactions (see Figure 4 and Table 2 in main text). Briefly,

DNMT1 directly interacts with PCNA (Chuang et al. 1997), UHRF1 (Achour et al. 2007;

Bostick et al. 2007; Sharif et al. 2007) and KIAA0101 (Simpson et al. 2006; Yu et al.

2001). UHRF1 possesses ubiquitin E3 ligase activity (modifies histone H3) in vitro

(Citterio et al. 2004) and contains six copies of a signature motif (consensus P/AXXS) that mediates the contact between USP7 and its target molecules (Sheng et al. 2006).

USP7, an ubiquitin protease binds to and stabilizes E3 ligases by preventing their auto- ubiquitination (Canning et al. 2004). Since UHRF1 was recently shown to also read out the H3 K9 di- and tri-methylation mark by virtue of its PHD finger domain (Karagianni et al. 2007) our pull-down data corroborate the idea that UHRF1 is a central player coordinating maintenance and interpretation of DNA and histone methylation marks

Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Table S1. Proteins interacting specifically with the DNA column bearing a functional

AP2 site. The Uniprot accession number, the molecular weight (MW), the ratio, the standard deviation (STDEV) and the number of quantified peptides (Q. Peptides) for specific binders are shown.

Table S2. Proteins exhibiting preferential binding to the ESRRA promoter hormone response element. The Uniprot accession number, the normalized ratio, the ratio variability, the quantification significance (ratio significance) and the number of unique peptides are shown. Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary References

Achour, M., Jacq, X., Ronde, P., Alhosin, M., Charlot, C., Chataigneau, T., Jeanblanc, M., Macaluso, M., Giordano, A., Hughes, A.D. et al. 2007. The interaction of the SRA domain of ICBP90 with a novel domain of DNMT1 is involved in the regulation of VEGF gene expression. Oncogene 27: 2187-2197. Bostick, M., Kim, J.K., Esteve, P.O., Clark, A., Pradhan, S., and Jacobsen, S.E. 2007. UHRF1 plays a role in maintaining DNA methylation in mammalian cells. Science 317: 1760-1764. Canning, M., Boutell, C., Parkinson, J., and Everett, R.D. 2004. A RING finger ubiquitin ligase is protected from autocatalyzed ubiquitination and degradation by binding to ubiquitin-specific protease USP7. J. Biol. Chem. 279: 38160-38168. Chuang, L.S., Ian, H.I., Koh, T.W., Ng, H.H., Xu, G., and Li, B.F. 1997. Human DNA- (cytosine-5) methyltransferase-PCNA complex as a target for p21WAF1. Science 277: 1996-2000. Citterio, E., Papait, R., Nicassio, F., Vecchi, M., Gomiero, P., Mantovani, R., Di Fiore, P.P., and Bonapace, I.M. 2004. Np95 is a histone-binding protein endowed with ubiquitin ligase activity. Mol. Cell. Biol. 24: 2526-2535. Cox, J. and Mann, M. submitted. High peptide identification rates and proteome-wide quantitation via novel computational strategies. Graumann, J., Hubner, N.C., Kim, J.B., Ko, K., Moser, M., Kumar, C., Cox, J., Schoeler, H., and Mann, M. 2007. SILAC-labeling and proteome quantitation of mouse embryonic stem cells to a depth of 5111 proteins. Mol. Cell. Proteomics 7: 672- 683. Karagianni, P., Amazit, L., Qin, J., and Wong, J. 2007. ICBP90, a novel methyl K9 H3 binding protein linking protein ubiquitination with heterochromatin formation. Mol. Cell. Biol. 28: 705-717. Olsen, J.V. and Mann, M. 2004. Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proc. Natl. Acad. Sci. U.S.A. 101: 13417-13422. Olsen, J.V., Ong, S.E., and Mann, M. 2004. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteomics 3: 608-614. Ong, S.E., Mittler, G., and Mann, M. 2004. Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 1: 119-126. Rappsilber, J., Ishihama, Y., and Mann, M. 2003. Stop and go extraction tips for matrix- assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75: 663-670. Rappsilber, J., Mann, M., and Ishihama, Y. 2007. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2: 1896-1906. Schulze, W.X. and Mann, M. 2004. A novel proteomic screen for peptide-protein interactions. J. Biol. Chem. 279: 10756-10764. Sharif, J., Muto, M., Takebayashi, S., Suetake, I., Iwamatsu, A., Endo, T.A., Shinga, J., Mizutani-Koseki, Y., Toyoda, T., Okamura, K. et al. 2007. The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA. Nature 450: 908-912. Sheng, Y., Saridakis, V., Sarkari, F., Duan, S., Wu, T., Arrowsmith, C.H., and Frappier, L. 2006. Molecular recognition of p53 and MDM2 by USP7/HAUSP. Nat. Struct. Mol. Biol. 13: 285-291. Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Shevchenko, A., Tomas, H., Havlis, J., Olsen, J.V., and Mann, M. 2006. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 1: 2856-2860. Simpson, F., Lammerts van Bueren, K., Butterfield, N., Bennetts, J.S., Bowles, J., Adolphe, C., Simms, L.A., Young, J., Walsh, M.D., Leggett, B. et al. 2006. The PCNA-associated factor KIAA0101/p15(PAF) binds the potential tumor suppressor product p33ING1b. Exp. Cell Res. 312: 73-85. Yu, P., Huang, B., Shen, M., Lau, C., Chan, E., Michel, J., Xiong, Y., Payan, D.G., and Luo, Y. 2001. p15(PAF), a novel PCNA associated factor with increased expression in tumor tissues. Oncogene 20: 484-489. Zubarev, R. and Mann, M. 2007. On the proper use of mass accuracy in proteomics. Mol. Cell. Proteomics 6: 377-381.

Web site references http://msquant.sourceforge.net/ http://www.ebi.ac.uk/IPI/IPIhelp.html http://www.ebi.uniprot.org/index.shtml http://www.ebi.ac.uk/Tools/clustalw/index.html

Figure S1 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

AP2 WTAP2 MTAP2 WTAP2 MT AP2 WTAP2 MT

M Input FT M Elution

DNA agarose gel Figure S2 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

P05549|AP2A_HUMAN ------MLWKLTDNIKYEDC-EDRHDGTSNGTARLPQLGTVGQSPYTSAPPLSHT 48 Q92481|AP2B_HUMAN MHSPPRDQAAIMLWKLVENVKYEDIYEDRHDGVPSHSSRLSQLGSVSQGPYSSAPPLSHT 60 Q92754|AP2C_HUMAN ------MLWKITDNVKYEEDCEDRHDGSSNGNPRVPHLSSAGQHLYSPAPPLSHT 49 Q6VUC0|AP2D_HUMAN ------MLVHTYSAME------RPDGLG-AAAGGARLSSLPQAAYGPAPPLCHT 41 Q7Z6R9|AP2E_HUMAN ------MSTTFPGLVHDAEIRHDGSNSYRLMQLGCLESVANSTVAYSSSSPLTYS 49 * :. :. . : * .:.** ::

P05549|AP2A_HUMAN PNA----DFQPP-YFPPPY--QPI-YPQSQDP---YSHVN-DPYS--LNPLHAQPQP--Q 92 Q92481|AP2B_HUMAN PSS----DFQPP-YFPPPY--QPLPYHQSQDP---YSHVN-DPYS--LNPLHQ-PQ---Q 103 Q92754|AP2C_HUMAN GVA----EYQPPPYFPPPY--QQLAYSQSADP---YSHLG-EAYAAAINPLHQPAPTGSQ 99 Q6VUC0|AP2D_HUMAN PAATAAAEFQPP-YFPPPYPQPPLPYGQAPDAAAAFPHLAGDPYGG-LAPLAQPQPP--- 96 Q7Z6R9|AP2E_HUMAN TTG---TEFASP-YFSTNHQYTPL-HHQSFHYEFQHSHPAVTPDAYSLNSLHHSQQYYQQ 104 . :: .* **.. : : : *: . ..* . . : .*

P05549|AP2A_HUMAN HPGWPGQRQ---SQESGLLHTHRGLPHQLSG-LDP-----RRDY---RRHEDLLHGP-HA 139 Q92481|AP2B_HUMAN HPWGQRQRQEVGSEAGSLLPQPRAALPQLSG-LDP-----RRDYHSVRRPDVLLHSAHHG 157 Q92754|AP2C_HUMAN QQAWPGRQSQEGAGLPSHHGRPAGLLPHLSG-LEAGAVSARRDAY--RRSDLLLPHAHAL 156 Q6VUC0|AP2D_HUMAN QAAWAAPRAAARAHEE--PPGLLAPPARALG-LDP-----RRDYA--TAVPRLLHGLADG 146 Q7Z6R9|AP2E_HUMAN IHHGEPTDFINLHNARALKSSCLDEQRRELGCLDAYR---RHDLS--LMSHGSQYGMHPD 159 : * *:. *:*

P05549|AP2A_HUMAN LSSGLGD-LSIHSLPH--AIEEVPHVEDP---GINIPDQT-VIKKGPVSLSKSNSNAVSA 192 Q92481|AP2B_HUMAN LDAGMGDSLSLHGLGHP-GMEDVQSVEDANNSGMNLLDQS-VIKKVPVPP-----KSVTS 210 Q92754|AP2C_HUMAN DAAGLAENLGLHDMPH--QMDEVQNVDDQ---HLLLHDQT-VIRKGPISMT----KNPLN 206 Q6VUC0|AP2D_HUMAN AHGLADAPLGLPGLAAAPGLEDLQAMDEP---GMSLLDQS-VIKKVPIPSK---ASSLSA 199 Q7Z6R9|AP2E_HUMAN QRLLPGPSLGLAAAGA----DDLQGSVEAQ-CGLVLNGQGGVIRRG------200 *.: ::: : : : .* **::

P05549|AP2A_HUMAN IPINKDNLFGGV-VNPNEVFCSVPGRLSLLSSTSKYKVTVAEVQRRLSPPECLNASLLGG 251 Q92481|AP2B_HUMAN LMMNKDGFLGGMSVNTGEVFCSVPGRLSLLSSTSKYKVTVGEVQRRLSPPECLNASLLGG 270 Q92754|AP2C_HUMAN LPCQKE--LVGAVMNPTEVFCSVPGRLSLLSSTSKYKVTVAEVQRRLSPPECLNASLLGG 264 Q6VUC0|AP2D_HUMAN LSLAKDS-LVGGITNPGEVFCSVPGRLSLLSSTSKYKVTVGEVQRRLSPPECLNASLLGG 258 Q7Z6R9|AP2E_HUMAN ------GTCVVNPTDLFCSVPGRLSLLSSTSKYKVTIAEVKRRLSPPECLNASLLGG 251 *. ::********************:.**:****************

P05549|AP2A_HUMAN VLRRAKSKNGGRSLREKLDKIGLNLPAGRRKAANVTLLTSLVEGEAVHLARDFGYVCETE 311 Q92481|AP2B_HUMAN VLRRAKSKNGGRSLRERLEKIGLNLPAGRRKAANVTLLTSLVEGEAVHLARDFGYICETE 330 Q92754|AP2C_HUMAN VLRRAKSKNGGRSLREKLDKIGLNLPAGRRKAAHVTLLTSLVEGEAVHLARDFAYVCEAE 324 Q6VUC0|AP2D_HUMAN VLRRAKSKNGGRCLRERLEKIGLNLPAGRRKAANVTLLTSLVEGEAVHLARDFGYVCETE 318 Q7Z6R9|AP2E_HUMAN ILRRAKSKNGGRCLREKLDRLGLNLPAGRRKAANVTLLTSLVEGEALHLARDFGYTCETE 311 :***********.***:*:::************:************:******.* **:*

P05549|AP2A_HUMAN FPAKAVAEFLNRQHSD-PNEQVTRKNMLLATKQICKEFTDLLAQDRSPLGNSRPNPILEP 370 Q92481|AP2B_HUMAN FPAKAVSEYLNRQHTD-PSDLHSRKNMLLATKQLCKEFTDLLAQDRTPIGNSRPSPILEP 389 Q92754|AP2C_HUMAN FPSKPVAEYLTRPHLGGRNEMAARKNMLLAAQQLCKEFTELLSQDRTPHGTSRLAPVLET 384 Q6VUC0|AP2D_HUMAN FPAKAAAEYLCRQHAD-PGELHSRKSMLLAAKQICKEFADLMAQDRSPLGNSRPALILEP 377 Q7Z6R9|AP2E_HUMAN FPAKAVGEHLARQHME-QKEQTARKKMILATKQICKEFQDLLSQDRSPLGSSRPTPILDL 370 **:*...*.* * * : :**.*:**::*:**** :*::***:* *.** :*:

P05549|AP2A_HUMAN GIQSCLTHFNLISHGFGSPAVCAAVTALQNYLTEALKAMDKMYLS-----NNP-NSHTDN 424 Q92481|AP2B_HUMAN GIQSCLTHFSLITHGFGAPAICAALTALQNYLTEALKGMDKMFLN-----NTTTNRHTSG 444 Q92754|AP2C_HUMAN NIQNCLSHFSLITHGFGSQAICAAVSALQNYIKEALIVIDKSYMN-----PGD-QSPADS 438 Q6VUC0|AP2D_HUMAN GVQSCLTHFSLITHGFGGPAICAALTAFQNYLLESLKGLDKMFLS-----SVG-SGHGET 431 Q7Z6R9|AP2E_HUMAN DIQRHLTHFSLITHGFGTPAICAALSTFQTVLSEMLNYLEKHTTHKNGGAADSGQGHANS 430 .:* *:**.**:**** *:***::::*. : * * ::* . .

P05549|AP2A_HUMAN N----AKSSDKEEKHRK----- 437 Q92481|AP2B_HUMAN EGP-GSKTGDKEEKHRK----- 460 Q92754|AP2C_HUMAN N-----KTLEKMEKHRK----- 450 Q6VUC0|AP2D_HUMAN K------ASEKDAKHRK----- 442 Q7Z6R9|AP2E_HUMAN EKAPLRKTSEAAVKEGKTEKTD 452 : : : *. *

Figure S3

UHRF1 (ICBP90):Downloaded LWNEVLASLK from genome.cshlp.org (2+) onUSP7: October DLLQFFKPR 3, 2021 - Published (2+) by Cold SpringDNMT1: Harbor LSIFDANESGFESYEALPQHK Laboratory Press (3+)

588.85 100.0 584.35 100.0 100.0 796.39

90.0 90.0 90.0

80.0 80.0 80.0

70.0 796.05 70.0 584.85 70.0 e c e 589.35 e n c 60.0 c a n n 60.0 60.0 d a a n d d u n n 796.72 u 50.0 u A

50.0 50.0 A A

v i t v v i i a t t l 40.0 a

a 40.0 40.0 e l l e e R e b R e b R e b 797.06 30.0 30.0 30.0 589.85

20.0 20.0 585.35 20.0

582.33 797.40 582.83 795.72 10.0 794.71 10.0 10.0 795.37 586.84 590.34 585.85 587.34 583.84

587.0 588.0 589.0 590.0 582.0 583.0 584.0 585.0 586.0 795.0 796.0 797.0 m/z m/z m/z

KIAA0101: YAGGNPKVCVRPTPK (2+) ACTL8 (FLJ32777): TVIIDHGSGFLK (2+) POLR2E: ALIVVQQGMTPSAK (2+)

760.40 645.87 729.90 100.0 100.0 100.0

731.92 90.0 90.0 90.0

732.41 80.0 760.90 80.0 80.0

646.37 70.0 70.0 70.0 e e e c c c n n n 60.0 60.0 60.0 a a a 730.40 d d d n n n u u u 50.0 50.0 50.0 A A A

v v v i i i t t t a a a 40.0 732.92

40.0 l 40.0

l 730.91 l e e e 761.40 R e b R e b R e b 646.87

30.0 30.0 30.0

758.41 20.0 20.0 20.0 758.88 643.86 761.90 10.0 10.0 10.0 644.36 645.37 647.38 647.87

758.0 759.0 760.0 761.0 762.0 644.0 645.0 646.0 647.0 648.0 730.0 731.0 732.0 733.0 m/z m/z m/z

PCNA: DLSHIGDAVVISCAK (2+)

794.92 100

90 795.42

80 e c n

a 70 d n

u 60 b A

50 v i t

a 795.92 l 40 e R e 30

20 792.91 793.41 793.91 796.43 10 794.42 796.92 0 7 93.0 7 94.0 7 95.0 7 96.0 7 97.0 m/z Figure S4 Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

KIAA0101 PCNA DNMT1

UHRF1/ICBP90 ZBTB33/Kaiso USP7

MTA2 CpG island Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

Table S1

TFAP2 in-gel digestion

Protein Name Uniprot MW (kD) Ratio Quantified Peptides

TFAP2A Q13777 48.06 6.75 10

TFAP2C Q92754 49.18 5.00 4

PURA Q00577 34.91 4.22 4

PURB Q96QR8 33.24 6.73 11

Table S2

ESRRA in-gel digestion

Protein Name Uniprot Ratio Ratio Unique Quantified Norm. Significance Peptides Peptides

ERR-alpha (ESRRA) P11474 9.02 1.09E-43 18 14

NR2F6 P10588 4.93 3.56E-23 7 3

ATP citrate lyase P53396 2.37 3.13E-09 3 2 (ACLY)

Pur-beta (PURB) Q96QR8 2.37 2.06E-07 3 2

Interleukin enhancer- Q12906-1 1.93 2.47E-06 5 5 binding factor 3 (ILF3)

Arsenite-resistance Q9BXP5-2 1.91 4.06E-06 2 2 protein 2 (ARS2)

TFII-I (GTF2I) P78347-1 1.85 7.65E-06 12 6

Eukaryotic translation O43324 2.03 2.53E-05 5 4 elongation factor 1 epsilon-1 (EEF1E1)

KH-type splicing Q92945 1.68 1.44E-04 8 2 regulatory protein (KHSRP)

Ubiquitin-activating P22314 1.59 4.97E-04 9 3 enzyme E1 (UBA1)

Downloaded from genome.cshlp.org on October 3, 2021 - Published by Cold Spring Harbor Laboratory Press

A SILAC-based DNA protein interaction screen that identifies candidate binding proteins to functional DNA elements

Gerhard Mittler, Falk Butter and Matthias Mann

Genome Res. published online November 17, 2008 Access the most recent version at doi:10.1101/gr.081711.108

Supplemental http://genome.cshlp.org/content/suppl/2009/01/06/gr.081711.108.DC1 Material

P

Accepted Peer-reviewed and accepted for publication but not copyedited or typeset; accepted Manuscript manuscript is likely to differ from the final, published version.

License

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance online articles are citable and establish publication priority; they are indexed by PubMed from initial publication. Citations to Advance online articles must include the digital object identifier (DOIs) and date of initial publication.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Copyright © 2008, Cold Spring Harbor Laboratory Press