Discovery of Functional Noncoding Elements by Digital Analysis of Chromatin Structure

Discovery of functional noncoding elements by digital analysis of chromatin structure Peter J. Sabo*, Michael Hawrylycz*, James C. Wallace*, Richard Humbert*, Man Yu†, Anthony Shafer*, Janelle Kawamoto*, Robert Hall*, Joshua Mack*, Michael O. Dorschner*, Michael McArthur*, and John A. Stamatoyannopoulos*‡ *Department of Molecular Biology, Regulome, 2211 Elliott Avenue, Suite 600, Seattle, WA 98121; and †Division of Medical Genetics, University of Washington, 1705 NE Pacific Street, Seattle, WA 98195 Communicated by Stanley M. Gartler, University of Washington, Seattle, WA, October 6, 2004 (received for review August 10, 2004) We developed a quantitative methodology, digital analysis of individual DNase I cutting events, (ii) can be applied to any cell chromatin structure (DACS), for high-throughput, automated map- type, and (iii) is self-validating (i.e., it can be applied to auto- ping of DNase I-hypersensitive sites and associated cis-regulatory matically map HSs at a high confidence level without requiring sequences in the human and other complex genomes. We used a subsequent, independent molecular validation step). 19͞20-bp genomic DNA tags to localize individual DNase I cutting Here we describe an approach that combines molecular and events in nuclear chromatin and produced Ϸ257,000 tags from computational methods to achieve these aims. We used 19͞ erythroid cells. Tags were mapped to the human genome, and a 20-bp genomic DNA tags to localize individual DNase I cutting quantitative algorithm was applied to discriminate statistically events in nuclear chromatin and discovered DNase I HSs by significant clusters of independent DNase I cutting events. We identifying statistically significant tag clustering events by using show that such clusters identify both known regulatory sequences a quantitative algorithm. This method, digital analysis of chro- and previously unrecognized functional elements across the ge- matin structure (DACS), provides the framework for genome- nome. We used in silico simulation to demonstrate that DACS is wide localization of DNase I HSs and associated cis-regulatory capable of efficient and accurate localization of the majority of sequences in an efficient, quantitative, and automated fashion, DNase I-hypersensitive sites in the human genome without requir- opening the door for systematic exposition of the regulatory ing an independent validation step. A unique feature of DACS is genome. that it permits unbiased evaluation of the chromatin state of regulatory sequences from widely separated genomic loci. We Methods found surprisingly large differences in the accessibility of distant Cell Culture and DNase I Digestion. We cultured K562 (ATCC) cells regulatory sequences, suggesting the existence of a hierarchy of in RPMI medium 1640 (Invitrogen) supplemented with 10% nuclear organization that escapes detection by conventional chro- FBS to a target density of 5 ϫ 105 cells per ml. We performed matin assays. DNase I (Roche Applied Sciences, Indianapolis) digestions (0.5–2 units per ml) according to the protocol described in ref. GENETICS cis-regulatory elements ͉ DNase I-hypersensitive sites ͉ gene regulation 15 and purified DNA by using the Puregene system (Gentra Systems). omprehensive delineation of functional noncoding se- Cquences in complex genomes is a major goal of modern Creation of DACS Libraries. We developed a tagging method for biology. The activation and function of regulatory sequences is identifying individual DNase I cut sites in nuclear chromatin. linked to focal alterations in chromatin structure (1, 2), which Fig. 1 shows a schematic of the protocol, further illustrated in may be detected experimentally through hypersensitivity to Fig. 6, which is published as supporting information on the PNAS DNase I in the context of nuclear chromatin. DNase I-hyper- web site. Additional detailed protocol information, including sensitive sites (HSs) are the sine qua non of classical cis- sequences of all oligonucleotides, is provided in the Supporting Text regulatory elements, including promoters, enhancers, silencers, , which is published as supporting information on the PNAS web site. After digestion of isolated nuclei with DNase I under insulators, and locus control regions (3–9). Systematic mapping hypersensitive treatment conditions, DNase-cut ends were re- of DNase I HSs across the genome should therefore yield a paired and ligated to a biotinylated linker adaptor containing comprehensive library of cis-regulatory elements, but is intrac- dual restriction sites for a type IIs restriction endonuclease table with conventional approaches. (MmeI) and a four-cutter enzyme (MluI), oriented such that the The feasibility of cloning DNase I HSs has recently been direction of cleavage of MmeI is toward the genomic DNA demonstrated by using both direct vector-assisted end cloning ligand. After linker ligation, the preparation was digested to (10) and a subtractive enrichment approach (11). The former completion with MmeI, which released the linker plus 19–20 bp method is limited to quiescent cells and therefore cannot be used of genomic DNA sequence (owing to a common 1-bp wobble in in the context of well-studied and widely used cell lines or other the MmeI indirect cut site). Linker͞genomic DNA tag fragments proliferating tissues. Moreover, neither method is well suited to were then isolated and purified over streptavidin-coated mag- efficient recovery of DNase I HSs on a genomewide scale netic particles (Dynal, Great Neck, NY). A second linker adaptor because of the 1:1 mapping between cloning events and se- containing a BsiWI site was ligated to the exposed genomic DNA quences; both require large amounts of sequencing and, criti- end of each fragment bound to the beads. By using primers cally, an independent molecular validation step for each candi- complementary to each of the linker sequences, the genomic date clone. DNA tags were PCR-amplified in situ while attached to the The application of tag-based or ‘‘digital’’ methodologies has revolutionized the study of transcriptome biology (12–14), en- abling both the generation of genomewide data sets and insight Abbreviations: DACS, digital analysis of chromatin structure; HS, hypersensitive site; into the tremendous dynamic range of gene expression. HSqPCR, hypersensitivity quantitative PCR; n-cluster, n-tag cluster; PPV, positive predictive Genomewide localization of DNase I HSs and associated value; TSS, transcriptional start site. cis-regulatory sequences requires development of a high- ‡To whom correspondence should be addressed. E-mail: [email protected]. throughput approach that (i) can efficiently map millions of © 2004 by The National Academy of Sciences of the USA www.pnas.org͞cgi͞doi͞10.1073͞pnas.0407387101 PNAS ͉ November 30, 2004 ͉ vol. 101 ͉ no. 48 ͉ 16837–16842 Downloaded by guest on September 29, 2021 enables 1- to 2-bp mismatch recognition, which is sufficient to account for sequencing errors and genomic variants. Primer Selection. We designed primers to amplify Ϸ250-bp genomic segments spanning candidate HS sequences by using PRIMER3 (18). Analysis of DNase I Hypersensitivity by Hypersensitivity Quantitative PCR (HSqPCR). We used a real-time quantitative PCR-based method to quantify DNase I hypersensitivity in K562 cells as described in refs. 11 and 19. Conventional DNase I Hypersensitivity Assays. Conventional DNase I hypersensitivity studies were performed by using the indirect end-label technique (20) according to a standard protocol described in ref. 21. Microarray Expression Analysis. Gene expression analysis was performed on a Human 1A Oligo Microarray (Agilent, Palo Alto, CA). Total RNA was isolated from 5 ϫ 107 K562 cells with an RNeasy total RNA isolation kit (Qiagen, Valencia, CA). Results Overview of Digital Analysis of Chromatin Structure. DACS is a hybrid molecular–computational methodology comprising two discrete phases: (i) production of a genomic DNA tag library encompassing individual DNase I-hypersensitive cut sites in nuclear chromatin and (ii) computational analysis of genomic tag distributions to identify statistically significant clusters and derive the underlying HS sequence. A schematic of the process Fig. 1. Schematic of DACS library creation. See Methods and Supporting Text of creating DACS tag libraries from DNase I-treated nuclear for description and Fig. 6 for additional illustration. chromatin is provided in Fig. 1 and illustrated in Fig. 6. DACS tag libraries provide base-pair resolution of DNase I cutting events and are therefore highly complex compared with RNA- beads. The products of this amplification were collected and derived libraries. DACS also differs fundamentally from previ- separated by electrophoresis through a 10% polyacrylamide gel, ously described genomic DNA tag-based methods (13, 22, 23) purified, and minimally reamplified. The second linker adaptor used to study gene expression or copy number in which tags are was released after BsiWI digestion, and the tags were recaptured generated relative to restriction enzyme sites, and subsequent on streptavidin-coated beads. Tags were cleaved from the beads tag localization in the genome relies explicitly on prior knowl- by digestion with MluI. After purification, tags were ligated to edge of the restriction site ‘‘scaffold.’’ form high-molecular-weight concatemers and size-selected over a 1.5% agarose gel. Fragments Ͼ500 bp in length were isolated Application of DACS to K562 Erythroid Cells. Using the method and cloned into pGEM5z (Promega)

Load more