ChIP-on-chip significance analysis reveals large-scale binding and regulation by human transcription factor oncogenes
Adam A. Margolin^(a,b,c,1), Teresa Palomero^(d,e), Pavel Sumazin^(b), Andrea Califano^(a,b,d,2,3), Adolfo A. Ferrando^(d,e,f,2,3), and Gustavo Stolovitzky^(b,c,2,3)

^(a) Department of Biomedical Informatics, ^(b) Joint Centers for Systems Biology, ^(d) Institute for Cancer Genetics, ^(e) Department of Pathology, and ^(f) Department of Pediatrics, Columbia University, New York, NY 10032; and ^(c) Functional Genomics and Systems Biology Group, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598

Edited by Barry H. Honig, Columbia University, New York, NY, and approved November 1, 2008 (received for review July 9, 2008)

Author contributions: A.A.M., P.S., A.C., A.A.F., and G.S. designed research; A.A.M., T.P., and P.S. performed research; T.P. and A.A.F. contributed new reagents/analytic tools; A.A.M. and P.S. analyzed data; and A.A.M., T.P., P.S., A.C., A.A.F., and G.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
Data deposition: The microarray data have been deposited in the Gene Expression Omnibus (GEO) Database, www.ncbi.nlm.nih.gov/geo (accession no. GSE12868). ChIP² data are available at http://wiki.c2b2.columbia.edu/califanolab/PNASAM2009/.
^(1) Present address: The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142.
^(2) A.C., A.A.F., and G.S. contributed equally to this work.
^(3) To whom correspondence may be addressed. E-mail: [email protected], califano@c2b2.columbia.edu, or [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0806445106/DCSupplemental.
© 2008 by The National Academy of Sciences of the USA
244–249 | PNAS | January 6, 2009 | vol. 106 | no. 1 | www.pnas.org/cgi/doi/10.1073/pnas.0806445106

ChIP-on-chip has emerged as a powerful tool to dissect the complex network of regulatory interactions between transcription factors and their targets. However, most ChIP-on-chip analysis methods use conservative approaches aimed at minimizing false-positive transcription factor targets. We present a model with improved sensitivity in detecting binding events from ChIP-on-chip data. Its application to human T cells, followed by extensive biochemical validation, reveals that 3 oncogenic transcription factors, NOTCH1, MYC, and HES1, bind to several thousand target gene promoters, up to an order of magnitude more than conventional analysis methods detect. Gene expression profiling upon NOTCH1 inhibition shows broad-scale functional regulation across the entire range of predicted target genes, establishing a closer link between occupancy and regulation. Finally, the increased sensitivity reveals a combinatorial regulatory program in which MYC cobinds to virtually all NOTCH1-bound promoters. Overall, these results suggest an unappreciated complexity of transcriptional regulatory networks and highlight the fundamental importance of genome-scale analysis for representing transcriptional programs.

regulatory networks | T cell lymphoblastic leukemia | transcriptional regulation | systems biology

The dysregulated activity of oncogenic transcription factors (TFs) contributes to neoplastic transformation by promoting aberrant expression of target genes involved in regulating cell homeostasis. Therefore, characterizing the regulatory networks controlled by these TFs is a critical objective in understanding the molecular mechanisms of cell transformation. ChIP-on-chip (ChIP²) (1) has emerged as a promising technology for dissecting transcriptional networks because it provides high-resolution maps of genome-wide TF–chromatin interactions.

ChIP² uses microarray technology to measure the relative abundance of genomic fragments in two samples: an immunoprecipitate (IP) sample, which is enriched in fragments bound by an immunoprecipitated protein (usually a TF), and a whole-cell extract (WCE) sample, containing fragments derived from a total chromatin preparation (input control) or from an immunoprecipitation with a nonspecific control antibody (2). The 2 samples may either be hybridized to different arrays or labeled with different dyes and hybridized to the same array. Correct interpretation of ChIP² data depends critically on an accurate statistical model to compute the probability that a given IP/WCE ratio is produced by a binding event rather than by experimental noise.

Recently, several elegant ChIP² analysis methods have been proposed to tackle problems such as integrating measurements from adjacent probes (3–6) or inferring binding site locations at subprobe resolution (7). However, the lower-level problem of developing an accurate error model to define meaningful statistical thresholds has received comparatively little attention [see SI and Fig. 1]. Thus, ChIP² data analysis methods often use highly conservative approaches aimed at minimizing the rate of false-positive predictions. Although several studies have experimentally validated novel target collections produced at a given statistical threshold (8–12), these studies likely miss a large number of true binding events, obscuring the full complexity of transcriptional processes.

Using an empirically determined model of the distribution of intensity ratios for non-IP-enriched probes in ChIP² experiments, we developed an analytical method called ChIP² Significance Analysis (CSA). When applied to ChIP² data for the NOTCH1, MYC, and HES1 protooncogenes in human T cell acute lymphoblastic leukemia (T-ALL) cells, CSA increased the number of detected binding sites by up to an order of magnitude compared with other routinely used methods. Both binding site analysis and biochemical validation demonstrate quantitative agreement with CSA-predicted false-positive rates. Analysis of gene expression signatures indicates functional regulation by NOTCH1 across the entire range of predicted targets. Finally, the increased sensitivity reveals that virtually all NOTCH1-bound promoters are also bound by MYC. Overall, these results highlight the power of the proposed analysis framework for identifying transcriptional networks and provide an improved and fundamentally different picture of the transcriptional programs controlled by NOTCH1, HES1, and MYC in T-ALL.

Results

Probe Statistics Are Accurately Modeled by CSA. T-ALL is a malignant tumor characterized by the aberrant activation of oncogenic TFs (13). We recently demonstrated that constitutive activation of NOTCH1 signaling due to mutations in the NOTCH1 gene activates a transcriptional network that controls leukemic cell growth (11, 14–16). These studies also demonstrated a fundamental role for HES1 and MYC as transcriptional mediators of NOTCH1 signals (15, 17). To characterize the structure of the oncogenic transcriptional network driven by activated NOTCH1 in T cell transformation, we sought to identify the direct transcriptional … accurate description of the individual and combinatorial regulatory programs controlled by these TFs.

We first generated an empirical model of the distribution of IP/WCE intensity ratios for probes associated with unbound fragments (see Materials and Methods), and we used it to assign a P value to each probe in the analysis of ChIP² assays representing replicate experiments for NOTCH1, MYC, and HES1.
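The idea of scoring each probe against a null built from unbound probes, then pooling replicates, can be illustrated with a minimal Python sketch. It is not CSA and does not reproduce the paper's model: it assumes, hypothetically, that log2 IP/WCE ratios of unbound probes are symmetric about a known center (0 here), ignores the intensity dependence that CSA models, builds the null by mirroring the left tail (where bound probes contribute little), and combines replicate P values with Fisher's method, a standard technique swapped in purely for illustration. The names `left_tail_null`, `empirical_p`, and `fisher_combine` are hypothetical.

```python
import math
from bisect import bisect_left


def left_tail_null(log_ratios, center=0.0):
    """Build an empirical null from the left tail of log2(IP/WCE) ratios.

    Hypothetical assumption (not from the paper): unbound-probe ratios are
    symmetric about `center`, so values at or below `center` (dominated by
    unbound probes) are mirrored to stand in for the right half of the null.
    """
    left = [x for x in log_ratios if x <= center]
    mirrored = [2.0 * center - x for x in left]
    return sorted(left + mirrored)


def empirical_p(x, null_sorted):
    """One-sided P value: share of null values >= x, with +1 smoothing
    so the result is never exactly zero."""
    n = len(null_sorted)
    n_at_least = n - bisect_left(null_sorted, x)
    return (n_at_least + 1) / (n + 1)


def fisher_combine(pvalues):
    """Fisher's method for k independent replicate P values.

    X = -2 * sum(ln p) follows a chi-square law with 2k degrees of freedom
    under the null; for even degrees of freedom its survival function is
    exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Toy log2 ratios for the same seven probes in two hypothetical replicates.
    rep1 = [-0.9, -0.4, -0.2, 0.0, 0.1, 0.3, 2.1]
    rep2 = [-0.7, -0.5, -0.1, 0.0, 0.2, 0.2, 1.8]
    null1, null2 = left_tail_null(rep1), left_tail_null(rep2)
    for x1, x2 in zip(rep1, rep2):
        p = fisher_combine([empirical_p(x1, null1), empirical_p(x2, null2)])
        print(f"{x1:+.1f} {x2:+.1f} -> combined P = {p:.3g}")
```

With a real array the null would contain many thousands of probes; the seven-probe toy data only exercise the code path. Fisher's method assumes independent replicates and uniformly distributed null P values, which the discrete empirical P values only approximate.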
ChIP² assays for these TFs were performed in HPB-ALL cells, a well-characterized T-ALL cell line with high expression levels of activated NOTCH1, MYC, and HES1. For NOTCH1, ChIP² assays were also performed in CUTLL1 cells, another NOTCH1-dependent T-ALL cell line. The magnitude versus amplitude plots (Fig. 2A) of the intensity-dependent distributions of probe-ratio values showed marked differences among the four experiments. In each case, CSA accurately modeled the left tail of the probe-ratio probability distribution, where the contribution from bound probes is expected to be minimal (Fig. 2 A and B). We note that if bound-probe ratios are well separated from the experimental noise, the P value distribution for all probes should be uniform between zero and one (unbound probes), with a single peak near zero (bound probes). Importantly, CSA accurately captured these statistical properties (see SI).

Fig. 1. Modeling errors of methods that use whole-dataset statistics for either normalization or significance detection. Blue bars represent a histogram of log2 IP/WCE probe-ratio values from a MYC ChIP² experiment. The histogram displays distinct, overlapping distributions for bound and unbound probes. The dotted red curve shows the log2 ratio values after mean centering, a common normalization technique that, for this experiment, shifts the mean of the null distribution to a negative value to compensate for the large number of high-ratio values from bound probes. The green curve represents a Gaussian fitted to the overall distribution, demonstrating that analysis methods that fit a global error model to these data will significantly overestimate the variance of the null distribution and will incur a high false-negative rate, as shown by the black arrow, which represents 2 standard deviations

Improved ChIP² Sensitivity by CSA. CSA then combines the probe significance model with an analytical method that integrates the statistics for replicate experiments and for probes at nearby genomic locations (to account for ChIP² fragment lengths; see Materials and Methods). We used CSA to compute the false