Identifying Dnase I Hypersensitive Sites As Driver Distal Regulatory Elements in Breast Cancer
Total Page:16
File Type:pdf, Size:1020Kb
ARTICLE DOI: 10.1038/s41467-017-00100-x OPEN Identifying DNase I hypersensitive sites as driver distal regulatory elements in breast cancer Matteo D′Antonio 1, Donate Weghorn2, Agnieszka D′Antonio-Chronowska3, Florence Coulet4,10, Katrina M. Olson5,6, Christopher DeBoever 7, Frauke Drees4, Angelo Arias4, Hakan Alakus4,11, Andrea L. Richardson8,12, Richard B. Schwab1,9, Emma K. Farley5,6, Shamil R. Sunyaev2 & Kelly A Frazer1,3,4 Efforts to identify driver mutations in cancer have largely focused on genes, whereas non-coding sequences remain relatively unexplored. Here we develop a statistical method based on characteristics known to influence local mutation rate and a series of enrichment filters in order to identify distal regulatory elements harboring putative driver mutations in breast cancer. We identify ten DNase I hypersensitive sites that are significantly mutated in breast cancers and associated with the aberrant expression of neighboring genes. A pan-cancer analysis shows that three of these elements are significantly mutated across multiple cancer types and have mutation densities similar to protein-coding driver genes. Functional characterization of the most highly mutated DNase I hypersensitive sites in breast cancer (using in silico and experimental approaches) confirms that they are regulatory elements and affect the expression of cancer genes. Our study suggests that mutations of regulatory elements in tumors likely play an important role in cancer development. 1 Moores Cancer Center, University of California, La Jolla, San Diego, CA 92093, USA. 2 Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA. 3 Institute for Genomic Medicine, University of California, La Jolla, San Diego, CA 92093, USA. 4 Department of Pediatrics, University of California, La Jolla, San Diego, CA 92093, USA. 5 Department of Medicine, Division of Cardiology, University of California, La Jolla, San Diego, CA 92093, USA. 6 Division of Biological Sciences, Section of Molecular Biology, University of California, La Jolla, San Diego, CA 92093, USA. 7 Bioinformatics and Systems Biology, University of California, La Jolla, San Diego, CA 92093, USA. 8 Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA. 9 Department of Medicine, School of Medicine, University of California, La Jolla, San Diego, CA 92093, USA. 10Present address: Department of Genetics, Pitie-Salpetriere Hospital, Pierre and Marie Curie University, Paris 75013, France. 11Present address: Department of General, Visceral and Cancer Surgery, University of Cologne, Cologne 50937, Germany. 12Present address: The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA. Matteo D′Antonio and Donate Weghorn contributed equally to this work. Correspondence and requests for materials should be addressed to E.K.F. (email: [email protected]) or to S.R.S. (email: [email protected])orto K..F. (email: [email protected]) NATURE COMMUNICATIONS | 8: 436 | DOI: 10.1038/s41467-017-00100-x | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-017-00100-x fforts to identify driver mutations have so far largely focused Supplementary Table 1). We investigated the mutational on coding genes, although there have been several recent landscape of these tumors to determine if they are similar to E 1 15 fi analyses focused on non-coding sequences. Weinhold et al. previously analyzed breast cancers. Using MuTect we identi ed analzyed non-coding functional sequences, identifying 193 193,958 high confidence somatic mutations corresponding to an regulatory elements with elevated mutation densities, and then average of 4,127 mutations (range 3,241 to 41,714) per patient characterized in depth driver mutations in the promoters of (Fig. 1a, Supplementary Table 2, Supplementary Data 1). three genes (SDHD, PLEKHS1 and WDR74). Nik-Zainal et al.2 On average, there are 53 mutations (range 10 to 225) per tumor analyzed the mutational landscape of 560 breast cancers and also in coding sequences. Examining 12 genes previously reported as found putative driver mutations in the promoters of PLEKHS1 harboring driver mutations in breast cancer8, we identified and WDR74, as well as in the promoter of TBC1D12. Fredriksson mutations at the expected density in TP53 (33 mutations in et al.3 focused on regions immediately upstream of transcription 32 samples, 68.1%) PIK3CA (12 mutations, 25.5%), CDH1 start sites, identified 17 recurrent promoter mutations and (2 mutations, 4.3%), MLL3 (2 mutations, 4.3%) and GATA3 (one characterized mutations in the TERT promoter. Melton et al.4 mutation, 2.1%)8 (Fig. 1b). We did not observe mutations in six developed a model to identify driver mutations as outliers of a genes (CTCF, MAP2K4, PIK3R1, PTEN, RUNX1 and TBX3) Poisson distribution. Using this model, they analyzed recurrently known to be recurrently mutated at lower frequency (<5%), or mutated positions, detected nine putative driver regulatory MAP3K1 that is mutated at high frequency (13%) only in luminal elements and experimentally confirmed by reporter assays that A tumors8. TP53 has a high incidence of frameshift and nonsense the recurrent mutations in three of these elements near TERT, mutations (6 and 4, respectively, Supplementary Data 2), as GP6 and BCL11B result in altered expression. A study by Puente previously observed8. Among non-driver genes mutated at high et al.5, which focused on chronic lymphocytic leukemia, identified frequency in our discovery set (>3 samples, 8%), we detected an enhancer of PAX5 as a putative driver element. Overall, in several known to be hypermutated in many cancer types (TTN, spite of some recent progress, the analytic approaches for MUC17, MUC16, OBSCN), mostly due to their length (>4,000 identifying driver regulatory elements are not as developed as codons, with long introns often spanning more than 1 Mb) and – those for identifying driver genes and relatively few driver believed to primarily harbor passenger mutations (Fig. 1c)14 16. regulatory elements have been characterized. Examining the somatic substitution frequencies for all samples, Breast cancer is a highly heterogeneous disease characterized the most common substitution is CG–TA (Fig. 1d), comprising by four major clinically relevant phenotypes6 that have specific 32% of all somatic SNVs, similar to previous observations17. gene expression patterns7, but overlapping mutational profiles8. Recently, breast cancers were shown to often display kataegis, i.e. Mutational analyses of exomes have shown that combining all localized hypermutation, proposed to result from cytosine breast cancer subtypes together results in increased sensitivity for deaminations catalyzed by APOBEC proteins18. We inspected the – identifying driver genes8 10. In this study, we have analyzed distribution of distances between mutations in the discovery whole genome sequences for 657 breast cancer samples and more samples (Fig. 1e, f) and observed 69 kataegis loci in 29 samples than one thousand tumors across 19 additional cancer types to (59.6%, Supplementary Table 3)19, levels similar to previous detect non-coding driver mutations. We focused on analyzing reports (~50%)18. Overall, these data show that the mutational DNase I hypersensitive sites (DHSs) as previous pan-cancer rates and patterns in the 47 discovery samples are generally analyses of non-coding sequences have shown that regulatory consistent with breast cancer samples analyzed in previous elements associated with DHSs have decreased somatic mutation genome-wide experiments16, 17. rates compared with the rest of the non-coding genome, suggesting that mutations in these regions have a driver role in – cancer11 13. Given the fact that local mutation density is Identifying DHSs significantly mutated in breast cancer. extremely variable across the genome14, 15, in this study we Several groups have previously shown that the local mutation developed a statistical method that takes into account the density is lower in regulatory elements that are active in the cell- influence of DNA sequence characteristics, replication timing, type of origin of the tumor compared to regulatory elements in and chromatin on local mutation rates15. We then applied a series other cell types11. This observation supports the hypothesis that of enrichment filters resulting in the identification of ten mutations in functional genomic regions are usually deleterious significantly mutated DHSs that are both associated with the and negatively selected. Therefore, increased local mutation aberrant expression of neighboring cancer genes and mutated in density relative to the expected density under a neutral model of two independent sets of breast tumors. A pan-cancer analysis evolution is an evidence of positive selection, suggesting that the showed that three of the regulatory elements are putative drivers mutations are functional and hence referred to as drivers20, 21. in multiple tumor types. We functionally characterized the four Based on these assumptions, we focused on the two breast cell DHSs most highly mutated in breast cancer with a combination lines in ENCODE22, the breast cancer cell line T47-D and a of in silico and experimental approaches, including CRISPR and human mammary fibroblast (HMF) line, and derived a list of animal models, and confirmed they are regulatory elements of DHSs. Of note, HMFs are not in the same lineage as the epithelial known cancer genes. cells that lead to ductal