Published OnlineFirst August 6, 2019; DOI: 10.1158/0008-5472.CAN-19-0905
Cancer Research (Review)

Mapping and Making Sense of Noncoding Mutations in the Genome

Jiekun Yang and Mazhar Adli

Abstract

Whole-genome sequencing efforts of tumors and normal tissues have identified numerous genetic mutations, both somatic and germline, that do not overlap with coding genomic sequences. Attributing a functional role to these noncoding mutations and characterizing them using experimental methods has been more challenging compared with coding mutations. In this review, we provide a brief introduction to the world of noncoding mutations. We discuss recent progress in identifying noncoding mutations and the analytic and experimental approaches utilized to interpret their functional roles. We also highlight the potential mechanisms by which a noncoding mutation may exert its effect and discuss future challenges and opportunities.

Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia.

Corresponding Author: Mazhar Adli, University of Virginia, 1340 Jefferson Park Ave, Pinn Hall, Room: 6233, Charlottesville, VA 22902. Phone: 434-243-8567; Fax: 434-924-5069; E-mail: [email protected]

Cancer Res 2019;XX:XX–XX
doi: 10.1158/0008-5472.CAN-19-0905
© 2019 American Association for Cancer Research.

Introduction to Germline and Somatic Variants

No two humans are genetically identical. The genomes of two unrelated individuals can differ by as much as 20 million base pairs (or 0.6% of the total 3.2 billion base pairs; ref. 1). We inherit these sequence variants from our parents. They are present in all the cells of our body and are called germline variants. Certain germline variants have been linked to complex traits and disorders, including cancer (2). In addition to these inherited mutations, our cells also acquire novel mutations every time they divide, starting from a fertilized egg. Approximately three mutations are estimated to occur when a normal human stem cell divides (3). These variants, which are not inherited from a parent and are usually not transmitted to offspring, are called somatic variants. These minor variations in the genomic sequence may arise due to endogenous factors such as replication errors, reactive oxygen species (ROS), aldehydes, mitotic errors, DNA repair machinery deficiency, retrotransposons, or genome-modifying enzymes (4). In addition, environmental factors including chemicals, UV light, ionizing radiation, and viruses can also lead to genetic alterations in the somatic cells (4). As we age, somatic mutations accumulate in all our cells. A fraction of these mutations may positively contribute to cellular survival and proliferation. Thus, the cells harboring these mutations will be selected within a tissue, which may end up acquiring the hallmarks of cancer (5). Because the somatic variation pool is large, no two individuals' cancers are identical (6). As the understanding of how normal cells transform into cancer cells improves, the significance of germline and somatic variants in tumorigenesis and cancer evolution is increasingly appreciated and is being intensely studied.

By whole-genome sequencing (WGS) of tens of thousands of human tumors and matched controls (7–9), it has become clear that both germline and somatic variants are not evenly distributed along the genome and that they show similar patterns across critical genomic elements at large scale. The genomic elements of concern usually include three types of open chromatin regions: RNA or protein-coding gene body regions, promoters, and enhancers. The latter two are major types of noncoding regulatory regions that control and fine tune the expression of genes. A promoter, located near the transcription start sites (TSS) of a gene, is a region of DNA that initiates gene transcription, while an enhancer is a distal DNA-regulatory element that may be located as far as 1 Mb away from the gene TSS. The enhancer elements regulate gene expression temporally and spatially, thus allowing cell type and tissue-specific gene expression. For germline variants, the protein-coding genomic regions tend to accumulate far fewer variants than intergenic regions, indicating stronger evolutionary conservation. Interestingly, with the same standard applied, enhancer elements are more conserved than promoter elements (9). The spatial distribution of somatic mutations in the cancer genome has similar patterns. In human cancer cells, regions of open chromatin exhibit decreased somatic mutation density compared with gene-poor and heterochromatic genomic regions (10, 11). This decrease is likely due to the increased accessibility of these active regions to the DNA repair machinery (12). It is notable that within the open chromatin regions, mutations tend to accumulate near the transcription factor (TF)-binding sites such as at the promoters (13). This may be caused by impaired DNA repair activity due to the physical presence of a TF (14). Alternatively, the TF-binding activity may directly create a local physical constraint on DNA, which may result in DNA strand breaks or exposure of ssDNA to APOBEC-like DNA-modifying enzymes (15). Various endogenous and exogenous factors such as replication errors, ROS, UV light, and viruses contribute to the uneven variant distribution (16–18), but not all the variants present are functional. From the millions of germline and somatic variants in the noncoding genomic region, the task of pinpointing a functional role for a specific noncoding variant and linking it to a biological phenotype is not a trivial effort. Below, we will discuss some of the


Downloaded from cancerres.aacrjournals.org on September 30, 2021. © 2019 American Association for Cancer Research. Published OnlineFirst August 6, 2019; DOI: 10.1158/0008-5472.CAN-19-0905


efforts to identify such variants and potential approaches that can be taken to study the impact of noncoding variants.

Identification of Noncoding Mutations in the Genome

Although theoretically, sequence variants between any two cells from an individual could be determined with accurate single-cell sequencing technologies, at present, only clonally expanded somatic mutations, for example, in the cancer context, are reliably detected (4). Among the detected variants, identifying functional ones is another level of challenge. To this end, genome-wide association studies (GWAS) and cancer genomics studies started with the basic assumption that functional variants are disproportionately present in individuals (germline variants) or cells (somatic variants) with the trait of interest compared with their control counterparts. As of February 2019, the NHGRI-EBI Catalog of published GWAS (https://www.ebi.ac.uk/gwas/) contains 3,764 publications and 107,785 unique germline SNPs that are associated with various biological traits, among which 5,256 are for cancer (19). It is notable that the vast majority of these SNPs (94%) are within the noncoding genomic regions. GWASs have been facilitated by the development of relatively inexpensive SNP arrays, which could be easily imputed to a large fully sequenced reference panel exploiting linkage disequilibrium (20). However, cancer genomics relies on the identification of somatic mutations through DNA sequencing. Currently, approximately 6 million coding mutations have been identified for approximately 1.5% of the genome, compared with 20 million noncoding mutations for approximately 98.5% of the genome based on the Catalogue Of Somatic Mutations In Cancer v86 (COSMIC, https://cancer.sanger.ac.uk; ref. 21). This is largely due to the fact that a large fraction of cancer genome sequencing efforts focused on exomes rather than whole genomes due to the cost of sequencing (2). An increasing number of studies are now performing WGS. The most recent Pan-Cancer Analysis of Whole Genomes (PCAWG) effort has collected and systematically analyzed >2,500 whole cancer genomes (22). These efforts identified approximately 23 million somatic mutations (https://dcc.icgc.org/), among which 96.6% of the mutations reside in noncoding regions.

Among the millions of noncoding mutations in cancer, few are observed at significantly higher frequencies in tumors compared with normal tissues, indicating a "driver" functional role in tumorigenesis. The well-known examples are the two mutations within the promoter of the telomerase gene (TERT), which are observed in approximately 65%–75% of melanomas (23, 24). These mutations were first discovered due to their frequencies and cosegregation patterns in cancer-prone families, which enabled detection in approximately 100 sporadic cancer cases or a large pedigree with 14 patients with melanoma (23, 24). Later in the PCAWG and other studies, the TERT promoter was observed to be significantly mutated in tumor types other than melanoma, that is, bladder, head and neck, liver and thyroid cancer and glioblastoma (GBM) and medulloblastoma (22, 25). In addition to the TERT promoter mutations, other noncoding recurrent mutations have been reported, albeit at lower frequencies. Certain regulatory elements show significant accumulation of somatic mutations within a small genomic window (26, 27). When proximal noncoding mutations are grouped (within 50 bp of each other), recurrent cancer-associated mutations are observed at the promoters of PLEKHS1 (6.5%), WDR74 (4.2%), SDHD (10.2%), KIAA0907 (10.2%), and YAE1D1 (9.3%), and at the 5′ UTR of TBC1D12 (15.2%) in tumors of different tissue origins (26, 27). Recently, small genomic windows with higher mutation rates have been observed in specific cancer types. For example, noncoding mutations near DHX34 are observed in 43% of diffuse large B-cell lymphomas, and 29% of lymphomas and 17% of liver cancers have noncoding mutations in the regulatory region of TUBBP5 (28). It should be noted that because multiple enhancers can regulate a single gene, combining mutations within a set of regulatory regions would further increase detection power (12). For example, noncoding somatic and germline variants in several regulatory elements show a combinatorial effect on ESR1 expression in ER+ breast cancer (29).

Understanding Functional Roles of Noncoding Mutations

In the context of tumor evolution, recurrence of a genetic alteration indicates that the particular mutation selectively contributes to cellular survival and proliferation. The real challenge is to separate positively selected causal or driver variants from neutral passenger ones and mildly deleterious ones, and to identify a functional role and the mechanism of action of such recurrent noncoding mutations.

In this respect, one proven approach is to integrate noncoding mutational data with other genomic data. Genomic, transcriptomic, and epigenomic assays boost our understanding of the noncoding genome and its regulation of the coding genomic regions. Taking advantage of the high-throughput assays, large-scale projects such as the Encyclopedia of DNA Elements (ENCODE; ref. 30) and Roadmap Epigenomics (31) revealed that up to 80% of the noncoding genome has regulatory functions. Thus, noncoding mutations are highly likely to reside in regulatory regions, where they may alter target gene expression by affecting regulatory mechanisms. For instance, by linking recurrently mutated loci to putative target genes, Zhang and colleagues (28) performed somatic eQTL (expression quantitative trait loci) analysis using 930 tumor whole genomes and matched transcriptomes. This approach determined 193 somatic eQTLs, 107 of which were validated in a second cohort from the International Cancer Genome Consortium. Importantly, the somatic eQTL network was disrupted in 88% of tumors, suggesting a widespread impact of noncoding mutations in cancer. More recently, TCGA profiled the chromatin accessibility landscape of 410 primary tumor samples using ATAC-seq (32). Integration of chromatin accessibility with WGS data enables allele-specific ATAC-seq analysis. It was discovered that a novel single-base mutation located 12-kb upstream of the FGD4 gene generates a putative de novo–binding site for an NKX TF and thus increases the chromatin accessibility and FGD4 gene expression in bladder cancer (32). In addition to allele-specific chromatin accessibility patterns, researchers have for a decade noticed genetic variation affecting the methylation state of neighboring cytosines in cis, especially at enhancers (33, 34). Recent allele-specific methylation (ASM) analysis from 49 whole-genome bisulfite-sequenced methylomes (33) revealed extensive sequence-dependent CpG methylation imbalances at thousands of heterozygous regulatory loci. These ASM sites contain a large number of GWAS hits and rare variants (allele frequency < 1%), which suggests ASM-mediated functional importance of the noncoding


variants (33). These large-scale studies, together with a large number of publicly available genomic and epigenomic datasets and tools such as OncoCis (35), Funseq2 (36), RegulomeDB (37), and GWAVA (38), should greatly help researchers to prioritize research on noncoding mutations and infer their cis-regulatory potential in silico.

Another approach to identifying a functional role of a noncoding mutation is to study mutational processes and cancer-associated mutational signatures. The analysis of cancer-type specific WGS data identified unique sets of mutational signatures that inform about the potentially dominant mutational processes in those tumors (17, 18, 39). Although the mechanisms underlying a majority of the mutational signatures are yet to be identified (17, 18), for some cancers, the mutational processes driving the signatures are relatively well understood. For example, melanomas are associated with increased C>T mutations at CCN or TCN trinucleotides due to UV-light exposure, lung cancer shows transcriptional strand bias for C>A mutations due to tobacco smoking, and breast cancer is characterized by C>T and C>G mutations at TCN trinucleotides, indicating aberrant activity of members of the APOBEC family of cytidine deaminases (17). The noncoding mutations associated with a certain mutational process or signature may arise due to overall regulatory activity in the genome and confer a combined effect. Through integration of 560 WGS and >20 ER ChIP-seq data from primary ER+ breast tumors, Yang and colleagues (15) recently discovered that a large fraction of noncoding somatic mutations are confined to ER-binding sites. Notably, the highly mutated ER-binding sites are associated with more frequent chromatin loop contacts (ChIA-PET and Hi-C data), and the associated distal genes are expressed at a higher level. Recently, Hung and colleagues (40) revealed that indels with the mismatch repair (MMR) signature led to allele-biased enhancer activation and selection of enhancers in microsatellite-instable colorectal cancer. These studies highlight that noncoding mutations induced by a certain mutational process may function collectively. Thus, they may individually be categorized as passenger mutations (sequence variants that do not contribute to cancer growth), but the combined effect of multiple such noncoding mutations can be a significant contributor to tumor evolution and cancer pathogenesis.

The ultimate test for the functional role of a noncoding mutation is experimental validation. To this end, high-throughput experimental methods capable of testing multiple noncoding mutations in parallel are greatly needed to explore the functionality of numerous noncoding mutations. Historically, reporter assays have been extensively used to assess the effects of a mutation in promoters or a known enhancer (2). However, this method does not capture the endogenous genomic features such as epigenetic states and 3D structures, and for many enhancers, the gene target is not known. The advances in the CRISPR-based tools are expected to provide great flexibility and robustness to interrogate the regulatory impacts of noncoding mutations in an endogenous genomic context (15, 40–43). CRISPR tools can be used to perform gene expression regulation, epigenome editing, live-cell chromatin imaging, and manipulation of chromatin topology (41). Moreover, with WT CRISPR Cas9 or CRISPR base editors, one can engineer cell lines with a specific mutation and study its functional role. We recently used the CRISPR base editor (15) to specifically introduce a cancer-associated recurrent C to T conversion at the noncoding region of ZNF143. The single change in MCF-7 cells (an ER+ breast cancer cell line) results in decreased TF binding, increased chromatin loop formation, and increased expression of multiple distal genes. Such long-range chromatin features cannot be studied by reporter assays. Integrating CRISPR-based manipulations with single-cell technologies may enable us to characterize a large number of noncoding mutations in parallel. Recently, Gasperini and colleagues (44) combined CRISPR/Cas9-mediated perturbations and single-cell RNA-seq to perturb 5,920 candidate enhancers and link them to specific genes. Such efforts of combining CRISPR perturbations with single-cell RNA-seq, ATAC-seq (45), methylation profiling (46), or Hi-C (47) will expand our ability to better interrogate functional roles of various noncoding elements and their embedded mutations in the genome.

Potential Mechanisms of Action of Noncoding Mutations

A mutation in the noncoding part of the genome may exert a biological function through multiple mechanisms (Fig. 1). It may affect TF-binding activity, alter epigenetic states, alter chromatin topology, change copy number or genomic arrangements of the regulatory element (structural variation), or affect ncRNAs. It should be noted that these mechanisms are not independent of each other. A given noncoding mutation may interrupt multiple regulatory mechanisms simultaneously. For example, the noncoding mutations at the TERT promoter generate de novo consensus binding motifs for E-twenty-six (ETS) TFs. In reporter assays, the mutations increased transcriptional activity from the TERT promoter by 2- to 4-fold (23, 24). Inspired by this finding, Weinhold and colleagues (26) screened regulatory regions for mutations that either created or disrupted ETS-binding sites. Among all the significant regions they identified, SDHD, which encodes subunit D of the succinate dehydrogenase complex, showed recurrent mutations damaging existing ETS-binding sites. In patients with melanoma, the mutations are associated with significantly reduced SDHD expression, and patients with SDHD mutations had a significantly shorter overall survival (26).

Somatic mutations may affect gene expression by altering local epigenetic states and creating de novo regulatory elements (48, 49). Mansour and colleagues (48) demonstrate that noncoding indels 7.5-kb upstream of TAL1, an oncogene associated with T-cell acute lymphoblastic leukemia (T-ALL), create de novo binding motifs for the MYB TF, which then recruits additional epigenetic modifiers and creates a super enhancer to drive TAL1 overexpression in T-ALL (48). Single nucleotide substitutions may also alter the epigenetic state of a regulatory element. An example of this mechanism is the CpG island methylator phenotype associated with IDH mutations (42). The hypermethylation at cohesin and CCCTC-binding factor (CTCF)-binding sites alters CTCF-binding activities. The reduced CTCF binding results in loss of insulation between topological domains and aberrant gene activation. In theory, noncoding point mutations such as C to T conversions within the CTCF motif may show a similar long-range effect on the chromatin. Multiple cancer types accumulate somatic mutations at CTCF/cohesin-binding sites (50), specifically at boundaries of constitutive CTCF–CTCF loops (51). In T-ALL, Hnisz and colleagues (51) identified six recurrent short deletions (<500 kb) that overlapped at least one boundary of insulated neighborhoods, which contain T-ALL pathogenesis genes including TAL1 and LMO2. Deletion of the TAL1 and LMO2 neighborhood boundary


[Figure 1 graphic. Legend: TF, cohesin, CTCF, enhancer, promoter, gene, mRNA. Panels: create a de novo TF-binding site; create a de novo enhancer; alter TAD structure; alter TF binding. © 2018 American Association for Cancer Research]

Figure 1. Potential mechanisms of action of noncoding mutations. A noncoding mutation may affect target gene expression by creating a de novo TF-binding site, which may result in a de novo enhancer depending on the epigenomic context. A noncoding mutation may also alter a topologically associating domain (TAD), thus disrupting enhancer-promoter interaction, or TF binding by changing the corresponding motif. Different types of molecules and genomic features are color-coded. Noncoding mutations are represented using red crosses.
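As a rough illustration of two of the mechanisms in Fig. 1 (creating a de novo TF-binding site, or altering TF binding by changing a motif), the sketch below checks whether a single-base substitution creates or disrupts an exact motif match on either strand. The motif string `GGAA` (commonly described as the ETS core), the toy sequences, and the function names are assumptions made for illustration; they are not taken from the cited studies, which typically score motifs with position weight matrices rather than exact string matches.

```python
# Minimal sketch: does a substitution create or disrupt an exact motif match?
# All names and sequences here are hypothetical illustrations.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def motif_hits(seq, motif):
    """Return the set of (start, strand) positions where the motif matches."""
    hits = set()
    rc_motif = reverse_complement(motif)
    width = len(motif)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window == motif:
            hits.add((start, "+"))
        if window == rc_motif:
            hits.add((start, "-"))
    return hits


def motif_effect(ref_seq, pos, alt, motif):
    """Classify a substitution at 0-based `pos` as creating or disrupting a motif."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    before = motif_hits(ref_seq, motif)
    after = motif_hits(alt_seq, motif)
    created = after - before
    disrupted = before - after
    if created and disrupted:
        return "both"
    if created:
        return "creates"
    if disrupted:
        return "disrupts"
    return "no_change"


if __name__ == "__main__":
    # Toy sequence: a G>A change at position 5 produces a GGAA match.
    print(motif_effect("ACCGGGACT", 5, "A", "GGAA"))
```

Exact matching keeps the example short; a score threshold on a weight matrix would be the more realistic choice when ranking candidate mutations.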

disrupts insulation and allows activation of TAL1 and LMO2 by regulatory elements outside of the DNA loop (51).

Noncoding mutations may also result in functional structural variations. A recurrent somatic chromosomal duplication in a regulatory element 1.47 Mb distant from MYC forms a NOTCH1-regulated enhancer that then alters the MYC promoter activity in T-ALL (52). Similarly, copy number gains in noncoding regions harboring super-enhancers near KLF5, USP12, PARD6B, and MYC are associated with overexpression of these cancer-related genes in various cancer types (53, 54). Noncoding mutations may also drive chemoresistance in cancer. A somatically amplified androgen receptor (AR) enhancer located 650-kb upstream of the AR drives therapeutic resistance in advanced prostate cancer (43). Besides copy number changes of regulatory elements, somatic structural rearrangements may bring the target gene adjacent to a regulatory element directly. One example would be the rearrangements that juxtapose GFI1 or GFI1B coding sequences proximal to active enhancer elements, which instigate oncogenic activity in medulloblastoma (55).

Finally, noncoding mutations may exert biological functions by altering the DNA sequence of ncRNAs including miRNAs, long noncoding RNAs (lncRNAs; which are >200 nucleotides), and transcribed pseudogenes (2). ncRNAs act through various mechanisms to modulate gene expression in normal and malignant settings. On the basis of a comprehensive lncRNA profile, 3,900 lncRNAs overlap disease-associated SNPs (56), which may confer effects by disrupting ncRNA structures and thus their binding to target proteins, DNA, and RNA molecules (2). Some of these disease-associated variants may directly alter the expression levels of lncRNAs. For instance, a prostate cancer risk-associated variant that maps to the promoter of a short isoform of lncRNA PCAT19 (PCAT19-short), which is in the third intron of the long isoform (PCAT19-long), mediates a promoter-enhancer switch by altering TF-binding activity (57). Therefore, the risk variant is associated with decreased and increased levels of PCAT19-short and PCAT19-long, respectively (57). Similarly, another variant results in upregulation of PCAT1 by increasing binding of a novel AR-interacting TF at a distal enhancer that loops to the PCAT1 promoter (58). Finally, a noncoding mutation could also confer its effect by modulating ncRNA or mRNA synthesis and decay by affecting alternative splicing, polyadenylation, and promoter usage. The genome is regulated through various mechanisms, and genetic sequence is arguably the most essential layer of regulation. The abovementioned mechanisms are far from an exhaustive list of all the ways in which a noncoding variant may exert its function.

Discussion and Future Directions

Noncoding mutations are widespread in the genome. Identifying them and characterizing their function is not a trivial challenge. To this end, utilization of the WGS technology and integration of mutational data with transcriptomic and epigenomic maps have been a proven path to not only identify noncoding mutations but also attribute potential biological functions to them. Various studies have determined functional noncoding mutations with low frequency by integrating mutations with transcriptomes and epigenomes (15, 28, 32, 40). These successes encourage comprehensive characterization of tumors using a compendium of high-throughput assays. To this end, increasing the number of WGS datasets from various tissues and developing more sophisticated computational methods to map and call variants accurately (22) will enhance our ability to identify additional functional noncoding mutations. In line with this, obtaining WGS data from distinct ethnic groups is necessary to identify race-specific coding and noncoding mutations and understand their contribution to the racial disparities in cancer (59).

Noncoding mutations may contribute to tissue homeostasis, tumorigenesis, and evolution in a different way than coding mutations. Instead of each having a significant effect, noncoding mutations may collectively function to confer a survival/


proliferative benefit to cells. The noncoding mutations at ER-binding sites in breast cancer and with the MMR signature in colorectal cancer support this mechanism. These noncoding mutations may individually act as a "mini driver" in cancer (60). Multiple "mini drivers" together confer a larger effect on tumorigenesis and evolution (60). Although these "mini drivers" may show great intertumor heterogeneity due to their low frequencies, they may be induced by the same mutational processes. However, without a deeper understanding of the mutational processes and signatures in normal tissues, we are not able to determine if these "mini drivers" truly contribute to tumorigenesis in vivo. Interpretation of coding driver mutations is facing the same problem. For example, the prevalence of known driver mutations and mutational signatures is similar between aged normal and cancer tissues for blood and skin (4). This suggests that, at least in these cancers, other mechanisms exist in addition to the so-called "driver mutations" that transform normal cells. It is plausible that a collection of noncoding mutations can be one of the mechanisms in this process. Efforts such as The PreCancer Atlas (PCA) initiative (61) will broaden our understanding of the mutation processes in normal tissues and will likely enable a better understanding of the driver coding and noncoding mutations that contribute to tumorigenesis and evolution. Similarly, experimental assays taking advantage of the newest technologies are also part of the PCA initiative to facilitate noncoding mutation studies. Current cutting-edge frameworks such as CRISPR technologies, massively parallel reporter assays (62), and mapping gene regulation via single-cell–level genetic screens (44) aid our validation of mutation functions and location of important cis-regulatory interactions in a high-throughput fashion. These datasets will be valuable resources for the development of new algorithms, which can predict noncoding mutation functions more accurately.

By systematically integrating transcriptomic, epigenomic, and genome structural data with WGS mutations, more cancer-associated noncoding mutations are expected to emerge. At the same time, this integration reflects the essence of precision medicine. Once the noncoding mutations are robustly mapped and their mechanisms of action are understood, the next phase would be to develop potential therapeutic approaches to target the driver noncoding mutations in cancer and other diseases. Such efforts are already ongoing to target the mutant TERT promoter. Because the TERT mutations create a de novo TF-binding site, inhibiting the TFs that bind to these mutant sites can potentially rectify the aberrant role of these mutations (63, 64). Similar efforts on other driver noncoding mutations will likely improve precision medicine and tailored patient care in the future.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Acknowledgments

This research is supported by National Science Foundation award (MCB 1715183 to M. Adli), University of Virginia Cancer Center Team Science Award (to M. Adli), and Trainee Award (to J. Yang).

Received March 18, 2019; revised April 30, 2019; accepted May 21, 2019; published first August 6, 2019.
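The mutational processes and signatures discussed in this review are described in terms of substitutions in their trinucleotide context (for example, C>T or C>G at TCN). The sketch below shows one way such a context label could be computed, assuming the widely used convention of reporting each substitution from the pyrimidine (C or T) strand. The function names, the label format `T[C>T]A`, and the toy input are illustrative assumptions and do not reproduce the pipelines of the cited studies.

```python
# Minimal sketch: label substitutions by pyrimidine-centered trinucleotide context.
# Names, label format, and example data are hypothetical illustrations.
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def trinucleotide_class(context, alt):
    """Label a substitution such as T[C>T]A.

    `context` is the 3-base reference sequence centered on the mutated base;
    `alt` is the mutant base. Purine-centered contexts are flipped to the
    complementary strand so the reference base is always C or T.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt not in COMPLEMENT or alt == context[1]:
        raise ValueError("expected a 3-base context and a different alt base")
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_tcn_c_to_t_or_g(label):
    """True for C>T or C>G at a TCN trinucleotide (the pattern linked to APOBEC activity in the text)."""
    return label[0] == "T" and label[2:5] in ("C>T", "C>G")


if __name__ == "__main__":
    # Toy input: (reference trinucleotide, mutant base) pairs.
    calls = [("TCA", "T"), ("TGA", "A"), ("ACG", "A"), ("TCT", "G")]
    labels = [trinucleotide_class(ctx, alt) for ctx, alt in calls]
    print(Counter(labels))
    print(sum(is_tcn_c_to_t_or_g(label) for label in labels))
```

Counting such labels per tumor gives the kind of context spectrum from which signatures are usually inferred; the inference step itself is outside the scope of this sketch.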

References

1. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015;526:68–74.
2. Khurana E, Fu Y, Chakravarty D, Demichelis F, Rubin MA, Gerstein M. Role of non-coding sequence variants in cancer. Nat Rev Genet 2016;17:93–108.
3. Tomasetti C, Li L, Vogelstein B. Stem cell divisions, somatic mutations, cancer etiology, and cancer prevention. Science 2017;355:1330–4.
4. Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science 2015;349:1483–9.
5. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646–74.
6. Melton C, Reuter JA, Spacek DV, Snyder M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat Genet 2015;47:710–6.
7. The Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.
8. The International Cancer Genome Consortium. International network of cancer genome projects. Nature 2010;464:993–8.
9. Telenti A, Pierce LCT, Biggs WH, di Iulio J, Wong EHM, Fabani MM, et al. Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci U S A 2016;113:11901–6.
10. Polak P, Lawrence MS, Haugen E, Stoletzki N, Stojanov P, Thurman RE, et al. Reduced local mutation density in regulatory DNA of cancer genomes is linked to DNA repair. Nat Biotech 2014;32:71–5.
11. Schuster-Böckler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 2012;488:504–7.
12. Cuykendall TN, Rubin MA, Khurana E. Non-coding genetic variation in cancer. Curr Opin Syst Biol 2017;1:9–15.
13. Perera D, Poulos RC, Shah A, Beck D, Pimanda JE, Wong JWH. Differential DNA repair underlies mutation hotspots at active promoters in cancer genomes. Nature 2016;532:259–63.
14. Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, Lopez-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature 2016;532:264–7.
15. Yang J, Wei X, Tufan T, Kuscu C, Unlu H, Farooq S, et al. Recurrent mutations at estrogen receptor binding sites alter chromatin topology and distal gene expression in breast cancer. Genome Biol 2018;19:190.
16. Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, et al. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 2015;47:822–6.
17. Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SAJR, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415–21.
18. Alexandrov L, Kim J, Haradhvala NJ, Huang MN, Ng AWT, Boot A, et al. The repertoire of mutational signatures in human cancer. bioRxiv 2018;322859. doi: https://doi.org/10.1101/322859.
19. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res 2019;47:D1005–12.
20. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet 2017;101:5–22.
21. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res 2019;47:D941–7.
22. Rheinbay E, Nielsen MM, Abascal F, Tiao G, Hornshøj H, Hess JM, et al. Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes. bioRxiv 2017;237313.
23. Huang FW, Hodis E, Xu MJ, Kryukov GV, Chin L, Garraway LA. Highly recurrent TERT promoter mutations in human melanoma. Science 2013;339:957–9.

24. Horn S, Figl A, Rachakonda PS, Fischer C, Sucker A, Gast A, et al. TERT promoter mutations in familial and sporadic melanoma. Science 2013;339:959–61.
25. Fredriksson NJ, Ny L, Nilsson JA, Larsson E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat Genet 2014;46:1258–63.
26. Weinhold N, Jacobsen A, Schultz N, Sander C, Lee W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat Genet 2014;46:1160–5.
27. Araya CL, Cenik C, Reuter JA, Kiss G, Pande VS, Snyder MP, et al. Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nat Genet 2016;48:117.
28. Zhang W, Bojorquez-Gomez A, Velez DO, Xu G, Sanchez KS, Shen JP, et al. A global transcriptional network connecting noncoding mutations to changes in tumor gene expression. Nat Genet 2018;50:613.
29. Bailey SD, Desai K, Kron KJ, Mazrooei P, Sinnott-Armstrong NA, Treloar AE, et al. Noncoding somatic and inherited single-nucleotide variants converge to promote ESR1 expression in breast cancer. Nat Genet 2016;48:1260–6.
30. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489:57–74.
31. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518:317–30.
32. Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, et al. The chromatin accessibility landscape of primary human cancers. Science 2018;362:eaav1898.
33. Onuchic V, Lurie E, Carrero I, Pawliczek P, Patel RY, Rozowsky J, et al. Allele-specific epigenome maps reveal sequence-dependent stochastic switching at regulatory loci. Science 2018;361:eaar3146.
34. Cheung WA, Shao X, Morin A, Siroux V, Kwan T, Ge B, et al. Functional variation in allelic methylomes underscores a strong genetic contribution and reveals novel epigenetic alterations in the human epigenome. Genome Biol 2017;18:50.
35. Perera D, Chacon D, Thoms JA, Poulos RC, Shlien A, Beck D, et al. OncoCis: annotation of cis-regulatory mutations in cancer. Genome Biol 2014;15:485.
36. Fu Y, Liu Z, Lou S, Bedford J, Mu XJ, Yip KY, et al. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol 2014;15:480.
37. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 2012;22:1790–7.
38. Ritchie GRS, Dunham I, Zeggini E, Flicek P. Functional annotation of noncoding sequence variants. Nat Methods 2014;11:294–6.
39. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, et al. Mutational processes molding the genomes of 21 breast cancers. Cell 2012;149:979–93.
40. Hung S, Saiakhova A, Faber ZJ, Bartels CF, Neu D, Bayles I, et al. Mismatch repair-signature mutations activate gene enhancers across human colorectal cancer epigenomes. eLife 2019;8:e40760.
41. Adli M. The CRISPR tool kit for genome editing and beyond. Nat Commun 2018;9:1911.
42. Flavahan WA, Drier Y, Liau BB, Gillespie SM, Venteicher AS, Stemmer-Rachamimov AO, et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 2016;529:110–4.
43. Takeda DY, Spisak S, Seo J-H, Bell C, O'Connor E, Korthauer K, et al. A somatically acquired enhancer of the androgen receptor is a noncoding driver in advanced prostate cancer. Cell 2018;174:422–32.
44. Gasperini M, Hill AJ, McFaline-Figueroa JL, Martin B, Kim S, Zhang MD, et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 2019;176:377–90.
45. Cusanovich DA, Daza R, Adey A, Pliner HA, Christiansen L, Gunderson KL, et al. Multiplex single-cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 2015;348:910–4.
46. Mulqueen RM, Pokholok D, Norberg SJ, Torkenczy KA, Fields AJ, Sun D, et al. Highly scalable generation of DNA methylation profiles in single cells. Nat Biotechnol 2018;36:428–31.
47. Ramani V, Deng X, Qiu R, Gunderson KL, Steemers FJ, Disteche CM, et al. Massively multiplex single-cell Hi-C. Nat Methods 2017;14:263–6.
48. Mansour MR, Abraham BJ, Anders L, Berezovskaya A, Gutierrez A, Durbin AD, et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 2014;346:1373–7.
49. Puente XS, Bea S, Valdes-Mas R, Villamor N, Gutierrez-Abril J, Martín-Subero JI, et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 2015;526:519–24.
50. Katainen R, Dave K, Pitkänen E, Palin K, Kivioja T, Välimäki N, et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat Genet 2015;47:818–21.
51. Hnisz D, Weintraub AS, Day DS, Valton A-L, Bak RO, Li CH, et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 2016;351:1454–8.
52. Herranz D, Ambesi-Impiombato A, Palomero T, Schnell SA, Belver L, Wendorff AA, et al. A NOTCH1-driven MYC enhancer promotes T cell development, transformation and acute lymphoblastic leukemia. Nat Med 2014;20:1130–7.
53. Zhang X, Choi PS, Francis JM, Gao GF, Campbell JD, Ramachandran A, et al. Somatic superenhancer duplications and hotspot mutations lead to oncogenic activation of the KLF5 transcription factor. Cancer Discov 2018;8:108–25.
54. Zhang X, Choi PS, Francis JM, Imielinski M, Watanabe H, Cherniack AD, et al. Identification of focally amplified lineage-specific super-enhancers in human epithelial cancers. Nat Genet 2016;48:176–82.
55. Northcott PA, Lee C, Zichner T, Stütz AM, Erkek S, Kawauchi D, et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature 2014;511:428–34.
56. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, et al. The landscape of long noncoding RNAs in the human transcriptome. Nat Genet 2015;47:199–208.
57. Hua JT, Ahmed M, Guo H, Zhang Y, Chen S, Soares F, et al. Risk SNP-mediated promoter-enhancer switching drives prostate cancer through lncRNA PCAT19. Cell 2018;174:564–75.
58. Guo H, Ahmed M, Zhang F, Yao CQ, Li S, Liang Y, et al. Modulation of long noncoding RNAs by risk SNPs underlying genetic predispositions to prostate cancer. Nat Genet 2016;48:1142–50.
59. Jaratlerdsiri W, Chan EKF, Gong T, Petersen DC, Kalsbeek AMF, Venter PA, et al. Whole-genome sequencing reveals elevated tumor mutational burden and initiating driver mutations in African men with treatment-naïve, high-risk prostate cancer. Cancer Res 2018;78:6736–46.
60. Castro-Giner F, Ratcliffe P, Tomlinson I. The mini-driver model of polygenic cancer evolution. Nat Rev Cancer 2015;15:680–5.
61. Srivastava S, Ghosh S, Kagan J, Mazurchuk R, Boja E, Chuaqui R, et al. The making of a PreCancer atlas: promises, challenges, and opportunities. Trends Cancer 2018;4:523–36.
62. Kircher M, Xiong C, Martin B, Schubach M, Inoue F, Bell RJ, et al. Saturation mutagenesis of disease-associated regulatory elements. bioRxiv 2018;505362.
63. Bell RJA, Rube HT, Kreig A, Mancini A, Fouse SD, Nagarajan RP, et al. The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer. Science 2015;348:1036–9.
64. Mancini A, Xavier-Magalhães A, Woods WS, Nguyen K-T, Amen AM, Hayes JL, et al. Disruption of the β1L isoform of GABP reverses glioblastoma replicative immortality in a TERT promoter mutation-dependent manner. Cancer Cell 2018;34:513–28.
