Mapping and Making Sense of Noncoding Mutations in the Genome Jiekun Yang and Mazhar Adli
Published OnlineFirst August 6, 2019; DOI: 10.1158/0008-5472.CAN-19-0905
Cancer Research (Review)
Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia.
Corresponding Author: Mazhar Adli, University of Virginia, 1340 Jefferson Park Ave, Pinn Hall, Room: 6233, Charlottesville, VA 22902. Phone: 434-243-8567; Fax: 434-924-5069; E-mail: [email protected]
Cancer Res 2019;XX:XX–XX
©2019 American Association for Cancer Research.
Abstract
Whole-genome sequencing efforts of tumors and normal tissues have identified numerous genetic mutations, both somatic and germline, that do not overlap with coding genomic sequences. Attributing a functional role to these noncoding mutations and characterizing them using experimental methods has been more challenging compared with coding mutations. In this review, we provide a brief introduction to the world of noncoding mutations. We discuss recent progress in identifying noncoding mutations and the analytic and experimental approaches utilized to interpret their functional roles. We also highlight the potential mechanisms by which a noncoding mutation may exert its effect and discuss future challenges and opportunities.
Introduction to Germline and Somatic Variants
No two humans are genetically identical. The genomes of two unrelated individuals can be as much as 20 million base pairs different from each other (or 0.6% of the total 3.2 billion base pairs; ref. 1). We inherit these sequence variants from our parents. They are present in all the cells of our body and are called germline variants. Certain germline variants have been linked to complex traits and disorders, including cancer (2).
In addition to these inherited mutations, our cells also acquire novel mutations every time they divide starting from a fertilized egg. Approximately three mutations are estimated to occur when a normal human stem cell divides (3). These variants that are not inherited from a parent and are usually not transmitted to offspring are called somatic variants. These minor variations in the genomic sequence may arise due to endogenous factors such as replication errors, reactive oxygen species (ROS), aldehydes, mitotic errors, DNA repair machinery deficiency, retrotransposons, or genome-modifying enzymes (4). In addition, environmental factors including chemicals, UV light, ionizing radiation, and viruses can also lead to genetic alterations in the somatic cells (4). As we age, somatic mutations accumulate in all our cells. A fraction of these mutations may positively contribute to cellular survival and proliferation. Thus, the cells harboring these mutations will be selected within a tissue, which may end up acquiring the hallmarks of cancer (5). Because the somatic variation pool is large, no two individuals' cancers are identical (6). As the understanding of how normal cells transform into cancer cells improves, the significance of germline and somatic variants in tumorigenesis and cancer evolution is appreciated more and being intensely studied.
By whole-genome sequencing (WGS) of tens of thousands of human tumors and matched controls (7–9), it becomes clear that both germline and somatic variants are not evenly distributed along the genome and they show similar patterns across critical genomic elements at large scale. The genomic elements of concern usually include three types of open chromatin regions: RNA or protein-coding gene body regions, promoters, and enhancers. The latter two are major types of noncoding regulatory regions that control and fine tune the expression of genes. A promoter, located near the transcription start sites (TSS) of a gene, is a region of DNA that initiates gene transcription, while an enhancer is a distal DNA-regulatory element that may be located as far as 1 Mb away from the gene TSS. The enhancer elements regulate gene expression temporally and spatially, thus allowing cell type and tissue-specific gene expression. For germline variants, the protein-coding genomic regions tend to accumulate far fewer variants than intergenic regions, indicating stronger evolutionary conservation. Interestingly, with the same standard applied, enhancer elements are more conserved than promoter elements (9).
The spatial distribution of somatic mutations in the cancer genome has similar patterns. In human cancer cells, regions of open chromatin exhibit decreased somatic mutation density compared with gene-poor and heterochromatic genomic regions (10, 11). This decrease is likely due to the increased accessibility of these active regions to the DNA repair machinery (12). It is notable that within the open chromatin regions, mutations tend to accumulate near the transcription factor (TF)-binding sites such as at the promoters (13). This may be caused by impaired DNA repair activity due to the physical presence of a TF (14). Alternatively, the TF-binding activity may directly create a local physical constraint on DNA, which may result in DNA strand breaks or exposure of ssDNA to APOBEC-like DNA-modifying enzymes (15).
Various endogenous and exogenous factors such as replication errors, ROS, UV light, and viruses contribute to the uneven variant distribution (16–18), but not all the variants present are functional. From the millions of germline and somatic variants in the noncoding genomic region, the task of pinpointing a functional role for a specific noncoding variant and linking it to a biological phenotype is not a trivial effort. Below, we will discuss some of the
efforts to identify such variants and potential approaches that can be taken to study the impact of noncoding variants.
Identification of Noncoding Mutations in the Genome
Although theoretically, sequence variants between any two cells from an individual could be determined with accurate single-cell sequencing technologies, at present, only clonally expanded somatic mutations, for example, in the cancer context, are reliably detected (4). Among the detected variants, identifying functional ones is another level of challenge. To this end, genome-wide association studies (GWAS) and cancer genomics studies started with the basic assumption that functional variants are disproportionately represented in individuals (germline variants) or cells (somatic variants) with the trait of interest compared with their control counterparts. As of February 2019, the NHGRI-EBI Catalog of published GWAS (https://www.ebi.ac.uk/gwas/) contains 3,764 publications and 107,785 unique germline SNPs that are associated with various biological traits, among which 5,256 are for cancer (19). It is notable that the vast majority of these SNPs (94%) are within the noncoding genomic regions. GWASs have been facilitated by the development of relatively inexpensive SNP arrays, which could be easily imputed to a large fully sequenced reference panel exploiting linkage disequilibrium (20). However, cancer genomics relies on the identification of somatic mutations through DNA sequencing. Currently, approximately 6 million coding mutations have been identified for approximately 1.5% of the genome, compared with 20 million noncoding mutations for approximately 98.5% of the genome based on the Catalogue Of Somatic Mutations In Cancer v86 (COSMIC, https://cancer.sanger.ac.uk; ref. 21). This is largely due to the fact that a large fraction of cancer genome sequencing efforts focused on exomes rather than whole
moters of PLEKHS1 (6.5%), WDR74 (4.2%), SDHD (10.2%), KIAA0907 (10.2%), and YAE1D1 (9.3%), and at the 5′ UTR of TBC1D12 (15.2%) in tumors of different tissue origins (26, 27). Recently, small genomic windows with higher mutation rates have been observed in specific cancer types. For example, noncoding mutations near DHX34 are observed in 43% of diffuse large B-cell lymphoma, and 29% of lymphomas and 17% of liver cancers have noncoding mutations in the regulatory region of TUBBP5 (28). It should be noted that because multiple enhancers can regulate a single gene, combining mutations within a set of regulatory regions would further increase detection power (12). For example, noncoding somatic and germline variants in several regulatory elements show a combinatorial effect on ESR1 expression in ER+ breast cancer (29).
Understanding Functional Roles of Noncoding Mutations
In the context of tumor evolution, recurrence of a genetic alteration indicates that the particular mutation selectively contributes to cellular survival and proliferation. The real challenge is to separate positively selected causal or driver variants from neutral passenger ones and mildly deleterious ones, and identify a functional role and the mechanism of action of such recurrent noncoding mutations.
In this respect, one proven approach is to integrate noncoding mutational data with other genomic data. Genomic, transcriptomic and epigenomic assays boost our understanding of the noncoding genome and its regulation of the coding genomic regions. Taking advantage of the high-throughput assays, large-scale projects such as the Encyclopedia of DNA Elements (ENCODE; ref. 30) and Roadmap Epigenomics (31) revealed that up to 80% of the noncoding genome has regulatory functions. Thus