Mutational Signatures: Emerging Concepts, Caveats and Clinical Applications
Total Page:16
File Type:pdf, Size:1020Kb
REVIEWS Mutational signatures: emerging concepts, caveats and clinical applications Gene Koh 1,2, Andrea Degasperi 1,2, Xueqing Zou1,2, Sophie Momen1,2 and Serena Nik-Zainal 1,2 ✉ Abstract | Whole-genome sequencing has brought the cancer genomics community into new territory. Thanks to the sheer power provided by the thousands of mutations present in each patient’s cancer, we have been able to discern generic patterns of mutations, termed ‘mutational signatures’, that arise during tumorigenesis. These mutational signatures provide new insights into the causes of individual cancers, revealing both endogenous and exogenous factors that have influenced cancer development. This Review brings readers up to date in a field that is expanding in computational, experimental and clinical directions. We focus on recent conceptual advances, underscoring some of the caveats associated with using the mutational signature frameworks and highlighting the latest experimental insights. We conclude by bringing attention to areas that are likely to see advancements in clinical applications. The concept of mutational signatures was introduced or drug therapies, in an attempt to decipher causes7–9. in 2012 following the demonstration that analy However, the origins of many signatures remain cryp sis of all substitution mutations in a set of 21 whole tic. Furthermore, while earlier analyses reported single genomesequenced (WGS) breast cancers could reveal signatures, for example of UV radiation (that is, SBS7)2, consistent patterns of mutagenesis across tumours1 more recent studies reported multiple versions of these (FIG. 1a). These patterns were the physiological imprints signatures (that is, SBS7a, SBS7b, SBS7c and SBS7d)3, of DNA damage and repair processes that had occurred leading the community to question whether some find during tumorigenesis and could distinguish BRCA1null ings are reflective of biology or are simply mathemat and BRCA2null tumours from sporadic breast cancers. ical artefacts. Efforts to experimentally validate these Subsequently, a landmark study applied this principle abstract mathematical results are therefore warranted. on ~500 WGS and ~6,500 wholeexomesequenced Nonexpert users of mutational signatures require tumours across 30 cancer types and revealed 21 distinct insights into practical issues and caveats in the use of singlebase substitution mutational signatures (SBSs)2. signature analysis frameworks. For clinicians, using Recently, an updated analysis of ~4,600 WGS and mutational signatures reliably for clinical stratification ~19,000 wholeexomesequenced samples raised the is critical. number of known SBSs to 49 (REF.3). Further complexity, This Review appraises topical developments in the including possible tissue specificities for some mutational field of mutational signatures. We do not address signatures, has also been demonstrated4. the pros and cons of various mathematical models, Today, mutational signature analyses have become a which have been reviewed elsewhere10,11; rather, we seek standard component of genomic studies because they to emphasize biological principles, highlight recent can reveal environmental and endogenous sources of experimental work and discuss developments towards 1Department of Medical mutagenesis in each tumour. Indeed, this nascent field clinical applications. Genetics, School of Clinical is gaining prominence and heading towards being used Medicine, University of Mutational signatures: current knowledge Cambridge, Cambridge, UK. in a clinically meaningful way. While these are positive trends, it is appropriate to In this section, we bring readers up to date with muta 2MRC Cancer Unit, School of Clinical Medicine, University ask whether there are limitations to this substantially tional signatures, focusing on signatures of different of Cambridge, Cambridge, UK. broadening field. As an increasing number of signa classes. In trying to identify mutational signatures in a 3–6 ✉e-mail: [email protected] tures of different mutation classes are being reported data set, a ‘global’ approach can be adopted, where sig https://doi.org/10.1038/ (Fig. 1), correlations are being drawn between them and natures from all cancers, irrespective of the tissue type, s41568-021-00377-7 various factors, such as age and exposures to acid reflux are aggregated and averaged to derive a set of consensus NATURE REVIEWS | CANCER VOLUME 21 | OCTOBER 2021 | 619 0123456789();: REVIEWS a 21 breast cancer COSMIC mutational signatures • Experimental validation of gene-knockout signatures92 COSMIC mutational genomes: 5 SBSs1 v2: 30 SBSs (no publication) • 91 ovarian cancer genomes: 7 copy number signatures (36 channels)31 signatures v3.2: 53 SBSs 2012 2013 2015 2016 2018 2019 2020 2021 507 genomes + • Mutational strand asymmetries176 A compendium of • 4,645 genomes + 19,184 exomes 6,535 exomes: • 560 breast cancer genomes: mutational signatures (ICGC pan-cancer analysis; COSMIC 21 SBSs2 6 RSs (32 channels)24 of environmental mutational signatures v3): 49 SBSs , 11 • Topography34 mutagens13 DBSs (78 channels), 17 IDs (83 channels)3 • 2,559 genomes: 16 RSs (45 channels)5 • Tissue specificities in signatures4 b Collapsed representation Expanded representation SBS C>A C>G C>T T>A T>C T>G 100 47.5 SBS1 35.6 50 23.7 Frequency (%) Frequency (%) Frequency 11.9 0 0 T A A C G G TTT TTT > ATT TTA TTC > > > ATC ATC > > TCA ACT ACT CTC TCC TCG GTC ACA ACA ACC CCA GCA ACG GCA CCC CCG GCG T C T T C C TCT 6 channels ACT 96 channels ACC ACC DBS AC>NN AT>NN CC>NN CG>NN CT>NN GC>NN TA>NN TC>NN TG>NN TT>NN 80 60 DBS5 45 40 30 Frequency (%) Frequency Frequency (%) Frequency 15 0 0 TT AT GT TG GT AA AA AA CA AG AG NN NN NN NN NN NN NN NN NN NN GC > > > > > > > > > > AA AG AC GA AC AT CC CG CT GC TA TC TG TT 78 channels 10 channels >1-bp deletion at repeats >1-bp insertion at repeats Microhomology Small insertion/deletion signature 1-bp deletion 1-bp insertion (deletion length) (insertion length) (deletion length) 80 32 C T C T 2 3 4 5+ 2 3 4 5+ 2 3 4 5+ 24 ID6 40 16 Frequency (%) Frequency (%) Frequency 8 0 0 123 4 56+ 123 4 56+ 0 12 3 45+ 0 12 3 45+ 123 4 56+ 123 4 56+ 123 4 56+ 123 4 56+0 12 3 45+0 12 3 45+0 12 3 45+0 12 3 45+1 12 12 3 12 3 45+ 1-bp del.1-bp ins. Complex Homopolymer length Number of repeat units Microhomology length >1-bp>1-bp del. rep.>1-bp ins. rep. del. MH 6 channels 83 channels RS Clustered rearrangements Non-clustered rearrangements 100 Deletion TD Inversion Trans. Deletion TD Inversion Trans. 60 Breast RS3 50 30 Frequency (%) Frequency Frequency (%) Frequency 0 0 kb kb kb kb kb kb kb kb kb kb kb kb TD Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Mb Deletion Inversion >10 >10 >10 >10 >10 >10 1–10 1–10 1–10 1–10 1–10 1–10 kb–1 kb–1 kb–1 kb–1 kb–1 kb–1 1–10 1–10 1–10 1–10 1–10 1–10 Translocation 10–100 10–100 10–100 10–100 10–100 10–100 4 channels 100 100 100 32 channels 100 100 100 620 | OCTOBER 2021 | VOLUME 21 www.nature.com/nrc 0123456789();: REVIEWS ◀ Fig. 1 | Conceptual developments and visualization of mutational signatures. deficiencyassociated SBS26 and SBS44 (REF.3), and some a | Chronological account of how concepts in the field of mutational signatures have endogenous signatures are highly distinctive and easily 1–5,13,24,31,34,92,176 evolved over time . Key proof-of-concept experimental studies focusing discernible, such as the apolipoprotein B mRNAediting on human samples are also shown. b | With adequate power, the preferred method for enzyme, catalytic polypeptidelike (APOBEC)associated presenting single-base substitution mutational signatures (SBSs; for example, SBS1) is (REFS2,3,20) via the 96-channel method (see Supplementary Fig. 1 for fully labelled channels). It is also SBS2 and SBS13 . Rare mutational processes possible to expand the method to 1,536 channels (not shown) or to reduce it to only six that are present at low population frequencies may be channels. Double-base substitution signatures (DBSs; for example, DBS5) can be defined more challenging to extract and will reveal themselves by 78 strand-agnostic channels, or when the burden is low, by ten duplet motifs. Small only if they are present in the specific cohort examined insertion or deletion (indel) (less than 100 bp) signatures (for example, indel signature 6 (for example, treatmentassociated signatures). (ID6)) are broadly classified by type (that is, insertion, deletion or complex) and — when Our purpose in this Review is to focus on guiding of a single base — as C or T, and according to the length of the mononucleotide repeat principles and not to discuss all signatures individu tract where they occur. Longer indels are classified by whether they occur at repeats ally. This information is accessible from various online or have microhomology at indel junctions. With enough power, the motif sizes and resources, such as COSMIC or Signal, where exact nucleotides affected can also be considered. Rearrangement signatures (RSs; for contents may change over time. A reference set that is example, breast RS3) can be categorized on the basis of the four types of rearrangements and how they are regionally clustered and with further consideration of the size of the often used in analyses is the COSMIC v2 collection of rearranged fragment. Collapsed presentations may be necessary where the mutation 30 signatures. The COSMIC v3.1 collection, released in 3 burden is low (for example, in experimental systems). del., deletion; ins., insertion; 2020, brought the total number of signatures to 49 (REF.