bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Mutational signatures are jointly shaped by DNA damage and repair

1 2 2,3 2 Nadezda V Volkova* ,​ Bettina Meier* ,​ Víctor González-Huici* ,​ Simone Bertolini ,​ Santiago 1,3 ​ 4 ​ 4 ​ 4-6 ​ #2,7,8 Gonzalez ,​ Federico Abascal ,​ Iñigo Martincorena ,​ Peter J Campbell ,​ Anton Gartner ​ ​ ​ ​ ​ and Moritz Gerstung#1,9 ​

* these authors contributed equally # to whom correspondence should be addressed

1) European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK. 2) Centre for Gene Regulation and Expression, University of Dundee, Dundee, Scotland. 3) Current affiliation, Institute for Research in Biomedicine (IRB Barcelona), Parc Científic de Barcelona, Barcelona, Spain. 4) , Ageing and Somatic , Wellcome Sanger Institute, Hinxton, UK. 5) Department of Haematology, University of Cambridge, Cambridge, UK. 6) Department of Haematology, Addenbrooke’s Hospital, Cambridge, UK. 7) Center for Genomic Integrity, Institute for Basic Science, Ulsan, Republic of Korea ​ 8) Department of Biological Sciences, School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea 9) European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany.

Correspondence:

Dr Anton Gartner Centre for Gene Regulation and Expression University of Dundee Dow Street Dundee, DD1 5EH Scotland. Tel: +44 (0) 1382 385809 E-mail: [email protected]

Dr Moritz Gerstung European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) Hinxton, CB10 1SA UK. Tel: +44 (0) 1223 494636 E-mail: [email protected]

1 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Highlights

● Combining exposure to DNA damaging agents and DNA repair deficiency in C. ​ elegans leads to altered mutation rates and new mutational signatures ​ ● Mutagenic effects of genotoxic exposures are generally exacerbated by DNA repair deficiency ● of UVB and alkylating agents is reduced in translesion synthesis polymerase deficiencies ● Human cancer genomes contain examples of DNA damage/repair interactions, but in DNA repair genes usually only associate with moderate mutator phenotypes, in line with evolutionary theory

2 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Summary

Mutations arise when DNA lesions escape DNA repair. To delineate the contributions of DNA damage and DNA repair deficiency to mutagenesis we sequenced 2,721 genomes of 54 C. elegans strains, each deficient for a specific DNA repair gene and wild-type, upon ​ exposure to 12 different genotoxins. Combining genotoxins and repair deficiency leads to differential mutation rates or new mutational signatures in more than one third of experiments. Translesion synthesis polymerase deficiencies show dramatic and diverging effects. Knockout of Polκ dramatically exacerbates the mutagenicity of alkylating agents; conversely, Polζ deficiency reduces alkylation- and UV-induced substitution rates. Examples of DNA damage-repair deficiency interactions are also found in cancer genomes, although cases of hypermutation are surprisingly rare despite signs of positive selection in a number of DNA repair genes. Nevertheless, cancer risk may be substantially elevated even by small increases in mutagenicity according to evolutionary multi-hit theory. Overall, our data underscore how mutagenesis is a joint product of DNA damage and DNA repair, implying that mutational signatures may be more variable than currently anticipated.

3 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Introduction

A cell’s DNA is constantly altered by a multitude of genotoxic stresses including environmental toxins and radiation, DNA replication errors and endogenous metabolites, all rendering the maintenance of the genome a titanic challenge. Organisms thus evolved an armamentarium of DNA repair processes to detect and mend DNA damage, and to eliminate or permanently halt the progression of genetically compromised cells. Nevertheless, some DNA lesions escape detection and repair or are mended by error prone pathways, leading to mutagenesis — the process that drives evolution but also inheritable disease and cancer.

The multifaceted nature of mutagenesis results in distinct mutational spectra, characterized by the specific distribution of single and multi-nucleotide variants (SNVs and MNVs), short insertions and deletions (), large structural variants (SVs), and copy number alterations. Studying the combined mutational patterns can yield insights into the nature of DNA damage and DNA repair processes. Indeed, large scale cancer genome and exome sequencing allowed to computationally deduce more than 50 mutational signatures of base substitutions (Alexandrov et al., 2018, 2013) in cancer and normal cells. Some of these ​ ​ signatures, generally deduced by computational pattern recognition, have evident associations with exposure to known such as UV light, smoke, the food contaminants aristolochic acid and aflatoxins (Alexandrov et al., 2016; Helleday et al., 2014; ​ Poon et al., 2013), or with DNA repair deficiency syndromes and compromised DNA ​ replication. The latter include defects in (HR), mismatch repair (MMR), or DNA polymerase epsilon (POLE) . However, the etiology of many of the computationally extracted signatures still has to be established and it is not clear if these signatures have a one to one relationship to exposure or DNA repair defects.

The association between mutational spectra and their underlying mutagenic process is further complicated given that mutations arise from the action of primary DNA lesions, that include a multitude of base modifications and DNA adducts, single and double strand breaks, intra- and inter-strand DNA crosslinks, and counteraction of the DNA repair machinery. This implies that there are at least two unknowns that contribute to a mutational spectrum. The effects of such interactions are exemplified by with combined MMR deficiency and POLE mutations that lead to DNA replication errors, which display a ​ ​ profoundly different spectrum compared to samples with the same POLE deficiency but ​ ​ intact MMR (Haradhvala et al., 2018). The importance of DNA repair capacity is also ​ ​

4 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

evident when considering the apparent tissue specificity of some cancer-associated DNA repair syndromes; for example inherited defects in HR are commonly associated with breast and ovarian cancer, while Lynch syndrome characterized by MMR deficiency is primarily linked to (Sieber et al., 2005). It is established that thousands of DNA ​ ​ lesions occur during a single cell cycle, even in healthy cells (Tubbs and Nussenzweig, 2017). ​ ​ However, only a tiny proportion of primary DNA lesions ultimately result in mutation, further highlighting the importance of the mechanisms that mend DNA damage in the genesis of mutational spectra.

Here we experimentally investigate the counteracting roles of genotoxic processes and the DNA repair machinery. Using C. elegans whole genome sequencing, we determine the ​ ​ mutational spectra resulting from exposure to 12 genotoxic agents in wild-type and 53 DNA repair defective lines, encompassing most known DNA repair and DNA damage response pathways. Experiments combining genotoxin exposure and DNA repair deficiency show signs of altered mutagenesis, signified by either higher or lower rates of mutations as well as altered mutation spectra. These interactions highlight how DNA lesions arising from the same genotoxin are mended by a number of DNA repair pathways, often specific towards a particular type of DNA damage and therefore changing mutation spectra usually in subtle but sometimes also dramatic ways.

In addition, analysing 9,946 cancer exomes from the TCGA collection we reveal that loss of function mutations in DNA repair pathways are common across cancer types. Yet except for rare and extreme cases these mutations seemingly have moderate mutagenic effects, which appear small compared to the observed inter-sample heterogeneity. Still a number of cases displaying genotoxin-repair interactions are found, such as MGMT silencing and DNA ​ ​ alkylation by temozolomide. Despite these small effects, DNA repair defects drive , as evidenced by an excess of missense and truncating mutations in DNA repair and damage response genes. These apparently contradictory findings can be explained by the fact that cancer transformation usually requires a combination of 2-10 driver gene mutations, the chances of which rise steeply even for small increase in mutation rates. It thus emerges that seemingly subtle mutagenic phenomena can be important for cancer development, underscoring the need to understand the contributions to mutagenesis in great detail.

5 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Results

Experimental mutagenesis in C. elegans ​

To determine how the interplay between mutational processes and DNA repair status defines mutational spectra we used 54 C. elegans strains, including wild-type and 53 lines defective ​ ​ for DNA repair and DNA damage sensing (Figure 1A). These cover all major repair ​ ​ pathways including direct damage reversal (DR), (BER), nucleotide excision repair (NER), mismatch repair (MMR), double strand break repair (DSBR), translesion synthesis (TLS), DNA crosslink repair (CLR), DNA damage sensing checkpoints (DS), non-essential components of the DNA replication machinery, helicases that act in various DNA repair pathways, telomere replication genes, apoptosis regulators and components of the spindle checkpoint involved in DNA damage response (Arvanitis et al., ​ 2013; Bertolini et al., 2017; Boulton et al., 2002; Lans and Vermeulen, 2011; Tijsterman et al., 2002) (Supplementary Table 1). To inflict DNA damage we used 12 genotoxic agents ​ ​ ​ that included UVB-, X- and γ-radiation; alkylating agents such as the ethylating agent ethyl-methane-sulfonate (EMS), methylating agents dimethyl-sulfate (DMS) and methyl-methane-sulfonate (MMS); aristolochic acid and aflatoxin-B1, which form bulky DNA adducts; hydroxyurea as a replication fork stalling agent; and cisplatin, mechlorethamine (nitrogen mustard) and mitomycin C known to form DNA intra- and inter-strand crosslinks.

To investigate the level of background mutagenesis for each genotype in the absence of exogenous mutagens, we clonally propagated wild-type and DNA repair defective C. elegans ​ lines over 5-40 generations in mutation accumulation experiments (Methods, Figure 1A). ​ ​ In addition, to measure the effects of genotoxic agents, nematodes were exposed to target both male and female germ cells and mutational spectra were analysed by sequencing clonally derived progeny (Meier et al., 2014). Both approaches take advantage of the C. ​ ​ ​ elegans 3 days generation time and its hermaphroditic, self-fertilizing mode of reproduction. ​ In both experimental setups, single zygotes provide a single cell bottleneck, and subsequent self-fertilization yields clonally amplified damaged DNA for subsequent whole genome sequencing of the 100Mbp genome (Figure 1A). Experiments were typically done in ​ ​ triplicate with dose escalation for genotoxin treatments.

6 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Overall, we analysed a total of 2,721 whole genome sequenced samples, comprising 478 mutation accumulation samples, 237 samples from genotoxin-treated DNA repair proficient wild-type lines and 2,006 samples from interaction experiments combining genotoxin-treatment with DNA repair deficiency (Figure 1B). These harboured a total of ​ ​ 166,624 acquired mutations (average of ~60 mutations per sample) comprising 139,136 SNVs, 1,151 MNVs, 24,337 indels and 2,000 SVs. Attribution of observed mutations to each genotoxin, to distinct DNA repair deficiencies and to their combined action was done using a Negative Binomial regression analysis leveraging the explicit knowledge of genotoxin dose, genotype, and the number of generations across all samples to obtain absolute mutation rates and signatures, measured in units of mutations/generation/dose (Methods). ​ ​

Baseline mutagenic effects of DNA damage and repair deficiency in C. elegans ​

Comparing the genomes of the first and last generations from the 478 mutation accumulation samples, we found a generally low rate of mutagenesis in the range of 0.8 t0 2 mutations per generation as reported previously (Meier et al., 2014, 2018) ​ (Supplementary Figure 1). Notable exceptions are lines carrying knockouts of genes ​ ​ affecting MMR, TLS and DSBR (Figure 2B). Previously described mlh-1 MMR defective line ​ ​ ​ ​ showed high base substitution rates combined with a characteristic pattern of 1bp insertions and deletions at homopolymeric repeats (Meier et al., 2018). polh-1 and rev-3 TLS ​ ​ ​ ​ ​ ​ polymerases knockouts yielded a characteristic signature dominated by 50-400bp deletions ​ ​ (Roerink et al., 2014), and smc-6 mutants, defective for the Smc5/6 Cohesin complex ​ ​ ​ involved in HR, showed an increased incidence of SVs (Figure 2B). Supplementary ​ ​ ​ Figure 1 and Supplementary Table 2 provide a high-level summary of the experimental ​ ​ ​ mutational signatures for all 54 genetic backgrounds.

Laying out the multi-dimensional mutation spectra in two dimensions based on their similarity revealed some general trends: Genotoxins generally appear to have a stronger influence on the mutational spectrum of a sample than its genetic background (Figure 2A, ​ Supplementary Figure 2, for genotoxic effects in wild-type). Notable clusters correspond ​ to the methylating agents MMS and DMS, which led to characteristic T>A and T>C mutations, as well as a cluster of samples characterized by C>T mutations caused by the ethylating agent EMS (Figure 2A, C). Ionizing radiation (UVB, X- and ɣ-rays) caused single ​ ​ and double nucleotide substitutions, but also deletions and structural variants. UVB

7 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

irradiation showed characteristic C>T transitions in a YpTpH context (Y=C/T; H=A/C/T), and also an unexpected prevalence of T>A in a TpTpA context. Bulky adducts created by aristolochic acid and aflatoxin-B1 showed mutational spectra similar to those observed in exposed human cancers and cell lines with typical C>A and T>A mutations, respectively (Huang et al., 2017; Poon et al., 2013). Lastly, cisplatin exposure induced C>A ​ ​ transversions in a CpCpC and CpCpG context as well as deletions and structural variants, in agreement with previous studies (Figure 2 A,C) (Boot et al., 2018; Kucab et al., 2019; Meier ​ ​ ​ et al., 2014; Szikriszt et al., 2016). A summary of all mutational spectra associated with the 12 ​ genotoxic exposures in wild-type is shown in Supplementary Figure 2 and ​ ​ Supplementary Table 2. At a mean cosine similarity of 0.63 (range 0.20-0.84) these ​ experimental signatures generally display a good, although not perfect level of similarity with their human experimental counterparts (Kucab et al., 2019) and also with computationally ​ ​ derived cancer signatures with the same suspected origins (Supplementary Figure 3). ​ ​ This similarity reflects the fact that the majority of DNA repair pathways are highly conserved among eukaryotes, but also that DNA repair capacity and genotoxin metabolism may differ moderately between nematodes, human cell lines and cancer cells.

Interactions of DNA damage and DNA repair shape the mutational spectrum in C. elegans ​

Mutation rates and signatures can change considerably when combining genotoxin exposure and DNA repair deficiency. These phenomena are exemplified by experiments of MMS exposure and translesion synthesis polymerase polk-1 knockouts: In wild-type C. elegans, ​ ​ ​ DNA methylation by MMS leads to T>M (T>A or T>C) mutations at a rate of approximately 230 mutations/mM/generation (Figure 3A, top). Deficiency of polymerase κ increases the ​ ​ total mutation rate 17x to approximately 3,800 mutations/mM/generation (Figure 3A, ​ right). This increase also coincides with a distinct change in the mutational spectrum with a ​ prominent peak of T>A transversions in a TpTpT context (Figure 3A, middle). ​ ​ Quantifying these changes relative to the wild-type mutation spectrum and accounting for possible genotoxin-independent effects of polymerase κ deficiency (Methods) reveals that ​ ​ the rate of T>M mutations is approximately 10-100x higher, especially in TpTpN and CpTpN contexts (Figure 3A, bottom). These figures indicate that 90-99% of MMS-induced ​ ​ mutations are avoided by Polκ driven TLS. T>M mutagenesis is likely to be caused by N3-methyladenosine, which stalls replicative polymerases and is mended by several TLS polymerases (Yoon et al., 2017). Our data indicate that TLS by Polκ in wild-type C. elegans ​ ​ ​

8 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

occurs at a low error rate; in the polk-1 mutant, however, bypass has to be achieved by other, ​ ​ error-prone TLS polymerases, resulting in largely increased T>M mutagenesis. A similar change of T>M SNVs in the mutation spectrum of polk-1 mutants was observed for DNA ​ ​ ethylation by EMS, at a rate of 10x (Supplementary Figure 4). ​ ​

O6-methylguanine, which mispairs with thymine resulting in C>T transitions, is a further DNA modification induced by MMS. Treatment of wild-type C. elegans with MMS, however, ​ ​ yields less than 10 C>T mutations/mM (Figure 3B). In contrast, combining MMS exposure ​ ​ with alkyl-transferase agt-1 deficiency increases C>T mutations by a factor of 12, while ​ ​ leaving the rate of T>M mutations entirely unchanged. This demonstrates that AGT-1 specifically reversemost O6-methyl- adducts, thus acting as the functional C.elegans ​ ortholog of the human methylguanine DNA methyltransferase MGMT. Interestingly, a ​ ​ similar, but weaker interaction of agt-1 was observed with the ethylating agent EMS, leading ​ ​ to an increase in C>T transitions by a factor of 1.5, indicating that AGT-1 is involved in, but less efficient at, removing ethyl groups, consistent with reports in E. coli (Taira et al., 2013) ​ ​ (Supplementary Figure 4). ​ ​

Entirely distinct phenomena occur upon UVB irradiation. Error prone TLS is a key mechanism to overcome UV-induced DNA damage such as cyclobutane pyrimidine dimers, which stall replication polymerases. UVB exposure led to approximately 10

2 mutations/100J/m ,​ comprised mostly of C>T and T>C transitions (Figure 3C). ​ ​ ​ Paradoxically, knockout of rev-3, which encodes the catalytic subunit of TLS Polζ, reduced ​ ​ UVB mutation rates 1.5-fold, mostly by a reduction in base substitutions, which outweighs a 4-fold increase in deletions > 50bp (Figure 3C). To protect the genome from such ​ ​ mutations, which are likely to be deleterious, it is believed that TLS bypasses UV lesions at the cost of a higher base substitution rate (Yoon et al., 2019). A similar phenomenon was ​ ​ observed for rev-3 mutants under MMS exposure, which led to an approximately 20% ​ ​ decrease of single nucleotide variants but increased deletions (Supplementary Figure 4), ​ ​ indicating that DNA repair in some cases adds to overall mutagenicity.

In contrast to the above examples where DNA repair deficiency changes mutational signatures, knockouts of the NER genes xpf-1 and xpc-1 uniformly increased the rate of UVB ​ ​ ​ ​ induced mutagenesis by a factor of 20 and 32, respectively, across the entire mutation spectrum (Figure 3D). This uniform increase indicates that NER is involved in repairing the ​ ​ large majority of UVB damage including both single- and multi- nucleotide lesions.

9 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Interestingly, xpf-1 and xpa-1 knockouts also uniformly increased the mutation burden ​ ​ ​ ​ resulting from EMS and MMS alkylation by a factor of 2 indicating that NER also contributes to error-free repair of alkylation damage (Supplementary Figure 5). Similarly, xpf-1 ​ ​ ​ mutants showed a five-fold increase in mutations after aristolochic acid exposure, demonstrating that NER repairs both small and bulky DNA adducts (Supplementary ​ Figure 4). ​

These examples of genotoxin-repair interactions are not uncommon: In total, 72/196 (37%, at FDR of 10%) of combinations displayed an interaction between DNA repair status and genotoxin treatment (Figure 4A). Conversely, more than half of combinations produced ​ ​ mutation spectra which could be fully described as the superposition of the wild-type genotoxin signature and that of the DNA repair deficiency background, indicating that the two processes acted independently. Usually genotoxin-repair interactions increase the numbers of mutations obtained for a given dose of mutagen, leaving the mutational spectrum mostly unchanged, others have a profound impact on mutational spectra (Figure ​ 4B, for a comprehensive list of effects see Supplementary Figure 5 and Supplementary ​ ​ ​ ​ Table 3). The emergence of distinct mutational spectra depending on DNA repair status ​ reflects how multiple repair enzymes and pathways contribute to mending damaged DNA as in the case of DNA alkylation. If DNA repair is predominantly achieved by one pathway such as by NER for UVB damage and bulky DNA adducts, the number of mutations is increased, but the signature tends to remain unchanged.

To illustrate the overall magnitude of these interaction effects, we estimate that of the 141,004 mutations we observed upon treatment with genotoxins in DNA repair deficient strains 23% of mutations were attributed to the endogenous mutagenicity of DNA repair deficiency genotypes independent of genotoxic exposure, and 62% of mutations would be attributed to genotoxic exposures independent of the genetic background. Of these, 2% were not observed due to negative interactions of genotoxins and DNA repair deficiency, such as UVB exposure of TLS knockout strains, and 17% of mutations were added because of the positive interactions, which increased mutagenicity (Figure 4C). ​ ​

Mutagenic consequences of DNA repair and damage sensing deficiency in human cancers

10 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Mutation is the driving force of cancer. Studying mutagenic interactions between DNA damage and repair in primary cancers, however, is more challenging than in controlled mutagenesis experiments because the history of DNA repair pathway deficiency and the cumulative genotoxic exposures of a given sample are generally unknown. To establish a catalogue of potential DNA repair pathway deficiency in human cancers we annotated missense and loss-of-function mutations, as well as epigenetic gene silencing in 81 core genes of the 9 consensus DNA repair pathways (Knijnenburg et al., 2018; Pearl et al., 2015) ​ across 9,946 samples and 30 cancer types available from TCGA (Supplementary Table 4). ​ ​ As reported previously, we found that heterozygous mutations with at least one gene of a given repair pathway being mutated are very common across DNA repair pathways, reaching nearly 100% for the DNA damage sensing (DS) pathway in ovarian cancers and uterine carcinosarcomas due to very high rates of TP53 mutations (Knijnenburg et al., 2018); ​ ​ ​ ​ Supplementary Figure 5A, Supplementary Table 4). In contrast, biallelic DNA repair ​ gene deficiency is generally rare (Supplementary Figure 5B), and most prevalent in the ​ ​ DS pathway (11% pan-cancer), but also MMR (7% of endometrial cancers), as well as HR (6% of ovarian cancers). The age-adjusted mutation burden associated with mono- and biallelic DNA repair deficiency broadly displayed the expected hypermutator phenotypes, including biallelic MMR deficiency, monoallelic POLE exonuclease mutations, and also biallelic ​ ​ mutations in HR (Supplementary Figure 5C, D). These patterns are consistent with the ​ ​ findings of our experimental screen where most DNA repair deficiencies exhibited only a weak mutator phenotype unless further probed by genotoxic exposures (Figure 2A). ​ ​

In order to infer interactions between suspected genotoxin exposures and DNA repair deficiency, we next adapted the Poisson model used in our mutagenesis screen to simultaneously extract signatures and DNA repair effects thereon (see Methods). Applying ​ ​ this approach to a range of cancers with suspected genotoxic exposures, we found that the observed DNA repair and damage interaction effects were usually moderate. Yet a number of noteworthy examples exist, illustrating the actions of different DNA repair pathways, including DR, MMR, and TLS, in response to particular types of DNA damage in vivo ​ (Figure 5A). The strongest interaction, which is even therapeutically exploited, occurs ​ ​ between the human agt-1 ortholog MGMT and temozolomide in temozolomide-treated ​ ​ ​ ​ glioblastomas (Figure 5B) (Kim et al., 2015). This is in good agreement with our ​ ​ ​ ​ experimental findings that the nature of mutation spectra detected upon EMS, DMS or MMS alkylation depends on the status of agt-1 (Figure 3B). We note that the signature associated ​ ​ ​ ​

11 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

with temozolomide exposure in MGMT deficient cancers leads to a more characteristic C>T ​ ​ spectrum in a NpCpY context (Y = C or T).

A further noteworthy interaction arises between MMR deficiency and a range of other mutational processes. These include concurrent POLE mutations manifesting in an increased ​ ​ C>A mutagenesis in a NpCpT context (Figure 5C), as noted previously (Haradhvala et al., ​ ​ ​ 2018; Shlien et al., 2015). Moreover, MMR deficiency changes the rates of mutational ​ processes causing mononucleotide insertions and deletions in homopolymeric stretches (Supplementary Figure 6D), which arise at variable rate across tissues in MMR ​ ​ ​ ​ proficient samples (Alexandrov et al., 2018). Compared to MMR proficient samples, the rate ​ ​ of single base deletions per year was on average 5-fold higher in MMR deficient samples following the same trend across different tissues; the rate of single base insertions was increased on average 4-fold, and the rate of C>T at CpG sites was 3-fold higher in the samples with MMR defects compared to the tissue baseline. These data also square well with the observation that MMR deficiency is associated with a number of additional mutational signatures of unknown aetiology (Meier et al., 2018), suggesting that these may reflect ​ ​ different mutagenic processes which are exacerbated by MMR deficiency.

A last example of DNA damage-repair interaction is conferred by APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) and error-prone TLS-driven BER. APOBEC deaminates single-stranded to uracil, which pairs with during replication, leading to C>T mutations. Uracil is thought to be removed by Uracil DNA Glycosylase UNG; subsequent synthesis by error-prone TLS REV1 leads to C>G mutations (Morganella et al., 2016; Roberts and Gordenin, 2014). Indeed, a lack of C>G mutations has ​ been observed in a cancer cell line with UNG silencing (Kim et al., 2015; Petljak et al., 2019). ​ ​ ​ ​ In human cancers, evidence of REV1 and UNG mutation or silencing contributing to ​ ​ ​ ​ APOBEC mutagenesis is weaker, but confirms the expected trend: On average, samples with defective REV1 or UNG display an 8% decrease in C>G mutations (Figure 5D). However, ​ ​ ​ ​ ​ ​ REV1 and UNG status at diagnosis only explains a small fraction of APOBEC mutations with ​ ​ ​ low rates of C>G , suggesting that either other processes contribute to these variants or that the underlying cause may not be detectable at the time of diagnosis anymore.

Notably no effect was observed for NER variants in lung and skin cancers, although one might expect NER involvement in repairing bulky DNA adducts derived from tobacco smoke and UV light (Figure 5A). Similarly NER-deficient bladder cancers with recurrent ​ ​

12 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

mutations in ERCC2/XPD did not show a strong effect on the mutation burden, despite ​ ​ reports of a mild increase of a signature similar to COSMIC signature 5 (Kim et al., 2016). ​ ​ Taken together these data demonstrate that definitive examples of mutagenic interactions between DNA damage and repair can be found in human cancers, but these are restricted to cases with the strongest effects. The unknown lifetime exposure and unknown timing of somatically acquired DNA repair deficiency makes it very challenging to establish associations of weaker effects in cancers. Indeed, weak interactions between primary DNA lesions and repair defects are also most common in our C. elegans experimental screen ​ ​ (Figure 4A). ​ ​

Selection analysis confirms a cancer-promoting effect of DNA repair mutations

Given that the association between DNA repair pathway defects and mutation phenotypes is weaker than expected, we took on an evolutionary approach to investigate if changes in DNA repair status nevertheless affect oncogenesis. A high-level of recurrence of non-silent variants in a gene across cancer samples indicates that the given gene is positively selected and promotes cancer progression. To assess whether DNA repair pathways mutations are positively selected in cancers, we calculated the rate of non-silent to silent mutations dN/dS relative to its expected ratio based on silent variants and epigenetic covariates (Greenman et ​ al., 2006; Martincorena et al., 2017) for a set of 248 DNA repair genes and members of DNA ​ repair pathways (Supplementary Table 4). A value of dN/dS = 1 indicates no selection, ​ ​ while values greater or less than 1 signify positive and negative selection, respectively. Indeed, a variety of DNA repair genes and pathways analysed showed a high dN/dS ratio, indicating positive selection (Figure 6A). Pathway-level analysis confirmed the pan-cancer ​ ​ excess of non-silent mutations in genes encoding for DNA damage sensing (DS) proteins including TP53, ATR and ATM (Supplementary Table 5). Equally, nonsense mutations ​ ​ ​ ​ ​ ​ ​ ​ are enriched in MMR genes in uterine and stomach cancers. Overall, 7/248 genes involved in DNA repair displayed significant signs of positive selection in missense mutations, and 17/248 DNA repair genes are enriched in truncating mutations (FDR<10%; Figure 6A, ​ ​ ​ Supplementary Table 5). Among these the majority are known cancer genes such as ​ TP53, IDH1, PTEN, ATM, ATRX, SMARCA4, BRCA1/2, MLH1, but we also detected new ​ ​ ​ ​ ​ genes with mild signs of positive selection, namely the DNA damage sensing protein kinase PRKDC (dN/dS = 2 for nonsense variants; q-value = 0.01), the pre-MRNA processing factor ​ PRPF19, involved in TCR, (dN/dS = 4.6 for nonsense variants; q-value = 0.005) and the ​

13 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

homologous recombination repair gene SWI5 (dN/dS = 3.2 for missense and 5.3 for ​ ​ nonsense variants; q-value = 0.05 and 0.07, respectively).

Small changes in mutation rate may cause large cancer risk

The apparent discrepancy between the strong genetic evidence for DNA repair deficiency driving cancer and the commonly moderate effects on mutational spectra can be explained from the perspective of somatic evolution. Given that mutations occur throughout a cancer patient’s lifetime (Gerstung et al., 2017; Tomasetti et al., 2013), a DNA repair deficiency ​ ​ acquired at a late stage during tumour development may not have enough time to substantially impact on the mutational burden or signature. In addition, it is important to consider the relationship between mutation rate and cancer risk, mindful that a small number of driver mutations are needed for cancer transformation. Classic studies by (Armitage and Doll, 1954; Nordling, 1953) postulated that there are about 6-7 rate limiting ​ steps needed for cancer formation, based on the observation that cancer risk increases approximately as the fifth power of age, a prediction in good agreement with modern estimates of 2-10 driver gene point mutations in cancer genomes (Martincorena et al., 2017; ​ Vogelstein and Kinzler, 2015). As the chance to independently mutate multiple driver genes ​ in the same cell becomes a power of the mutation rate, a relatively small change in the mutation rate leads to a dramatic increase in the incidence rate of a cancer at any given age:

6 Increasing the mutation rate by a factor of 2 leads to an approximately 2 =​ 64 fold increased ​ probability of 6 co-occurring driver mutations; clonal expansions reduce the exponent, but preserve the overall exponential scaling between mutation rate and cancer risk (Tomasetti et ​ al., 2015). ​

Indeed, the observed relation of relative risk and mutation rate increases in a number of cancer predisposition syndromes, but also smoking-related lung cancers follow this trend (Figure 6B). As noted previously, MMR-deficient colorectal cancers have an ~8-10 fold ​ ​ higher mutation burden than MMR-proficient carcinomas, inherited MMR deficiency increases colorectal cancer risk >115 fold. Similarly, HR-deficient breast cancers display a ~3-fold higher mutation burden, despite 20-40 fold increased risks for BRCA1/2 mutation ​ ​ carriers (Antoniou et al., 2003). Even more so, XP patients display an 100-fold increase of ​ ​ mutation rate (Zheng et al., 2014), but have an approximately 10,000 fold increased rate of ​ ​ skin cancers (Bradford et al., 2011). Under this model, high-impact SNPs in NER genes, ​ ​

14 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

which lead to ~2-fold increase in risk (Wheless et al., 2012) would be expected to ​ ​ cause a 20% increase in mutation rate, in agreement with our observations (Figure 6B). ​ ​ While the exact number of driver gene mutations remains unknown in each cancer type an implication of theses data show that small changes in mutation rates can have a large impact on cancer risk, and conversely noticeable risk factors may derive from rather moderate mutagenic effects.

It thus emerges that also small mutagenic effects may be important for cancer development, explaining why genes with no obvious mutator phenotype may be positively selected in cancers. These findings also underscore the need to accurately determine seemingly small and subtle mutagenic effects, which would be challenging to detect in cancer genomes, but are often found in our experimental screen (Figure 3B,C). ​ ​

Discussion

Taken together, our experimental screen and data analysis showed that mutagenesis is fundamentally driven by the counteraction of DNA damage and repair. A consequence of this interplay is that the resulting mutational spectra vary in a number of ways, ranging from a uniform increase of the genotoxic mutation profiles to changes in the mutation spectrum and even a reduction in the rate of mutations for certain combinations of DNA damage and repair. Each of these phenomena sheds light on the mode of action of DNA repair. Uniform increases across the entire mutation spectrum suggest that the repair gene is involved in repairing all the DNA lesions generated by the genotoxin. Conversely, mutation spectra change if divergent repair pathways are involved in repairing specific subsets of DNA lesions introduced by the same genotoxin. This was observed here for alkylating agents and aristolochic acid, implying that the same biochemical adducts at different bases and residues are repaired by distinct pathways, which may be a consequence of their impact on DNA helix shape and stability. In the case of DNA methylation our data corroborate the notion that the mutagenicity of O6-methylguanine stems from mis-pairing with thymine, and is repaired by methylguanine methyltransferases, while N3-methyladenine stalls replication, which is resolved by translesion synthesis polymerase Polκ. Lastly, TLS Polζ deficiency reduced the mutagenicity of UVB – and to a lesser extent also of alkylating agents – seemingly replacing a large number of single nucleotide variants with a smaller number of large deletions. This finding illustrates how functional error-prone TLS prevents the accumulation of potentially more damaging large deletions at the cost of an increased base substitution rate, which may

15 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

present the lesser of two evils for a cell having to deal with a large number of genotoxic lesions.

Our analysis of human cancers showed that, while non-silent mutations in DNA repair genes are common, bi-allelic inactivation of both copies, which is generally required for loss of function, appears to be a relatively rare phenomenon. About 10% of DNA repair genes display signals of positive selection, signifying a causal contribution to cancer progression. Surprisingly few DNA repair genes, however, were found to exhibit a measurable mutational phenotype on their own or in relation to other mutational processes. While the frequent lack of strong mutational signature appears puzzling at first sight, it seems less surprising when seen through the lense of somatic evolution. If DNA repair deficiency is acquired at a late stage there may simply not have been enough time to accumulate substantial amounts of mutations. Also relatively large increases of cancer risk – as seen for a number of inherited DNA repair deficiency syndromes – may stem from a relatively moderate increase in mutation burden. These findings agree well with the observation of our screen, where many DNA repair knockouts did not yield dramatic mutation rate increases unless further probed by genotoxic exposures. A notable exception to this trend was mismatch repair deficiency, possibly because MMR mends a large number of endogenous DNA alterations, which arise in the absence of exogenous genotoxin exposures, including T·G mismatches resulting from spontaneous of 5meC and mispairing resulting from replication slippage. Similarly, we observed that APOBEC-induced C>G mutagenesis depends on both UNG-driven uracil excision and translesion synthesis by REV1, leading to a 20% reduction of C>G mutations when either repair component is deficient.

The analysis of mutational signatures has gained much attention in recent years and dramatically deepened our understanding of the range and common types of mutation patterns observed in human cancer and normal tissues. These signatures have revealed the mutagenic consequences of a number of endogenous and exogenous genotoxin exposures and signatures indicative of DNA repair deficiency conditions, which can be therapeutically exploited. Some mutational signatures are very promising biomarker candidates, such as MMR deficiency for or HR deficiency forPARP inhibitor treatment. However, our study suggests that finding signatures of other DNA repair deficiencies may be more challenging as the resulting mutation signal may be faint – because small mutation rate increases can cause a large cancer risk – and variable due to DNA-damage repair interactions. Seemingly, for a number of DNA repair conditions one might have to combine

16 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

the analysis of mutational profiles with genetic and molecular testing of repair genes in order to establish whether or not a DNA repair pathway is truly compromised.

Looking ahead it appears important to recognise the variable nature of mutagenesis due to the eternal struggle between DNA damage and repair. Small shifts of these antagonistic forces in favour of mutation may drive cancer progression. Still, our ability to observe the consequences of many genotoxic processes is limited by the overall high efficiency and redundancy of DNA repair processes which only leave a small fraction of damage visible as mutations. Thus, catalogues of mutational signatures reported in cancer cells may only ever represent the most striking exemplars of specific mutagenic constellations requiring further and thorough experimental dissection.

17 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Methods

Data generation

The experimental C. elegans data was generated as described previously (Meier et al., 2014), ​ ​ ​ ​ with strains and mutagen doses described in Supplementary Table 1. Hermaphrodites ​ ​ were treated with DNA damaging agents at the late L4 and early adult stages to target both male and female germ cells. Resulting zygotes provide a single cell bottleneck where mutations are fixed before being clonally amplified during C. elegans development and ​ ​ passed on to the next generation in a mendelian ratio. The baseline of mutagenesis in wild-type and DNA repair defective strains was determined by independently propagating 3-5 self-fertilizing hermaphrodites per genotype typically for 20 or 40 generations as described (Meier et al., 2018). For these mutation accumulation experiments, the P0 ​ ​ (parental) or F1 (1st filial) generation and the final generation were sequenced.

Data acquisition

DNA from C. elegans samples were prepared as described previously (Meier et al., 2014). ​ ​ ​ ​ Samples were sequenced using Illumina HiSeq 2000 and X10 short read sequencing platforms at 100 bp paired-end with a mean coverage of 50x. TCGA human cancer data was taken from NCI GDC (https://gdc.cancer.gov, (Grossman et al., 2016)) and respective ​ ​ ​ ​ studies (Cancer Genome Atlas Network, 2015; Cancer Genome Atlas Research Network, ​ 2012, 2014a, 2014b; Cancer Genome Atlas Research Network et al., 2013; Kim et al., 2015; Ng et al., 2017). ​

Mutation calling

C. elegans samples were run through the Sanger Cancer IT pipeline including CaVEMan for ​ SNV calling (Nik-Zainal et al., 2012), Pindel for indel calling (Ye et al., 2009), and BRASS ​ ​ ​ ​ (Nik-Zainal et al., 2012) and DELLY (Rausch et al., 2012) calling consensus for structural ​ ​ ​ variant identification. Mutation calls were filtered as described (Meier et al., 2014). The ​ ​ clustering and classification of SVs was constructed based on (Li et al., 2017) as described in ​ ​ Supplementary Methods. Additionally, duplicated mutations across samples were ​ removed (Supplementary Methods). Mutation counts in all samples are listed in ​ ​ Supplementary Table 1. ​

18 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Variants were classified into 96 SNV classes (6 base substitution types per 16 trinucleotide contexts), 2 MNV classes (dinucleotide variants and longer variants), 14 indels classes (6 classes of deletions based on deletion length and local context (repetitive region or not), similar for insertions, and two classes for short and long complex indels), and 7 SV classes (complex variants, deletions, foldbacks, interchromosomal events, inversions, tandem duplications and intrachromosomal translocations). Data visualisation was performed using t-SNE (Maaten and Hinton, 2008) based on the ​ ​ ​ ​ cosine distance of the 119-dimensional mutation spectra, averaged across replicates.

Extraction of interaction effects in experimental data

For each sample i = 1, ..., 2721 and mutation class j = 1, ..., 119 we counted the respective number of mutations per sample Y ij . Using matrix notation, the counts Y ∈ ℕ2721 × 119 were modeled by a negative binomial distribution, Y ~ NB(μ, φ = 100), with scalar overdispersion parameter ϕ = 100 selected based on the estimates of

2721×119 overdispersion in the dataset, and matrix-variate expectation μ ∈ ℝ+ . The basic structure of the expectation is

μ = μG · (g + αI ) + d · μM · βI , where g · μ ∈ ℝ2721×119 describes the expected mutations due to the genotype G in the G + ​ ​ 2721×119 absence of a genotoxin after g generations, while d · μM ∈ ℝ+ denotes the expectation for mutagen M at dose d in wild-type conditions. The terms α ∈ ℝ2721×1 and β ∈ ℝ2721×119 ​ ​ ​ ​ I + I + model how these terms interact (more details follow below). Here the dot product “⋅” is element-wise and we use the convention that a dimension of length 1 is extended by replication to match the length of the corresponding dimension of other factor in the product.

For αI = 0 , expected number of mutations for genotype G in wildtype has the structure

μG = g · (G × SG) , 2721 × 54 where G ∈ ℤ2 is the binary indicator matrix for denoting which one of the 54 genotypes a given sample has ( Gij = 1 if sample i has genotype j , otherwise Gij = 0 ). The matrix ​ ​ ​ ​ 54×119 SG ∈ ℝ+ denotes the mutation spectrum (i.e. signature) across the 119 mutation classes

for each of the 54 genotypes in the absence of genotoxic exposures. SG has a log-normal

prior distribution S ~ logN 0, σ2 iid with a scalar variance σ2 . The vector g ∈ ℝ2721 × 1 G ( G) G + denotes the generation number in every sample, g = g(N) , adjusted for the chances of

19 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

fixation after N generations given the 25%-50%-25% probability of each mutation becoming ​ ​ homozygous, remaining heterozygous, or being lost in each generation.

2721×1 Similarly, the expected mutations for a given mutagen at dose d ∈ ℝ+ reads

μM = d · (M × SM ) , 2721 × 12 with M ∈ ℤ2 being the indicator for the 12 mutagens used in this screen and

S ∈ ℝ12×119 being their signatures in wildtype, with per-column prior S(j) ~ logN 0, σ2 , M + M ( Mj)

where the variances σ2 = σ2 , ..., σ2 have prior distributions σ2 ~ Γ (1, 1) iid. M ( M1 M12) Mj Thus, the matrix-variate expected number of mutations can be written as

μ = (g + αI ) · (G × SG) + d · βI · (M × SM ) .

The vector-variate interaction term α measures how the mutations expected for genotype G I ​ may uniformly increase under mutagen exposure and is taken to be linear with respect to the mutagen dose,

αI = d · ( I × bI ) + ε , 2721 × 196 The symbol I ∈ ℤ2 is the binary indicator matrix for each one of the 196 interactions

196 × 1 tested in the screen multiplied by the interaction rate bI ∈ ℝ+ , which measures how much the mutation signature of the genetic background increases upon genotoxic exposure with dose d. The interaction term is modelled to have a prior log-normal distribution ​ ​ bI ~ logN (0, 0.5) iid. Lastly ε is a random offset to allow for a possible divergence of the genotypes between

experiments, adding εμG mutations with the same spectrum as in the absence of a mutagen

in a dosage-independent fashion. The random offset is modelled as ε = (J × aJ ) where 2721 × 208 J ∈ ℤ2 is an extended binary indicator matrix for each one the 196 interactions and 12

208 × 1 wild-type exposure experiments, with value aI ∈ ℝ+ , aI ~ logN (0, 0.5) iid.

In addition to this scalar interaction, the wild-type spectrum of the mutagen SM may change

2721 × 119 by the factors βI ∈ ℝ+ measuring the fold-change of each mutation type which is expressed as

βI = I × SI , 196 × 119 where SI ∈ ℝ+ is the fold-change for each mutation class in a given interaction, with prior distribution per column logS(j) ~ Laplace 0, σ2 , where the variances I ( Ij)

σ2 = σ2 , ..., σ2 have prior distribution σ2 ~ Γ (1, 1) iid. I ( I1 I196) Ij

20 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

2 2 Bayesian estimates of the parameters SG, SM , SI , aJ , bI and hyperparameters σM , σI were calculated via Hamiltonian Monte Carlo using the Greta R package (Golding 2018), where * first the posterior distribution for SG is estimated on mutation accumulation samples only; all other estimates are subsequently calculated conditional on this distribution (Supplementary Methods). Mean estimates along with their credible intervals may be ​ ​ found in Supplementary Tables 2-3. ​ ​

TCGA/PCAWG human cancer data analysis

The susceptibility of DNA pathway to alteration was defined as having altered expression or high impact mutations in relevant genes (Supplementary Methods, Supplementary ​ Table 4). The interaction effect was then estimated using Hamiltonian Monte Carlo ​ sampling (Neal and Others, 2011) for the following model for matrix of mutation counts in N ​ ​ ​ samples with R mutation classes Y ∈ Mat (ℕ ⋃ {0}) : ​ ​ NxR Y ~ NB(μ, ϕ), (−K) (K) μ = S · α(−K) + S · α(K) · f G, Xβ f G = e , α ~ Unif (0, M) , β ~ N (0, 0.5) , where α is the matrix of exposures (number of mutations assigned to each signature), S is a matrix of K signatures (each column is a signature, and is normalized to sum to 1), and f is ​ ​ G the interaction factor which alters the signature. It consists of X - a binary matrix of labels, ​ ​ and β which is a matrix of spectra of the interaction effects. M is the maximal number of ​ ​ mutations per sample in the dataset. The overdispersion parameter ϕ = 50 was chosen based on variability estimates across all cancer samples.

Analysis of selection in DNA repair genes

dN/dS analysis of the ratios of numbers of expected vs observed non-synonymous and synonymous variants were performed using the trinucleotide model according to (Martincorena et al., 2017). Background mutation rates were estimated using all the genes ​ except for consistently undercovered genes (Wang et al., 2018). For single-gene analysis, all ​ ​ samples with less than 1000 coding mutations per exome were selected. For pathway-level analysis, all samples with less than 1000 coding mutations per exome from the selected cancer type were considered. The dN/dS ratio in the gene (for single-gene analysis) or set of genes (for pathway-level analysis) of interest was compared to 1 (the dN/dS ratio under the

21 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

neutral evolution model), and the significance of selection was assessed by comparing the

2 log(dN/dSgene ) χ2 statistic of the gene’s dN/dS ratio, 2 , to χ2 distribution with df = 1 , with σ(log(dN/dSgene )) FDR control performed by Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). ​ ​

Acknowledgements

This work was supported by the Wellcome Trust COMSIG consortium grant RG70175, a Wellcome Trust Senior Research award (090944/Z/09/Z), a Worldwide Cancer Research grant (18-0644), and by the Korean Institute for Basic Science (IBS-R022-A1-2019) to AG. ​ ​ We thank the Mitani Lab funded by the National Bio-Resource Project of the MEXT, Japan, and the Caenorhabditis Genetics Center funded by the NIH Office of Research Infrastructure Programs (P40 OD010440) for providing strains. Nadezda Volkova is a member of Lucy Cavendish College, University of Cambridge. We thank Nuria Lopez-Bigas for critical comments on the manuscript.

Data availability

Sequencing data are available under ENA Study Accession Numbers ERP000975 and ERP004086. Code for downstream analyses and figures can be downloaded from github.com/gerstung-lab/signature-interactions.

Author contributions

NV, BM, AG and MG wrote the manuscript, which was approved by all authors. NV analysed all data with input from BM. BM performed mutation accumulation and mutagenesis experiments. VGH and SB performed mutagenesis experiments. SG analysed somatic mutation data. FA and IM contributed to selection analysis. PC and AG conceived the study. AG and MG supervised the analysis.

22 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figures

Figure 1

Experimental design of the study A. Left: Wild-type and C. elegans mutants of selected genes from 10 different DNA ​ ​ repair pathways, including mismatch repair (MMR), base excision repair (BER), nucleotide excision repair (NER), double-strand break repair (DSBR), DNA crosslink repair (CLR), translesion synthesis (TLS), telomere replication (TR), and direct damage reversal by O-6-Methylguanine-DNA Methyltransferase (MGMT), were propagated over several generations or exposed to increasing doses of different classes of genotoxins. Genomic DNA was extracted from samples before and after accumulation/ without and with treatment and whole genome sequenced to determine mutational spectra. Right: representative experimental mutational profile plots across 119 mutation classes: 96 single base substitutions classified by change and ‘5 and 3’ base context, di- and multi-nucleotide variants (MNV), 6 types of deletions of different length and context, 2 types of complex indels, 6 types of insertions and 7 classes of structural variants. B. Number of observed mutations per replicate (grey dots) and experiment (average across replicate, black dot) for base substitutions, indels and structural variants across different experimental types: mutation accumulation in wild-type and DNA repair mutants, genotoxin treatment of wild-type C. elegans, and interaction ​ ​ experiments investigating the effect of genotoxin treatment in different DNA repair deficient backgrounds. Horizontal lines indicate the medians across experiments.

23 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Experimental setup

54 Genotypes 12 Genotoxins 2,721 Mutagenesis experiments Mutational spectra (2-3 different doses) 1 generation mutagen exposure, or 400 mlh-1–/– wild-type 20th generation ↯ 200 Knockouts: Alkylating MMR agents 10-40 generations mutation accumulation 0 BER 40 Aflatoxin WT + EMS NER 12mM DSBR Aristolochic … 20 × acid CLR (FA) TLS Crosslinking in triplicates each. 0 agents –/–

Number of mutations xpc-1 + UV 200J Helicases 100 Checkpoints Ionizing radiation TR Whole genome sequencing 50 UV-B (first and last generation / MGMT ... 0 treated and untreated samples) C>A C>G C>T T>A T>C T>G MNV del. del/ins ins.

B Observed mutations across different experiment types

Substitutions Indels Structural variants Value per sample 12 Average per experiment

● ● ● ● ● ● ● Median across experiments ● ● ● ● ● ● ● ● ● ● ● ● 1000 1000 ● ● ● ● ● ● ● ●●● 10 ● ●●●●● ●●● ● ● ●●●● ● ● ●●● ●●● ●●● ● ●● ●● ●● ●● ●● ● ● ● ● ● ●● ●● ● ● ●●● ●● ● ● ● ● ●●●● ●● ● ● ● ● ●● ●●● ● ● ●●● ● ● ●●● ● ●● ●● ●●●●● ●● ● ●● ●● ●● ●●●●●●● ● ● ●●●● ●●●●● ● ●●●● ●● ●●●● ●●●● ● ●● 8 ● ●●● ●●● ● ●●●●●● ●●●● ● ●●●● ● ●●●●● ●● ●●●●●● ● ●●●●●●●● ●●●●●●●● ● ●●●●● ● ● ●● ●●●● ● ●●●● ● ●● ●●●●●●●● ● ●●●●● ● ● ●●●●●●● ● ●●●●●● ●●●●●● ● 100 ●●●● 100 ● ● ●●●●● ●●●●●● ●● ● ● ●●●● ● ●●●●● ●● ●●●● ● ● ●●●●● ● ● ●●●● ● ● ●●●● ●●●●●●● ●●●●●●●● ● ● ● ●●●●●●●● ● ●●●●●●●●●●● 6 ● ● ● ●●● ● ● ● ●●●●●●●●●● ● ● ●●● ●●●●●●●●●●● ●●● ●● ●●●●●●●●● ● ●●●● ● ●● ●●●●●●●●●● ● ● ●●●●● ●●●●●●●●●● ●● ●●●●●● ● ●●●● ● ●● ●●●●●●●●●●●●● ● ●● ● ●●●●●●●●●●●●●●●●● ● ● ●●●● ● ●●●●●●●●●●● ● ●● ●●● ●● ● ●●●● ● ● ● ● ● ● ●● ● ●● ● ●●●●●●●●●●●●●●● ● ●● ●●●● ● ●●●●●●● ●● ● ● ● ● ●● ● ● ●●●●●●●●●● ● ●● ● ●● ● ● ● ●●●●●●●●●●●●● ●● ●●● ● ● ● ●● ● ●●● ● ● ●●●●●●● ●●● ●●●● ● ●● ● ●●● ●● ●●●●●●●●●●●●● ● ●● ●● ● ●●●●●●● ● ● ● ●●●●●●●●●●●●● ● ● ● ●● ●●●●●●●●● ● ●● ● ● ●●● ●●●●●●●●●●●●●●●● ●●● ● ●● ●●●●● ● ● ●●● ● ● ●●●●●●●●●●●●●● ●● ● ● ●●●● ● ● ●●●●●●●●●●● ● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●● ● ● ●●● ●● ●● ●●● ●● ● ●● ●●●●●●●●●●●●●●●●●●● ●●● ●● ●●● ● ●● ●●●●●●● ●● ● ●●● ●●●●●●●●●●●●●●●● ●● ●●● ●●●●●● ● ● ● ●● ●● ● ● ● ●●●●●●●●●●●●●●●● ● ●●●●●●●● ● ● ● ●●●●● ●●●●●●●● ● ● ●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ●●●●●●● ●● ●● ● ●● ●●●●●●●●●●● ●● ● ● ● ●●●● ● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ●●●●●●●● 4 ● ●● ● ● ● ●● ●●●● ●●● ●●●●●●●●●●●●●●● ● ● ●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ● ●●●●● ●●● ●● ● ● ● ● ●●●●● ●●●●●●●●●●●●●●●●●●● ● ● ● ●● ● ●● ●● ● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ● ● ●●●●●●●●●●●●●● ● ●●●●●● ●● ●● ● ● ● ●● ●●●●● ●●●●●●●●●●●●●●●●● ● ● ● ●● ●● ● ● ●●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ● ●●●●● ● ● ● ●● ● ●●●●●●●●●●●●●●●● ● ● ●● ●●● ● ● ●● 10 ● ●●●●●●● ●●●●● ●●● ●●●●● ● ● ● ● ●● ●●●●● ● ● ● ●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● 10 ● ● ●●●●● ●● ● ●● ● ● ●●●● ●●●●●●●●●●●●●●●●●●●● ●●● ●●● ● ● ●● ●●●●● ●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●● ●●● ● ●● ●●●●●●●●●●●●●●●●●●●● ●● ●● ● ● ● ● ●●●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ●●● ●●●●●● ● ● ●● ● ●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●● ●●● ●●● ● ● ● ●●●●● ●● ●● ● ●● ●● ● ●●●●●●●●●● ●●●●●●● ● ●● ● ●● ●● ●●●●● ● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●● ●●●● ●●●●●●●●●●●●●●●●●● ●●● ●● ●●● ● ●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●● ●● ●● ● ●●● ● ●●● ● ●● ● ● ●● ●●●●●●●●●●●●●●●●● ●●●● ● ● ●●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●● ●●● ●●●●●●●●●●●●●●● ●● ● ● ●● ● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● Mutations / sample ● ● ● ●● ● ●●● ●●●● ● ●●● ● ●●●● ●● ●● ●●●●●● ●●●● ●● ●● ● ● ● ●●● ● ●● ●●●●●●●●●●●●●●●●●● ●● ● ● ●●● ●●●●●●●● ●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●●●● ● ● ●●●● ●●● ●● ●●●● ● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ● ● ● ● ●● ●●●●●●●●●●●●●●●● ● ●●●●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●● ●●● ●●● ●●●●●● ● ● ●● ●●●●●● ●● ●●●●● ● ● ●● ●●●● ● ●●● ●●●●●●●●● ●●● ●●● ● ● ●●●●●● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ● ● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ● ● ● ● ● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● 2 ● ●●●●● ●●●●● ● ● ● ●● ● ● ● ● ● ● ●● ● ●●●●●● ●● ● ● ●●● ● ●● ● ●● ● ●● ●● ●● ● ●●●●●●● ● ●● ●● ●●●●●●●● ● ● ● ● ●● ● ● ●● ●●●●●●●●●●●●● ●● ●●● ●● ● ●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●● ● ●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●● ●● ●● ●●●●●●●●●●●●● ●● ● ●● ●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ● ● ● ●●●●●● ●●●● ●●●●●●●● ●●● ●● ● ● ● ● ● ●●●● ●●●●●● ●●●●●●●●●●●●● ● ● ● ●● ● ● ● ● ●●●● ●● ●● ● ●●●● ●●●●●●●● ●●●●●● ● ● ● ● ●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●●● ● ● ●●● ●●●●●● ●● ●● ●●●● ● ● 1 ●●●●●●●● ● ● ● ● ●●● ● ● ●●●●●●●●●●●●●●● ●●●● 1 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ● ● ●●●●●●● ●●●●●●● ●● ●●● ●●●●● ● ● ●● ● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● 0 ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●● ●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●● ● ● ● ●●●●●●●●●●●●●●●●●●● ●●●● ● ● ●●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●● ●●●●●●●●●● ● ● MA Mutagen Interaction MA Mutagen Interaction MA Mutagen Interaction

Experiment Figure 1. Experimental design of the study bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 2

Overview of mutational signatures of DNA repair deficiency and genotoxin exposures. A. 2D t-SNE representation of all C. elegans samples based on the cosine similarity ​ ​ between mutational profiles. Each dot corresponds to one sequenced sample, the size of the dot is proportional to the number of observed mutations. Colors depict the genotoxin exposures, no color represents mutation accumulation samples. Highlighted are selected genotoxins and DNA repair deficiencies. EMS, ethyl methanesulfonate; MMS, methyl methanesulfonate; DMS, dimethyl sulfate. B. Experimental mutational signatures of selected DNA repair deficiencies in the absence of genotoxic exposures. C. Experimental mutational signatures of selected genotoxin exposures in wild-type C. ​ elegans.

24 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A t-SNE plot of C.elegans mutation spectra B Experimental signatures of selected DNA repair knockouts TLS C>A C>G C>T T>A T>C T>G MNV del. del/ins ins. SV ● ●● 0.4 ● ● ● agt-1 ● ●● ● Aflatoxin agt-1 + ●●● ● ● ● ●●●●● 0 ●● ● ● ● ● ●● ● ●● MMS 50 ●● ● ●● ● ●● ● ● ●●● ● ● mlh-1 ● ● ● ● ● ● ●●● ● ● ● ● 0 ● ● ●● ● ● ● ●● ●● ● 1 polh-1 ● ● ● ● ● ● 0 ● Cisplatin ● ● EMS ● 1 Aristolochic ● ● ●●● ●●● ●● ● ● rev-3 ●●● ● ● ● ● ● ● ● ● acid ● ●●● ●● ● ● ●● ● ● ●● ● 0 ● ● ● ●● ●● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ● 0.4 ● ● ● smc-6 ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ●● ● 0 polk-1 + ● ●●● ● ●●●●●● ● ● ● ● ●● 0.4 ● ●●●● ● ●● ● ● xpc-1 EMS ● ● ●

● / generation Number of mutations ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ●No.● of mutations● ● C Experimental signatures of selected mutagens Complex Structural ● ●● ●● ●● ●● ● C>A C>G C>T T>A T>C T>G MNV Deletions Insertions ●● ● ● indels Variants Aflatoxin.B1 ● ● ●● ●● ●● ●● ● 10.20 ●● ● ●● ● ● 100 0.15 ● ● UV ● ● ● Aflatoxin ● ● ● 0.10 ●● ● ● ● ● ● ● ●● 0.05 ● ● AristolochicAcid ● ● ● ● 1000●● ●● ● 0 ● ● ● ● ● ● ● ● ● 0.00 ●● ● ● ●●● 0.20 ● ● ● ● ● polk-1 + 2 ● ● ● ●●● 0.15 ● ● ● ● ●●● ● ● Aristolochic acid ●● ● ●●● ● 0.10 ● ● ● ● ● ●● ● ●●● ● DMS/MMS ● ● ● ●● ● 0.05 ● ● ● ●●●●● ● 0 ● ● ●Mutagen● ●● ● ● 0.00 ● ●● ● ●●● ● ● 0.20 ● ●● ● Cisplatin ●● ● ●● ● ●● ● ● ● 0.5 ● ●● ● ● ● ●● ●●●●● ● ● ● ● 0.15 ●●● ● ●● ● ● ● ● Cisplatin ● ● MMR ●● ●● ● ● Aflato●● xin−B1 0.10 ● ● ●● ● ● ● ● ● ● ● ●● ● 0.05 ●●● ● ●● MMS/DMS●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●●●● ● 0 ●● AristolochicAcid● ● 0.00 ● ●● ● ●●● ●● 300.20 ●●● ● ● ● ● ● ● EMS ● ● ● ● Cisplatin● ● 0.15 ● ● ● EMS ● ● 0.10 ● ● ● ●● ● ● ●● ● DMS 0.05 ● ● ● ●● ● ●● 0 ●●● ● 0.00 ● ● ● ● ● 200.20 ● 0.15 MMS ● Relative contribution MMS ● ● ● ● 0.10 ● ●● ● ● HU ● ● ● Genotoxin No.Total ofnumber mutations of mutations 0.05 ●● ● ● ● ● ● 0 ● ●● Mechlorethamine 0.00 ●● ● ● ● 30.20 Radiation ● ● ●●●●● ● ● ● Aflatoxin B1 ● 0.15 ●●●●● ● ● Mitomycin 100 γ-rays ● ● ● 100 0.10 ●●●●● ● ● ● ● ● / unit dose Number of mutations ● ● ● MMS 0.05 ●●●●●● ● ●● Aristolochic●● Acid 0 ●● ● 1000 10.00 ●● ●●●●● ● ●Cisplatin ● γ-raysRadiation 1000 0.20 0.15 ● UV ● ● ●●●● ● UV-B ●● ● ● ●● DMS●● ● UV-BUV 0.10 ●● ● 0.05 ●● ● ●● ● ● X-raysXray 0 ●● ● EMS● ●●● 0.00 ● 5’ base ACGTACGTACGTACGT ACGTACGTACGTACGT ACGTACGTACGTACGT ACGTACGTACGTACGT ACGTACGTACGTACGT ACGTACGTACGTACGT T*T T*T T*T T*T T*T T*T A*T T*A A*T ● T*A A*T T*A A*T T*A A*T T*A A*T T*A C*T A*A T*C C*T A*A T*C A*A C*T T*C A*A C*T T*C A*A C*T T*C A*A C*T T*C A*C C*A T*G G*T A*C C*A G*T T*G A*C C*A G*T T*G A*C C*A G*T T*G A*C C*A G*T T*G A*C C*A G*T T*G G*A A*G C*C G*A A*G C*C A*G C*C G*A A*G C*C G*A A*G C*C G*A A*G C*C G*A G*C C*G G*C C*G C*G G*C C*G G*C C*G G*C C*G G*C G*G G*G G*G G*G ●● ● NoneNA 3’ base A C G T A C G T A C G T A C G G*G T A C G T A C G G*G T ● ● Hydroxyurea 2 bp ●● 2bp ● over 2 bp ● ●● Mutagen ●● ● ● Mechlorethamine● ● >=3bp ● ● 1−49bp ● ● ● ● ● ● 6−50 bp 6−50 bp ● ● ● ● Deletions 50−400bp ● ● ● Foldbacks Inversions ● over 50 bp over 50 bp ● ●● ● ● ● ● ● ●● ● ● ●● ● Aflatoxin−B1 andem dup. ● ● ● ●● ● T ●● ● ● Translocations Interchr. events Interchr. ● ● ● ●● 1 bp (non−rep.) 1 bp (non−rep.) 2−5 bp (nonrep) 2−5 bp (nonrep) ● ● Complex events ● ● ● ● ●● ● ● AristolochicAcid 1 bp (in repeats) 1 bp (in repeats) ●●● ● ● ●● ● ●●● ●●● ● ● 2−5 bp (in repeats) 2−5 bp (in repeats) ● ● ● Cisplatin ● ● ● ● ● ● ● ● ●● Figure●● 2. Overview● of mutation spectra and signatures of DNA repair deficiency and genotoxic exposures ● ● DMS ● ●● ● ● EMS bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 3

Examples of experimental genotoxin-repair interaction signatures. A. Signatures of MMS exposure in wild-type and polk-1 deficient C. elegans. Top: ​ ​ ​ ​ –/– Mutation spectra of MMS exposure in wild-type and polk-1 ​ background. Error bars ​ ​ denote 95% confidence intervals. Bottom: Signature fold-change per mutation type between the wild-type mutagen and interaction spectra. Black lines denote the maximum a posteriori point estimates; faded color boxes depict 95% CIs. Point estimates and CIs are highlighted if different from 1. Right: Dose response for MMS in wild-type (upper part) and polk-1 deficient (lower part) samples, measuring the ​ ​ total number of mutations (x-axis) versus dose (y-axis) for each replicate. polk-1 ​ ​ ​ ​ ​ deficiency leads to a 10-100x increase of T>A transversions under MMS treatment. Mutation types: MNV refer to multinucleotide variants, del/ins. to deletions with inserted sequences, SV to structural variants. B. Signatures of MMS exposure in wild-type and agt-1 deficient C. elegans. Same layout ​ ​ ​ ​ as in A. MMS treatment induced substantially more C>T transitions in an agt-1 ​ deficient background. C. Signatures of UVB exposure in wild-type and TLS polymerase ζ deficient (rev-3) ​ ​ deficient C. elegans. Same layout as in A. rev-3 knockout leads to a 3-fold reduction ​ ​ ​ ​ in the number of substitutions and a 10-fold increase in the number of indels compared to the values expected without interaction. D. Signatures of UVB exposure in wild-type and xpc-1 NER deficient C. elegans. The ​ ​ ​ ​ knockout of xpc-1 leads to 40-50 fold increase in C>T and T

25 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B Interaction of polk-1 TLS deficiency and MMS treatment Complex Structural Interaction of agt-1 DR deficiency and MMS treatment C>A C>G C>T T>A MNV Deletions Insertions indels Variants 30 Dose [mM] 30 WT + MMS Dose [mM] 20 WT + MMS 0.00 20 0.00 0.15 10 0.15 10 0.40 0 0.40 0 30 1500 –/– polk.1.MMS 0.00 agt-1 + MMS 0.00 20 1000 polk-1–/– + MMS 0.15 0.15

Mutations / mM 10 500 0.40 0.40 0 0 0 1500 0 200 C>AMutationsT>C I Mutations 100 C>G T>G SV 100 C>T 10 T>A I 10 1 1

Fold change<0.1 Mutations / mM Fold change <0.1 C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV

150

D100 xpc.1 C Interaction of rev-3 TLS deficiency and UV exposure Interaction of xpc-1 NER deficiency and UV exposure 3 50 Dose [J/m2] 0 50 100 150Dose [J/m2] 200 250 Number of mutations 4 WT + UV WT + UV 2 UV 3 0 0 0 2 1 T*T T*T T*T T*T T*T T*T T*A A*T A*T A*T A*T A*T A*T T*A T*A T*A T*A T*A T*C A*A C*T T*C A*A C*T T*C A*A C*T T*C A*A C*T T*C A*A C*T T*C A*A C*T T*G A*C C*A T*G A*C C*A T*G A*C C*A T*G A*C C*A T*G A*C C*A T*G A*C C*A G*T G*T G*T G*T G*T G*T C*C C*C C*C C*C C*C C*C A*G A*G A*G A*G A*G A*G G*A G*A G*A G*A G*A G*A C*G C*G C*G C*G C*G C*G G*C G*C G*C G*C G*C G*C 1 G*G G*G G*G G*G G*G G*G 200 2 bp 200 over 2 bp over 0 1−49 bp 0 50−400 bp Deletions Inversions 3 Foldbacks 6−50 bp 6−50 bp

60 dup. Tandem over 50 bp over over 50 bp over

–/– −−:UV –/– Translocations Interchr. events Interchr. rev-3 + UV xpc-1 + UV events Complex 0 1 bp (non−re p.) 0 1 bp (non−re p.) 1 bp (in repeats) 1 bp (in repeats) 2−5 bp (non−re p.) 2−5 bp (non−re p.) 2−5 bp (in repeats) 2 40 2−5 bp (in repeats) 200 200

1 Mutations / 100 J/m2

Mutations / 100 J/m2 20 0 800 0 0 40 0 Mutations xpc.1.UV Mutations 10 100 10 1 1 Fold change <0.1 Fold change <0.1 C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV

Mutation type Interaction terms C>A T>A MNV insertion significant term mean across mutation class C>G T>C deletion SV 95% confidence C>T T>G del/ins. interval

Figure 3. Examples of repair-genotoxin interactions bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 4

Widespread and diverse genotoxin-repair interactions in C. elegans ​ A. Estimated fold changes between observed numbers of base substitutions relative to expected numbers based on additive genotoxin and DNA repair deficiency effects. A value of 1 indicates no change. Interactions with fold change significantly over or below 1 are shown in dark grey, the rest as light grey. Overall, about 37% of interactions produced a significant fold change. AA, aristolochic acid. B. Mutation signatures changes in interaction experiment. Black lines denote point estimates. Interactions with cosine distances larger than 0.2 are shown with dark blue confidence intervals, others are shown in light blue. Around 10% of interactions produced a significant change in mutational spectra. C. Numbers of mutations attributed to different causes. Overall, about 17% of mutations in the dataset are attributed to genotoxin-repair interactions, including a reduction of –2% of mutations which would be expected under wild-type repair conditions.

26 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Genotoxin repair-interactions: Substitution rate change B Genotoxin repair-interactions: Signature change C Attribution of mutations xpc−1:UV 100 40 Estimate polk−1:DMS Estimate 30 Confidence interval xpf−1:UV Confidence interval 20 Significant interaction polk−1:MMS Significant difference xpf−1:AA 0.6 polk−1:DMS 62% agt-1:MMS 5 polk-1:MMS rev−3:UV rev−3:MMS 0.4 50 2 Mutations added Less similar 0.2 base substitutions 23%

1 Cosine distance More similar Fold-change in number of Mutations removed 17% 0.5 0 mutations Number of mutations (x1000) added 0 Interaction experiment (sorted by mutation fold change) Interaction experiment (sorted by distance) removed −10 −2%

DNA repair Genotoxins deficienciesinteractions Genotoxin-repair

Figure 4. Widespread and diverse genotoxin-repair interactions in C. elegans bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 5

DNA damage-repair interactions in human cancers A. Summary of interaction effects of selected DNA repair pathway deficiencies with DNA damage-associated mutational signatures. Top: Changes in age-adjusted mutation burden. Bottom: Change in mutational signatures. B. Interaction of temozolomide treatment in glioblastoma samples with MGMT. Mutational signatures of expressed (top) and silenced MGMT (middle). Bottom: ​ ​ Mutation rate fold change per mutation type. Right: Total mutation burden in temozolomide-treated samples with silenced and expressed MGMT. DR, direct ​ ​ damage reversal. C. Interactions of DNA damage caused by polymerase epsilon (POLE) exonuclease deficiency and mismatch repair. Mutational signatures of MMR deficiency, POLE deficiency, and combined POLE and MMR deficiency. Microsatellite instability is used as a read-out for MMR deficiency. Right: Mutation burden of POLE exonuclease deficiency in MMR proficient (n=40) and deficient (n = 15) uterine cancers. MSI, microsatellite instable; MSS, microsatellite stable) D. Interactions of APOBEC-induced DNA damage with REV1 and UNG BER. APOBEC mutation signatures in REV1 and UNG wildtype (top) or deficient (middle) samples. ​ ​ ​ ​ Bottom: Fold-change of variant types. Right: Observed difference between C>G and C>T mutations in a TCN context.

27 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Interactions of known DNA damaging B Interaction of temozolomide and MGMT DR and DNA repair processes 0.25 MGMT expressed

0.00 Mutation rate change 0.25 MGMT silenced P < 10-16 ● 10000 ● ●

1000 ●●●● Relative contribution 0.00

● 1000 ● ●●● 500 ● 100 ●●● 100 ●● 100 10 10 ● 1 Number of mutations 10 sil. (6) exp. (11)

5 Fold-change 0.1 C>A C>G C>T T>A T>C T>G D DI I MGMT 1 Interaction of POLE exonuclease variants and MMR (MSI/MSS) 0.1 C Fold change in mutations/year 0.2 POLE+/+ + MSI + MMR P < 10-16 exo ● ●● UV + NER 40000 ●●● ●●●●● ●●●● ●●● ● ●● 0.0 ●●●●●● ●●●●● SmokingPOLE + NER ● APOBEC + BER 0.2 10000 ● ● ●●● exo ●●● POLE + MSS ●● ● ● Temozolomide + MGMT 1000 0.0 ● ● 0.2 POLEexo + MSI

100 ● Relative contribution

0.0 Number of mutations Signature change 100 20 ● exo exo 10 POLE POLE +MSI +MSS 0.8 1

Fold-change <0.1 C>A C>G C>T T>A T>C T>G D DI I 0.6

0.4 D Interaction of APOBEC and REV1 or UNG BER mutations

0.2 −7 0.2 P = 5 x 10 ● wildtype ●●● Cosine distance 0.7 ● ●●●●● ● ●●●●●●● ●●●●● ●●●●● ● ●●●●● ●●●●●●●●●● ●●●●●●●●●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●● ●●● 0 ●●●●●●●●●●●●●●●●● ●● 0.0 ● 0.5 ● ●●●●●●●● ●●● 0.2 ● ●●●●●● ●●● ● ●●●● +/– +/– ●●●●●●●●●●●● ●●● ●●●●●●●●●●●●● ●●● ●● REV1 or UNG ●●●●●●● ●●● ●●●●●●●●●●●●●● ●● ●●●●● ● + MMRMGMT ●●●●●●●● 0.3 ●●●●● exo ●●●●● ●● ●●●● ●●● UV + NER ●● ●● ●●● ● ●● Relative contribution 0.0 ● ● ● ● Smoking + NER POLE of T[C>G]N Fraction APOBEC + BER 10 0.1 ● ●

Temozolomide + 1 wt (794) mut (48)

REV1 or UNG Fold-change <0.1 C>A C>G C>T T>A T>C T>G D DI I

Figure 5. DNA damage-repair interactions in human cancers bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 6

DNA repair deficiency drives cancer evolution A. dN/dS values for 248 DNA repair-associated genes across pan-cancer types. dN/dS measures the excess of observed non-silent mutations relative to the expected number based on observed silent mutations. A value of 1 denotes neutrality, while a value greater than 1 signifies a surplus of non-silent changes, indicating a cancer-promoting effect. dN/dS values per gene were calculated separately for missense and nonsense mutations. dN/dS ratios were also calculated for 9 DNA repair pathways (aggregating all genes) across 30 cancer types (right). Black/colored dots reflect results which reached significance compared to the background (FDR 10% within the respective group) with respective genes and DNA repair pathways indicated on the plot. B. Relationship between relative cancer risk and rate change for different DNA repair pathway gene mutations. Black lines depict the expected power law depency in an evolutionary model with different number of driver gene mutations (Tomasetti et al., 2015). ​

28 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Selection in DNA repair genes and pathways across cancer types B Relationship between cancer risk and mutation rate

DS_OV DS_OV more non-silent 100 DS_ESCA TP53 PTEN DS_LGG ●● DS_ESCA DS_SARC mutations RPA3 DS_SARC ● DS_BRCA ● DS_HNSC than expected typical range of IDH1 HUS1 DS_PRAD DS_LUAD ● POLR2E DS_READ DS_LGG DNA damage-repair multi-hit model prediction DS_UCS ● ● TP53 SWI5 DS_BLCA n = 10 n = 2 drivers ● DS_HNSC ●● DS_READ interaction effects PTEN ATRX ● DS_GBM ● ● HR_PAAD 10000 ● PRPF19 ● ●●● HR_PAAD ●● DS_PAAD ● ●● RFC2 DS_LUAD ● ●● DS_LIHC 10 ●●●● ●● CUL3 DS_BRCA ● DS_LUSC XP, skin cancer ● ●●●● ●● DS_GBM ● ● DS_BLCA ● ● ● ● ATM ● ● ●● ●● ●●●●●●● ● ●● DS_COAD ●●● ● ● DS_COAD ●●●● ● ● ●● ● ● ● ●● MLH1 ●●●● ●●● DS_UCS ●●● ●● ● SWI5 ● ●●● ●●● DS_PAAD ● 1000 ● ●●● BRCA1 ● ● DS_STAD ● ● ●● ●● ● ●● ● ● ● ● ●●● ● ● DS_LIHC ●● ● ●●● ●●●● ●●● ● ● ●● HR_OV ● ● ● BRCA2 ●● ●●●●● SMARCA4● ●●● ● ●●● ●● ●● DS_LUSC●●● ●● ● ●● ●●● ● ● ● ●● ●● ● ● ●●●●● DS_SKCM ●●●●● ●●●● ●● ● ● ●●● ●●● ● ●● ●● ● ●● ●● ● ● ● ●●●●●● ●SMARCA4 ● ● ●● ● ●● ●● ●● ●● ●● ●● ● ●● ●●●● ● ●● NER_BLCA● ●●●● ●●DS_UCEC ● ● ●● ●●●●●●● ●●● ● ● ● ● ●●● ● ● ●● ●●●●●● ● ●●●●● ●●●● ● ●●●●● ●●●● ● ● ●●●● ● ●●● ● dN/dS ratio ●●● ● ●● ●● ●● ● PRKDC ● ●● ●● ● ●● ATM● ●●●● BRCA2●●●●● ●● ●● ● ●● ●● ● ●● DS_STAD● ●●● ● MMR_UCEC ●●●●●●●●●●●●●●●●●● ●●●● ● ● ●●●●● ●●●● ●●●●● ● ●● ●●● ●●● ● ●●●●●●●●●● ● ●●●●●●●● ●●● ●●●● ● ●●● ●●● ●●● ● ●●● ●● ● ●●● ●●●●●●●●●●●●●●●● ● ● ● POLQ ●●●● ●● ●● ● DS_UCEC ●● ●● HR_BLCA ●●●●● ●● ● ● ●●●●●●● ● ●●● ●● ●● ●●●● ●●●●●●●●●●● ● ● ● ●●●●●●●●●●●●●●●●●●●●●● ●●●●● ● ●●●● ●● ●●●●● ●●●●●● ● ● ●● 1 ●● ● ●●●●● ● ● ●●● ●●●●●●●● ●● ● ●● ● ●●● MMR, colorectal cancer ●●●●●●●● ●●●●●●● ●●● ●●●● ●●● ● ● ●●● ● ● ●● ● 100 ●●●● ●● ●●● ● ● ● ● ●● ● ● ● ● ● ● ●●● ●●● ●● ● ●● ●● ● ●● ● ● ●● ●●●● ●● ●●● ● ●● ● ●●● ●● ●● ● ●● ● ●● ●●● ●● ●● ●● ●● ● ● ● ●● ● ● ● ● ● ●● ●● ●● ●● ● ● ● ●●● ● ● ●● ● ● ● ●●● ● ● ● ● ● ●● ● ●● ●● ●● ●● ● ● Relative risk ●● ● ●● ● ●● ●●● ● smoking, ● ● ●● ●● ● ● ● 10 ● ● ●● ● BRCA, fewer non-silent 0.1 ● ● mutations than expected 1 1 10 100

● missense missense nonsense Point mutation rate fold change nonsense

per pathway per pathway

Figure 6. Mutations in DNA repair genes drive cancer evolution bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figures

Supplementary Figure 1

Experimental mutational signatures extracted for 54 genetic backgrounds. Each barplot reflects the average number of mutations per generation, the headers contain the estimate for the total number of mutations per generation, and the maximal generation number with respective mutation burden across the samples used in the model. The information on gene knockouts is given in Supplementary Table 1. ​ ​

29 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

agt−1; 1.1 (0.4−2.2) het. mutations per gen., max 1 generation agt−2; 2 (1−3.6) het. mutations per gen., max 20 gen. with 20 (8−35) mutations apn−1; 1 (0.4−2.1) het. mutations per gen., max 40 gen. with 16 (12−22) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

brc−1; 2.3 (1.3−3.8) het. mutations per gen., max 40 gen. with 36 (28−43) mutations brd−1; 1.8 (0.7−3.7) het. mutations per gen., max 40 gen. with 23 (22−24) mutations bub−3 (gt2000); 1.5 (0.5−3.6) het. mutations per gen., max 20 gen. with 15 (9−21) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

bub−3 (ok3437); 0.9 (0.3−2.1) het. mutations per gen., max 20 gen. with 9 (3−14) mutations ced−3; 1 (0.3−2.4) het. mutations per gen., max 20 gen. with 8 (5−12) mutations ced−4; 1.1 (0.3−2.7) het. mutations per gen., max 20 gen. with 7 (2−14) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

cep−1; 1.7 (0.6−3.6) het. mutations per gen., max 20 gen. with 13 (10−16) mutations cku−80; 1.5 (0.3−4.3) het. mutations per gen., max 20 gen. with 11 (7−14) mutations csb−1; 1.3 (0.5−2.7) het. mutations per gen., max 20 gen. with 11 (7−18) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

dna−2; 1.5 (0.3−4.4) het. mutations per gen., max 20 gen. with 12 (5−17) mutations dog−1; 3.5 (2.1−5.5) het. mutations per gen., max 20 gen. with 36 (30−40) mutations Estimated exo−1; 1.3 (0.6−2.6) het. mutations per gen., max 1 generation

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.75 0.4 0.4

0.3 0.50 0.3

0.2 0.2 0.25 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.00 0.0

exo−3; 1.4 (0.6−2.8) het. mutations per gen., max 40 gen. with 24 (22−26) mutations fan−1; 1.6 (0.8−3) het. mutations per gen., max 20 gen. with 12 (4−26) mutations fcd−2; 1 (0.3−2.2) het. mutations per gen., max 40 gen. with 12 (4−17) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

fnci−1; 1.1 (0.3−2.5) het. mutations per gen., max 20 gen. with 10 (5−15) mutations fncm−1; 1.3 (0.5−2.8) het. mutations per gen., max 20 gen. with 13 (9−18) mutations helq−1; 1.4 (0.7−2.5) het. mutations per gen., max 40 gen. with 19 (16−21) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

him−6; 2 (1−3.6) het. mutations per gen., max 40 gen. with 22 (22−23) mutations lem−3; 0.9 (0.3−2.2) het. mutations per gen., max 20 gen. with 8 (5−11) mutations lig−4; 0.9 (0.3−1.9) het. mutations per gen., max 40 gen. with 13 (10−17) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

mlh−1; 118 (107.4−130) het. mutations per gen., max 20 gen. with 1156 (1078−1209) mutations mrt−2; 3.1 (1.4−6) het. mutations per gen., max 20 gen. with 24 (24−24) mutations mus−81; 2 (0.9−3.9) het. mutations per gen., max 40 gen. with 24 (24−24) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 50 0.5 0.5

40 0.4 0.4

30 0.3 0.3

20 0.2 0.2

10 0.1 0.1 Number of mutations Number of mutations Number of mutations

0 0.0 0.0

N2; 1.5 (0.9−2.3) het. mutations per gen., max 40 gen. with 17 (12−24) mutations ndx−4; 1.3 (0.6−2.5) het. mutations per gen., max 40 gen. with 25 (10−43) mutations parp−1; 1.3 (0.5−2.4) het. mutations per gen., max 40 gen. with 16 (14−17) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

parp−2; 1.4 (0.6−2.9) het. mutations per gen., max 40 gen. with 22 (18−25) mutations pole−4; 1.6 (0.8−3) het. mutations per gen., max 40 gen. with 19 (16−22) mutations polh−1; 3.5 (2.4−5.2) het. mutations per gen., max 20 gen. with 10 (8−13) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5

0.4 0.4 1.0 0.3 0.3

0.2 0.2 0.5 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

polk−1; 1.8 (0.9−3.2) het. mutations per gen., max 40 gen. with 25 (13−38) mutations polq−1; 1.1 (0.5−2.4) het. mutations per gen., max 40 gen. with 17 (14−21) mutations Estimated rad−51; 1 (0.4−2.2) het. mutations per gen., max 1 generation

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

rad−54B (gt3308); 1.6 (0.4−4.1) het. mutations per gen., max 20 gen. with 14 (11−16) mutations rad−54B (gt3312); 1.2 (0.3−3.1) het. mutations per gen., max 20 gen. with 10 (7−12) mutations rcq−5; 1.3 (0.5−2.5) het. mutations per gen., max 20 gen. with 12 (9−20) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

rev−1; 1.3 (0.6−2.5) het. mutations per gen., max 40 gen. with 18 (10−25) mutations rev−3; 2 (1.2−3.3) het. mutations per gen., max 20 gen. with 12 (4−15) mutations rfs−1; 2.1 (0.9−4.2) het. mutations per gen., max 20 gen. with 22 (15−28) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.8 0.5

0.4 0.6 0.4

0.3 0.3 0.4 0.2 0.2 0.2 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

rif−1; 1.6 (0.6−3.6) het. mutations per gen., max 20 gen. with 15 (10−18) mutations rip−1; 2.1 (1−3.9) het. mutations per gen., max 40 gen. with 41 (28−49) mutations san−1; 1.8 (0.2−7.2) het. mutations per gen., max 20 gen. with 9 (9−9) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

slx−1; 2 (1.1−3.4) het. mutations per gen., max 40 gen. with 26 (25−28) mutations smc−6; 4.1 (1−11.3) het. mutations per gen., max 5 gen. with 8 (5−13) mutations smg−1; 1.6 (0.4−4) het. mutations per gen., max 20 gen. with 9 (7−10) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5

0.4 0.4 0.4 0.3 0.3

0.2 0.2 0.2

0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

tdp−1; 1.6 (0.5−4) het. mutations per gen., max 20 gen. with 10 (7−12) mutations ung−1; 1.7 (0.8−3.2) het. mutations per gen., max 40 gen. with 29 (23−36) mutations wrn−1; 1.5 (0.6−3.1) het. mutations per gen., max 20 gen. with 13 (7−21) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5

0.4 0.4 0.4

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1 Number of mutations Number of mutations Number of mutations

0.0 0.0 0.0

xpa−1; 2.5 (1.2−4.6) het. mutations per gen., max 40 gen. with 40 (35−45) mutations xpc−1; 2.6 (1.3−4.7) het. mutations per gen., max 20 gen. with 16 (4−26) mutations xpf−1; 2.5 (1.3−4) het. mutations per gen., max 40 gen. with 23 (5−33) mutations

Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions C>A C>G C>T T>A T>C T>G MNV Deletions Insertions indels Variants indels Variants indels Variants 0.5 0.5 0.5 0.4 0.4 0.4 0.3 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.1 0.0 0.0 0.0 .) .) .) .) .) .) .) .) .) .) .) .) Number of mutations Number of mutations Number of mutations T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T T*T A*T T*A A*T A*T T*A A*T T*A A*T A*T T*A A*T A*T T*A A*T T*A A*T A*T T*A A*T A*T T*A A*T T*A A*T T*A T*A T*A T*A T*A A*T T*A A*T T*A T*A A*T T*A T*C A*A A*A T*C A*A A*A A*A A*A C*T A*A A*A T*C A*A A*A A*A A*A T*C A*A A*A T*C A*A A*A A*A A*A T*C C*T T*C C*T C*T C*T T*C C*T T*C C*T T*C C*T C*T C*T T*C C*T T*C C*T C*T T*C C*T C*T C*T T*C C*T T*C C*T T*C T*C T*C G*T T*G A*C C*A G*T A*C C*A G*T T*G A*C C*A A*C C*A G*T A*C C*A G*T A*C C*A A*C C*A G*T A*C C*A G*T T*G A*C C*A A*C C*A G*T A*C C*A G*T A*C C*A G*T T*G A*C C*A G*T A*C C*A G*T T*G A*C C*A A*C C*A G*T A*C C*A G*T A*C C*A G*T T*G T*G G*T T*G T*G T*G G*T T*G T*G T*G G*T T*G T*G T*G T*G T*G C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C C*C A*G A*G A*G A*G A*G A*G G*A A*G A*G A*G A*G A*G A*G A*G A*G A*G A*G A*G A*G G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A G*A C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C C*G G*C G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G G*G 2 bp 2 bp 2 bp over 2 bp over 2 bp over 2 bp over 1−49 bp 1−49 bp 1−49 bp 50−400 bp 50−400 bp 50−400 bp Deletions Deletions Deletions Inversions Inversions Inversions Foldbacks Foldbacks Foldbacks 6−50 bp 6−50 bp 6−50 bp 6−50 bp 6−50 bp 6−50 bp Tandem dup. Tandem dup. Tandem dup. Tandem over 50 bp over over 50 bp over 50 bp over over 50 bp over 50 bp over over 50 bp over Translocations Translocations Translocations r. events Interch r. events Interch r. events Interch r. Complex events Complex events Complex events Complex 1 bp (non−re p 1 bp (non−re p 1 bp (non−re p 1 bp (non−re p 1 bp (non−re p 1 bp (non−re p 1 bp (in repeats) 1 bp (in repeats) 1 bp (in repeats) 1 bp (in repeats) 1 bp (in repeats) 1 bp (in repeats) 2−5 bp (non−re p 2−5 bp (non−re p 2−5 bp (non−re p 2−5 bp (non−re p 2−5 bp (non−re p 2−5 bp (non−re p 2−5 bp (in repeats) 2−5 bp (in repeats) 2−5 bp (in repeats) 2−5 bp (in repeats) 2−5 bp (in repeats) 2−5 bp (in repeats)

Supplementary Figure 1. Experimental signatures of DNA repair deficiencies in C.elegans bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figure 2

Mutational signatures extracted for 12 genotoxins in wild-type C. elegans. Each barplot ​ ​ reflects the number of mutations per average dose of indicated mutagen, headers contain the estimate for the total number of mutations per average dose. For details on doses and mutagen specification see Supplementary Table 1. ​ ​

30 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Aflatoxin-B1, 4.2 mutations on average per 1 muM Mechlorethamine, 15.1 mutations on average per 10 muM Cisplatin, 0.7 mutations on average per 10 muM Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants 0.5 2.0 0.100

0.4 1.5 0.075 0.3 1.0 0.050 0.2 Number of mutations Number of mutations Number of mutations 0.5 0.025 0.1

0.0 0.0 0.000

Gamma Radiation, 57.7 mutations on average per 100 Gy DMS, 32.5 mutations on average per 0.1 muM Xray, 11.4 mutations on average per 100 Gy Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants 2.5 2.0 2.0 1.0 1.5 1.5

1.0 1.0 0.5 Number of mutations Number of mutations Number of mutations 0.5 0.5

0.0 0.0 0.0

MMS, 230.9 mutations on average per 1 mM EMS, 241.8 mutations on average per 10 mM HU, 1.9 mutations on average per 1 mM Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants 30

15 0.20 20 0.15 10 0.10 10 Number of mutations Number of mutations Number of mutations 5 0.05

0 0 0.00

UVB, 10.7 mutations on average per 100 J/m2 Aristolochic Acid, 11.7 mutations on average per 10 muM Mitomycin, 35.3 mutations on average per 1 mM Complex Structural Complex Structural Complex Structural C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants C>A C>G C>T T>A T>C T>G MNV Deletions indels Insertions Variants 0.8 2.0 2.0 0.6 1.5 1.5 0.4 1.0 1.0 Number of mutations Number of mutations Number of mutations 0.2 0.5 0.5

0.0 0.0 0.0

Supplementary Figure 2. Experimental signatures of mutagens in C.elegans bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figure 3

Comparison between experimental signatures of 12 mutagens extracted from C.elegans ​ experiments to the catalogue of mutational signatures of environmental from (Kucab et al., 2019), and the catalogue of mutational signatures in cancer (Alexandrov et al., ​ ​ 2018). C. elegans signatures were adjusted to the human genome trinucleotide frequency. ​ ​ ​ A. Cosine similarities between the humanized C. elegans signatures and experimental ​ ​ (blue) and computational (red) signatures of same or related mutagens in human cells or cancers. “Experimental” signature for ionizing radiation was estimated as an averaged spectrum of 12 radiotherapy-associated secondary malignancies from (Behjati et al., 2016). No counterparts from human data were found for HU, ​ Mitomycin C and MMS. Red line denotes the cut-off of 0.8 beyond which the spectra are considered similar. B. Visual comparison of experimental signatures for selected mutagens. Signature of aflatoxin was similar to the computationally extracted signature 24 (similarity 0.8) but different from experimental one (0.62). C. elegans UVB exposure showed a C>T ​ ​ mutation spectrum somewhat similar to that in cell line experiments and in cancer, albeit with an additional fraction of T>C mutations. C. elegans EMS signature is very ​ ​ different from temozolomide signature in cell lines, but very similar (0.90) to cancer-derived computational signature SBS11 associated with temozolomide. Interestingly, similar phenomena were observed upon exposure with alkylating agents in Salmonella typhimurium (Matsumura et al., 2018). Aristolochic acid ​ ​ ​ ​ signatures were consistent between both experimental systems and cancer. Signature of cisplatin was different from the one identified in cancer or human cell lines. Ionizing radiation and X-rays yielded profiles similar to those in human cancers. Exposures to mechlorethamine and DMS exhibited different mutational spectra in the two model systems.

31 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Distributions of similarities between experimental and computational signatures A of selected mutagens in C.elegans and human Temozolomide and EMS Aristolochic acid

EMS (C.e.) Aristolochic acid (C.e.) 1.0 0.1 0.1

●● ● ● ● 0 0 ● ● ● Temozolomide Aristolochic acid 0.8 0.1 0.1 ● 0 0 ● ● 0.6 ● contribution Relative Signature 11 Signature 22 ● 0.1 0.1

Cosine similarity ● 0 0 0.4 Cisplatin Irradiation ● ● Experimental 0.1 ● Computational Cisplatin (C.e.) Ionising Radiation (C.e.) 0.2 ● 0.1

0 0 UV HU 0.1 Xray EMS DMS

MMS Cisplatin X-ray (C.e.) 0.1 Cisplatin Radiation Mitomycin

Aflatoxin.B1 0 0 0.1 Signature 31 Secondary malignancies AristolochicAcid Mechlorethamine 0.1 Relative contribution Relative

0 0 0.1 Comparison between experimental and computations signatures Signature 35 Signature 40 B of selected mutagens in C.elegans and human 0.1

Aflatoxin-B1 UV 0 0

0.1 Aflatoxin (C.e.) UV-B (C.e.) 0.1 Mechlorethamine DMS

0 0 Mechlorethamine (C.e.) DMS (C.e.) 0.1 Aflatoxin Simulated UV 0.1

0 0 0 0 Mechlorethamine DMS Relative contribution Relative 0.1 Signature 24 Signature 7a

0.1 contribution Relative

0 0 0 0

C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G C>A C>G C>T T>A T>C T>G

Supplementary Figure 3. Comparison between genotoxin signatures in C. elegans and in human cells bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figure 4

Selected genotoxin-repair interaction cases. A. Mutational signature and fold-change per mutation type for EMS exposure under polk-1 deficiency. ​ B. Mutational signature and fold-change per mutation type for EMS exposure under agt-1 deficiency. ​ C. Mutational signature and fold-change per mutation type for MMS exposure under rev-3 deficiency. ​ D. Mutational signature and fold-change per mutation type for aristolochic acid exposure under xpf-1 deficiency. ​ ​ E. Dose responses for experiments described in A-D. The y-axis reflects the dose while ​ ​ ​ ​ the x-axis reflects the number of mutations per sample.

32 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Interaction of polk-1 TLS deficiency and EMS treatment D Interaction of xpf-1 NER deficiency and Aristolochic acid treatment 30 WT + EMS 2.0 WT + Artostolochic acid 20 1.5 10 1.0 0.5 0 30 0.0 polk-1–/– + EMS 30 20 xpf-1–/– + Aristolochic Acid 20 Mutations / 10mM 10 10 0 Mutations / 10 muM 100 0

10 100

1 10 Fold change <0.1 1 C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV Fold change <0.1 B Interaction of agt-1 DR deficiency and EMS treatment C>A C>G C>T T>A T>C T>G MNV del. del/ins ins. SV

40 agt-1–/– + EMS

20

0 E Dose effects for different interactions Mutations / 10mM 100 Dose [mM] Dose [μM] 10 0 0

1 5 WT + EMS 10 WT + AA Fold change <0.1 12 25 C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV 0 C 0 –/– Interaction rev-3 TLS deficiency and MMS treatment –/– xpf-1 + 5 polk-1 + EMS Aristolochic 0.8 Acid 20 12 WT + MMS 2 10 0

–/– 0 20 40 60 80 100 0 5 agt-1 + EMS Dose [mM] 20 rev-3–/– + MMS 12 0

Mutations / mM 10 0 100 200 300 400 500 600 700 0.05 WT + MMS 0 10 Number of mutations 0.15

1 0 rev-3–/– + MMS

Fold change 0.05 <0.1 C>A C>G C>T T>A T>C T>G MNV del. del/insins. SV 0.15

0 20 40 60 80 100 Supplementarty Figure 4. Selected genotoxin-repair interaction cases in C.elegans Number of mutations bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figure 5

Estimated interaction effects for 72 combinations of genetic backgrounds and genotoxins which showed a significant effect either on the total burden or the mutational profile. Each barplot reflects the fold-change in the number of mutations of different types per average dose or mutagen. The information on the genetic background, doses and mutagen specification is given in Supplementary Table 1. ​ ​

33 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

agt-1+EMS agt-1+MMS brc-1+DMS brc-1+Radiation 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

brd-1+Radiation bub.ok3437.-3+Radiation ced.4.EMS ced.4.MMS 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

csb-1+Cisplatin csb-1+MMS dog-1+Radiation exo-1+Radiation 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

fan-1+Cisplatin fan-1+EMS fcd-2+Cisplatin helq-1+Cisplatin 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

helq-1+DMS him.6.EMS him.6.MMS him.6.UV 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

lem-3+Radiation lig.4.Cisplatin mlh-1+DMS mlh-1+EMS 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

mrt-2+Radiation mus.81+Cisplatin mus.81+UV ndx.4.MMS 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

parp-1+EMS parp-2+DMS parp-2+EMS pole.4.UV 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

polh-1+Aflatoxin.B1 polh-1+Cisplatin polh-1+EMS polh-1+Radiation 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

polk-1+DMS polk-1+EMS polk-1+MMS polq-1+Aflatoxin.B1 100 100 100 100 10 10 10 10 1 1 1 1 <0.1 <0.1 <0.1 <0.1

polq-1+UV rad.TG3308..54B.Radiation rad.TG3312+.54B.Radiation rev-1+Aflatoxin.B1 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

rev-1+DMS rev-3+MMS rev-3+UV slx-1+DMS 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

smg-1+AristolochicAcid ung-1+Radiation wrn-1+Radiation xpa-1+Aflatoxin.B1 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

xpa-1+AristolochicAcid xpa-1+EMS xpa-1+MMS xpa-1+Radiation 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

xpc-1+Aflatoxin.B1 xpc-1+AristolochicAcid xpc-1+Cisplatin xpc-1+DMS 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

xpc-1+EMS xpc-1+MMS xpc-1+Radiation xpc-1+UV 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

xpf-1+Aflatoxin.B1 xpf-1+AristolochicAcid xpf-1+Cisplatin xpf-1+DMS 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

xpf-1+EMS xpf-1+MMS xpf-1+Radiation xpf-1+UV 100 100 100 100

10 10 10 10

1 1 1 1

<0.1 <0.1 <0.1 <0.1

Supplementary Figure 5. Selected genotoxin-repair interactions in C.elegans bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figure 6

Somatic mutations and silencing of DNA repair pathway genes in human cancers. A. Percentages of samples with mono- or biallelic defects in core components of the DNA repair pathway genes by cancer type for 9,948 cancers from TCGA. B. Percentages of samples with biallelic inactivation of core components of the respective DNA repair pathway by cancer type. C. Age-adjusted mutation rates in samples without or with mono- or biallelic DNA repair pathway deficiency. Each dot represents the number of mutations per year per Megabase, colored bars denote the medians in each group. Only combinations with Wilcoxon rank sum test FDR under 5% are shown. Stars denote the cancer types where the presence of damaging mutations in the pathway is unlikely to be explained just by the increase in mutational burden (see Supplementary Methods). ​ ​ D. Median mutational signature change between samples with DNA repair pathway with respect to median wild-type signature.

34 Supplementary Figure6.Somaticmutations andsilencing ofDNA genesinhumancancers repair pathway C A Mutations per year per Mb Percent of samples 0.01 0.1 10 bioRxiv preprint 1 20 40 60 80 Samples withatleastonemonoallelicmutationinDNA repairgenes Mutation ratesofsampleswithDNA repairpathwaymutationsorsilencing 0 not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade

* * * * BLCA * * *

BER ● ● Median biallelicinactivation Median monoallelicinactivation Median wildtype ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● BRCA ● ● ● ● GBM ● doi: ● ● ● ● ● ●

DR ● ● ●

HNSC ● ● ● ● ● ● ● ● ● DS ● ● ● ● ● ● ● ● ● https://doi.org/10.1101/686295 ● ● LGG UCS ● LIHC ● ● ● ● DS ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● PAAD OV SKCM Repair pathway ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

FA ● ● ● ● ● ● ● ● ● ● ● ● ●

DR BRCA available undera LGG ● ● ● BER ● ● ● ● ● HR ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● COAD ● *

BRCA MMR ; ● this versionpostedJune28,2019. ● ● ● HR ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● * * ● ● ● ● ● ● ●

OV ● CC-BY-NC-ND 4.0Internationallicense PRAD ● NER ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● * ● ● ● ● ● ● ● ● ●

BRCA ● * *

MMR COAD ● ● ● ● NHEJ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

OV ● ● ● ● ● ● ● ● ● STAD ● ● ● ●

TLS ● ● UCEC ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● POLE ● ● ● COAD exo UCEC The copyrightholderforthispreprint(whichwas +MMR POLE UCEC **** ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● . exo LUAD LIHC LGG LAML KIRP KIRC KICH HNSC GBM ESCA COAD CESC BRCA BLCA ACC Cancer type ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● UCS UCEC THYM THCA TGCT STAD SKCM SARC READ PRAD PCPG PAAD OV MESO LUSC

Percent of samples B CosineCosine distance distance D

0.0 0.1 0.2 0.3 0.4 0.5 0.6 Percent of samples 0 10 20 30 40 50 60 30 50 10 20 40 60 0 Samples withbiallelicdeactivationofDNA repairgenes ● Signature changesinsampleswithDNA repairpathwaymutation orsilencing ● ● ● ● ● ● Homozygous Heterozygous Biallelic Monoallelic BER ● BER ● ● ● E RD AH M E HJTLS NHEJ NER MMR HR FA DS DR BER ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

DR ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GBM ● ● DR ● ● ● ● ● ● ● ● Percent ofsampleswithbiallelicdeactivation ● ● ● ● ● ● ● ● ● ● ● ● DS ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Change oftheaverageChange profile ●

DS ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● READ OV UCS ● ● ● ● ●

FA ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

FA ● ● ● ● ● HR ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● PRAD ● ● Pathway ● ● ● Repair pathway ● ● ● Pathway ● ● ● ●

MMR ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● HR ● COAD ● ● ● ● UCEC ● STAD ● ● ● UCEC ● BRCA ● ● ● ● ● OV ● ● ● ● ● POLE ● ● ●

exo ● ● ● ● ● MMR ● ● ● ● ● POLE ● ● ● ● ● ● ● ● ● ● UCEC ACC* + MMR ● ● ● ●

exo ● ● ● UCEC STAD ●

NER ● ● ● ● ● ● ● NER ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● NHEJ

NHEJ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

TLS ● ● ●

● ● ● TLS ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● TGCT* ● ● ● ● ● ● ● ● bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Figure 7

Selected DNA damage-repair deficiency interactions in human cancer. A. 5 mutational signatures extracted from Uterine Cancers, and the POLE exonuclease variant signature with effects of different POLE hotspot mutations (P286R and V411L) and interaction with MMR deficiency (POLE + MMRD). B. Mutational signature of UV light in samples with functioning (top panel) and mutated NER (lower panel). The barplot underneath shows the fold-change per mutation type. C. Mutational signature of in lung cancer samples with functioning (top panel) and mutated NER (lower panel). The barplot underneath shows the fold-change per mutation type. D. Changes of MMR deficiency signatures across different tissues. Upper panel represents the appearance of the most widespread MMR signature in different tissues. Lower panel represents the scatterplot of per-year rates of CpG>TpG mutations, 1-bp insertions, and 1-bp deletions in MMR deficient versus MMR proficient samples across tissues. Dotted lines represent the linear slope fitted to the rates of MMR deficient versus proficient samples.

35 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A POLE-exo variants and MMR deficiency B UV and NER deficiency

0.3 30 UV + WT 0.2 MMRD−2 0.1 20 0.0 10 0.3 0.2 HRD 0 0.1 30 UV + NERD 0.0 0.3 20 0.2 APOBEC 10 0.1 Number of mutations/year 0 0.0 0.3 0.2 MMRD-1 2 0.1 0.0 0.3 1 0.2 POLE 0.1 Fold change mut/wt Relative contribution Relative 0.0 0.3 <0.5 POLE + P286R Complex 0.2 C>A C>G C>T T>A T>C T>G Deletions indels Insertions 0.1 0.0 0.3 0.2 POLE + V411L D MMR deficiency signatures across tissues 0.1 0.2 Colorectal 0.0 0.3 0.1 POLE + MMRD 0.2 0.0 0.1 Stomach 0.0 0.1 Complex C>A C>G C>T T>A T>C T>G Deletions indels Insertions 0.0 Smoking and NER deficiency C Liver 1 Smoking WT 0.1 Relative contribution Relative 0.0 0.5 Uterus

0 0.1 1 Smoking + NERD 0.0 C>A C>G C>T T>A T>C T>G Deletions Complex Insertions 0.5 indels Number of mutations / year Number of mutations

0 1bp deletions 1 bp insertions CpG > TpG 1.0 4 12 ● ● COAD COAD 10 ● COAD 0.8 3 8 0.6 STAD● ● UCEC 6 ● 2 ● STAD ● UCEC ● 0.4 STAD UCEC 4 1 ● ● CESC 0.2 ● 2 ● ● BRCA PRAD Mutations/year MMRD Fold change mut/wt ●● Mutations/year MMRD ● ● ● Mutations/year MMRD ●●● ●● ● ● Complex 0 ● 0.0 0 C>A C>G C>T T>A T>C T>G Deletions indels Insertions 0 1 2 3 4 0.0 0.2 0.4 0.6 0.8 1.0 0 2 4 6 8 10 12 Mutations/year, wild-type Mutations/year, wild-type Mutations/year, wild-type

Supplementary Figure 7. Selected damage-repair deficiency interactions in human cancer bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Supplementary Tables

Supplementary Table 1

Description of genetic backgrounds used in the study, sample annotations, mutation counts and design matrix for all C. elegans samples. ​ ​

Supplementary Table 2

Experimental mutational signatures for genetic (including wild-type and DNA repair knockouts) and mutagenic (12 genotoxins used in the study) factors in C. elegans along with ​ ​ their 95% credible intervals.

Supplementary Table 3

Genotoxin-repair interaction effects: log-fold changes of mutagen signature per mutation type, and dose-dependent genotype amplification factors along with their 95% credible intervals.

Supplementary Table 4

Map of DNA repair genes with pathways, TCGA cancer names abbreviations, and TCGA samples with mutations in DNA pathways.

Supplementary Table 5

dN/dS values for DNA repair related genes across all cancers types and for DNA repair pathways in different cancer types.

36 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

References

Alexandrov, L., Kim, J., Haradhvala, N.J., Huang, M.N., Ng, A.W.T., Boot, A., Covington, K.R., Gordenin, D.A., Bergstrom, E., Lopez-Bigas, N., et al. (2018). The Repertoire of Mutational Signatures in Human Cancer. bioRxiv.

Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Aparicio, S.A.J.R., Behjati, S., Biankin, A.V., Bignell, G.R., Bolli, N., Borg, A., Børresen-Dale, A.-L., et al. (2013). Signatures of mutational processes in human cancer. Nature 500, 415–421. ​ ​ Alexandrov, L.B., Ju, Y.S., Haase, K., Van Loo, P., Martincorena, I., Nik-Zainal, S., Totoki, Y., Fujimoto, A., Nakagawa, H., Shibata, T., et al. (2016). Mutational signatures associated with tobacco smoking in human cancer. Science 354, 618–622. ​ ​ Antoniou, A., Pharoah, P.D.P., Narod, S., Risch, H.A., Eyfjord, J.E., Hopper, J.L., Loman, N., Olsson, H., Johannsson, O., Borg, A., et al. (2003). Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case Series unselected for family history: a combined analysis of 22 studies. Am. J. Hum. Genet. 72, 1117–1130. ​ ​ Armitage, P., and Doll, R. (1954). The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer 8, 1–12. ​ ​ Arvanitis, M., Li, D.-D., Lee, K., and Mylonakis, E. (2013). Apoptosis in C. elegans: lessons for cancer and immunity. Frontiers in Cellular and Infection Microbiology 3. ​ ​ Behjati, S., Gundem, G., Wedge, D.C., Roberts, N.D., Tarpey, P.S., Cooke, S.L., Van Loo, P., Alexandrov, L.B., Ramakrishna, M., Davies, H., et al. (2016). Mutational signatures of ionizing radiation in second malignancies. Nat. Commun. 7, 12605. ​ ​ Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological) 57, 289–300. ​ ​ Bertolini, S., Wang, B., Meier, B., Hong, Y., and Gartner, A. (2017). Caenorhabditis elegans BUB-3 and SAN-1/MAD3 Spindle Assembly Checkpoint Components Are Required for Genome Stability in Response to Treatment with Ionizing Radiation. G3 7, 3875–3885. ​ ​ Boot, A., Huang, M.N., Ng, A.W.T., Ho, S.-C., Lim, J.Q., Kawakami, Y., Chayama, K., Teh, B.T., Nakagawa, H., and Rozen, S.G. (2018). In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors. Genome Res. 28, 654–665. ​ Boulton, S.J., Gartner, A., Reboul, J., Vaglio, P., Dyson, N., Hill, D.E., and Vidal, M. (2002). Combined functional genomic maps of the C. elegans DNA damage response. Science 295, ​ ​ 127–131.

Bradford, P.T., Goldstein, A.M., Tamura, D., Khan, S.G., Ueda, T., Boyle, J., Oh, K.-S., Imoto, K., Inui, H., Moriwaki, S.-I., et al. (2011). Cancer and neurologic degeneration in xeroderma pigmentosum: long term follow-up characterises the role of DNA repair. J. Med. Genet. 48, ​ ​ 168–176.

Cancer Genome Atlas Network (2015). Genomic Classification of Cutaneous Melanoma. Cell

37 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

161, 1681–1696. ​ Cancer Genome Atlas Research Network (2012). Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525. ​ ​ Cancer Genome Atlas Research Network (2014a). Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543–550. ​ ​ Cancer Genome Atlas Research Network (2014b). Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507, 315–322. ​ ​ Cancer Genome Atlas Research Network, Kandoth, C., Schultz, N., Cherniack, A.D., Akbani, R., Liu, Y., Shen, H., Robertson, A.G., Pashtan, I., Shen, R., et al. (2013). Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73. ​ ​ Gerstung, M., Jolly, C., Leshchiner, I., Dentro, S.C., Gonzalez, S., Mitchell, T.J., Rubanova, Y., Anur, P., Rosebrock, D., Yu, K., et al. (2017). The evolutionary history of 2,658 cancers. bioRxiv.

Golding, N. (2018). greta: Simple and Scalable Statistical Modelling in R. R-package v. 0.2.3. http://CRAN.R-project.org/package=greta. ​ Greenman, C., Wooster, R., Futreal, P.A., Stratton, M.R., and Easton, D.F. (2006). Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics 173, 2187–2198. ​ ​ Grossman, R.L., Heath, A.P., Ferretti, V., Varmus, H.E., Lowy, D.R., Kibbe, W.A., and Staudt, L.M. (2016). Toward a Shared Vision for Cancer Genomic Data. New England Journal of Medicine 375, 1109–1112. ​ ​ Haradhvala, N.J., Kim, J., Maruvka, Y.E., Polak, P., Rosebrock, D., Livitz, D., Hess, J.M., Leshchiner, I., Kamburov, A., Mouw, K.W., et al. (2018). Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat. Commun. 9, 1746. ​ ​ Helleday, T., Eshtad, S., and Nik-Zainal, S. (2014). Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598. ​ ​ Huang, M.N., Yu, W., Teoh, W.W., Ardin, M., Jusakul, A., Ng, A.W.T., Boot, A., Abedi-Ardekani, B., Villar, S., Myint, S.S., et al. (2017). Genome-scale mutational signatures of aflatoxin in cells, mice, and human tumors. Genome Res. 27, 1475–1486. ​ ​ Kim, H., Zheng, S., Amini, S.S., Virk, S.M., Mikkelsen, T., Brat, D.J., Grimsby, J., Sougnez, C., Muller, F., Hu, J., et al. (2015). Whole-genome and multisector exome sequencing of primary and post-treatment glioblastoma reveals patterns of tumor evolution. Genome Res. 25, 316–327. ​ Kim, J., Mouw, K.W., Polak, P., Braunstein, L.Z., Kamburov, A., Kwiatkowski, D.J., Rosenberg, J.E., Van Allen, E.M., D’Andrea, A., and Getz, G. (2016). Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606. ​ Knijnenburg, T.A., Wang, L., Zimmermann, M.T., Chambwe, N., Gao, G.F., Cherniack, A.D., Fan, H., Shen, H., Way, G.P., Greene, C.S., et al. (2018). Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas. Cell Rep. 23, ​ ​

38 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

239–254.e6.

Kucab, J.E., Zou, X., Morganella, S., Joel, M., Nanda, A.S., Nagy, E., Gomez, C., Degasperi, A., Harris, R., Jackson, S.P., et al. (2019). A Compendium of Mutational Signatures of Environmental Agents. Cell.

Lans, H., and Vermeulen, W. (2011). Nucleotide Excision Repair inCaenorhabditis elegans. Molecular Biology International 2011, 1–12. ​ ​ Li, Y., Roberts, N., Weischenfeldt, J., Wala, J.A., and Shapira, O. (2017). Patterns of structural variation in human cancer. bioRxiv.

Maaten, L. van der, and Hinton, G. (2008). Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605. ​ ​ Martincorena, I., Raine, K.M., Gerstung, M., Dawson, K.J., Haase, K., Van Loo, P., Davies, H., Stratton, M.R., and Campbell, P.J. (2017). Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 171, 1029–1041.e21. ​ ​ Matsumura, S., Fujita, Y., Yamane, M., Morita, O., and Honda, H. (2018). A genome-wide mutation analysis method enabling high-throughput identification of chemical mutagen signatures. Scientific Reports 8. ​ ​ Meier, B., Cooke, S.L., Weiss, J., Bailly, A.P., Alexandrov, L.B., Marshall, J., Raine, K., Maddison, M., Anderson, E., Stratton, M.R., et al. (2014). C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency. Genome Res. 24, 1624–1636. ​ ​ Meier, B., Volkova, N.V., Hong, Y., Schofield, P., Campbell, P.J., Gerstung, M., and Gartner, A. (2018). Mutational signatures of DNA mismatch repair deficiency in C. elegans and human cancers. Genome Res. 28, 666–675. ​ ​ Morganella, S., Alexandrov, L.B., Glodzik, D., Zou, X., Davies, H., Staaf, J., Sieuwerts, A.M., Brinkman, A.B., Martin, S., Ramakrishna, M., et al. (2016). The topography of mutational processes in breast cancer genomes. Nat. Commun. 7, 11383. ​ ​ Neal, R.M., and Others (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2, 2. ​ ​ Ng, A.W.T., Poon, S.L., Huang, M.N., Lim, J.Q., Boot, A., Yu, W., Suzuki, Y., Thangaraju, S., Ng, C.C.Y., Tan, P., et al. (2017). Aristolochic acids and their derivatives are widely implicated in liver cancers in Taiwan and throughout Asia. Sci. Transl. Med. 9. ​ ​ Nik-Zainal, S., Alexandrov, L.B., Wedge, D.C., Van Loo, P., Greenman, C.D., Raine, K., Jones, D., Hinton, J., Marshall, J., Stebbings, L.A., et al. (2012). Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993. ​ ​ Nordling, C.O. (1953). A new theory on cancer-inducing mechanism. Br. J. Cancer 7, 68–72. ​ ​ Pearl, L.H., Schierz, A.C., Ward, S.E., Al-Lazikani, B., and Pearl, F.M.G. (2015). Therapeutic opportunities within the DNA damage response. Nat. Rev. Cancer 15, 166–180. ​ ​ Petljak, M., Alexandrov, L.B., Brammeld, J.S., Price, S., Wedge, D.C., Grossmann, S., Dawson, K.J., Ju, Y.S., Iorio, F., Tubio, J.M.C., et al. (2019). Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell 176, ​ ​

39 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1282–1294.e20.

Poon, S.L., Pang, S.-T., McPherson, J.R., Yu, W., Huang, K.K., Guan, P., Weng, W.-H., Siew, E.Y., Liu, Y., Heng, H.L., et al. (2013). Genome-wide mutational signatures of aristolochic acid and its application as a screening tool. Sci. Transl. Med. 5, 197ra101. ​ ​ Rausch, T., Zichner, T., Schlattl, A., Stütz, A.M., Benes, V., and Korbel, J.O. (2012). DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339. ​ Roberts, S.A., and Gordenin, D.A. (2014). Hypermutation in human cancer genomes: footprints and mechanisms. Nat. Rev. Cancer 14, 786. ​ ​ Roerink, S.F., van Schendel, R., and Tijsterman, M. (2014). Polymerase theta-mediated end joining of replication-associated DNA breaks in C. elegans. Genome Res. 24, 954–962. ​ ​ Shlien, A., Campbell, B.B., de Borja, R., Alexandrov, L.B., Merico, D., Wedge, D., Van Loo, P., Tarpey, P.S., Coupland, P., Behjati, S., et al. (2015). Combined hereditary and somatic mutations of replication error repair genes result in rapid onset of ultra-hypermutated cancers. Nat. Genet. 47, 257–262. ​ ​ Sieber, O.M., Tomlinson, S.R., and Tomlinson, I.P.M. (2005). Tissue, cell and stage specificity of (epi)mutations in cancers. Nat. Rev. Cancer 5, 649–655. ​ ​ Szikriszt, B., Póti, Á., Pipek, O., Krzystanek, M., Kanu, N., Molnár, J., Ribli, D., Szeltner, Z., Tusnády, G.E., Csabai, I., et al. (2016). A comprehensive survey of the mutagenic impact of common cancer cytotoxics. Genome Biol. 17, 99. ​ ​ Taira, K., Kaneto, S., Nakano, K., Watanabe, S., Takahashi, E., Arimoto, S., Okamoto, K., Schaaper, R.M., Negishi, K., and Negishi, T. (2013). Distinct pathways for repairing mutagenic lesions induced by methylating and ethylating agents. Mutagenesis 28, 341–350. ​ ​ Tijsterman, M., Pothof, J., and Plasterk, R.H.A. (2002). Frequent mutations and somatic repeat instability in DNA mismatch-repair-deficient Caenorhabditis elegans. Genetics 161, 651–660. ​ ​ Tomasetti, C., Vogelstein, B., and Parmigiani, G. (2013). Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proc. Natl. Acad. Sci. U. S. A. 110, 1999–2004. ​ ​ Tomasetti, C., Marchionni, L., Nowak, M.A., Parmigiani, G., and Vogelstein, B. (2015). Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc. Natl. Acad. Sci. U. S. A. 112, 118–123. ​ ​ Tubbs, A., and Nussenzweig, A. (2017). Endogenous DNA Damage as a Source of Genomic Instability in Cancer. Cell 168, 644–656. ​ ​ Vogelstein, B., and Kinzler, K.W. (2015). The Path to Cancer --Three Strikes and You’re Out. N. Engl. J. Med. 373, 1895–1898. ​ ​ Wang, V.G., Kim, H., and Chuang, J.H. (2018). Whole-exome sequencing capture kit biases yield false negative mutation calls in TCGA cohorts. PLoS One 13, e0204912. ​ ​ Wheless, L., Kistner-Griffin, E., Jorgensen, T.J., Ruczinski, I., Berthier-Schaad, Y., Kessing, B., Hoffman-Bolton, J., Francis, L., Shugart, Y.Y., Strickland, P.T., et al. (2012). A

40 bioRxiv preprint doi: https://doi.org/10.1101/686295; this version posted June 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

community-based study of nucleotide excision repair polymorphisms in relation to the risk of non-melanoma skin cancer. J. Invest. Dermatol. 132, 1354–1362. ​ ​ Ye, K., Schulz, M.H., Long, Q., Apweiler, R., and Ning, Z. (2009). Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871. ​ ​ Yoon, J.-H., Roy Choudhury, J., Park, J., Prakash, S., and Prakash, L. (2017). Translesion synthesis DNA polymerases promote error-free replication through the minor-groove DNA adduct 3-deaza-3-methyladenine. J. Biol. Chem. 292, 18682–18688. ​ ​ Yoon, J.-H., McArthur, M.J., Park, J., Basu, D., Wakamiya, M., Prakash, L., and Prakash, S. (2019). Error-Prone Replication through UV Lesions by DNA Polymerase θ Protects against Skin Cancers. Cell 176, 1295–1309.e15. ​ ​ Zheng, C.L., Wang, N.J., Chung, J., Moslehi, H., Sanborn, J.Z., Hur, J.S., Collisson, E.A., Vemula, S.S., Naujokas, A., Chiotti, K.E., et al. (2014). Transcription restores DNA repair to heterochromatin, determining regional mutation rates in cancer genomes. Cell Rep. 9, ​ ​ 1228–1234.

41