bioRxiv preprint doi: https://doi.org/10.1101/2020.06.04.135251; this version posted June 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Germline APOBEC3B deletion in Asian women increases somatic hypermutation in breast cancer that is associated with Her2 subtype, PIK3CA mutations, immune activation, and increased survival

Jia-Wern Pan1, Muhammad Mamduh Ahmad Zabidi1, Boon-Keat Chong1, Mei-Yee Meng1, Pei-Sze Ng1,2, Siti Norhidayu Hasan1, Bethan Sandey3, Saira Bahnu4, Pathmanathan Rajadurai4, Cheng-Har Yip2,4, Oscar M. Rueda3, Carlos Caldas3,5,6, Suet-Feung Chin3*, Soo-Hwang Teo1,2*

1Cancer Research Malaysia, No. 1, Jalan SS12/1A, 47500 Subang Jaya, Malaysia
2University Malaya Cancer Research Institute, Faculty of Medicine, University Malaya, Kuala Lumpur, Malaysia
3Cancer Research UK, Cambridge Institute & Department of Oncology, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
4Subang Jaya Medical Centre, No. 1, Jalan SS12/1A, 47500 Subang Jaya, Malaysia
5Cambridge Breast Cancer Research Unit, CRUK Cambridge Cancer Centre, Cambridge, UK
6NIHR Cambridge Biomedical Research Centre and Cambridge Experimental Cancer Medicine Centre, Cambridge University Hospital NHS Foundation Trust, Cambridge, UK
*These authors share senior authorship.
*To whom correspondence should be addressed: [email protected]; suet-[email protected]

Abstract

A 30-kb deletion that eliminates the coding region of APOBEC3B (A3B) is >5 times more common in women of Asian compared to European descent. This polymorphism creates a chimera of the APOBEC3A (A3A) coding region and the A3B 3’UTR, and is associated with an increased risk of breast cancer in Asian women. Here, we explored the relationship between the A3B deletion polymorphism and tumour characteristics in Asian women. Using whole exome and whole transcriptome sequencing data from 527 breast tumours, we report that the germline A3B deletion polymorphism leads to expression of the A3A-B hybrid isoform and increased APOBEC-associated somatic hypermutation. Hypermutated tumours, regardless of A3B germline status, were associated with the Her2 molecular subtype and PIK3CA mutations. Compared to non-hypermutated tumours, hypermutated tumours also had higher neoantigen burden, tumour heterogeneity and immune activation. Taken together, our results suggest that the germline A3B deletion polymorphism, via the A3A-B hybrid isoform, contributes to APOBEC mutagenesis in a significant proportion of Asian breast cancers. In addition, APOBEC somatic hypermutation, regardless of A3B background, may be an important clinical biomarker for Asian breast cancers.

Graphical Abstract

Introduction

The identification of mutations in cancer cells driven by the APOBEC (“Apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like”) family of proteins1 has highlighted the role of intrinsic mutational processes in carcinogenesis and tumour evolution. APOBEC proteins are part of the AID/APOBEC superfamily of cytidine deaminases, which includes activation-induced deaminase (AID), APOBEC1 (A1), APOBEC2 (A2), APOBEC3A-H (A3A to A3H), and APOBEC4 (A4), many of which play important roles in innate immunity2,3. More recently, A3A and A3B, and to a lesser extent A3H, have been suggested to be endogenous sources of mutations in various cancers. A3B expression is associated with APOBEC-associated C-to-T transitions, and A3B is upregulated in breast, bladder, lung, head and neck, ovarian, and cervical cancers4–7. In vitro expression of A3A in cell lines has been shown to induce breaks in nuclear DNA and cell cycle arrest8–10. A3H has been postulated to contribute to mutagenesis in breast and lung cancer11, and A3H polymorphisms may contribute to lung cancer risk12. Together, these data suggest that multiple APOBEC family members contribute to APOBEC mutagenesis in human cancers, with different effects in different cancers.

A common germline copy number polymorphism has been reported: a 30-kb deletion spanning the A3B coding sequence, which fuses the A3B 3’ UTR to the A3A coding region and results in an alternative A3A-B hybrid allele13,14 (Figure 1a). In women of European descent, germline A3B deletion carriers in The Cancer Genome Atlas (TCGA) breast cancer cohort are twice as likely to develop cancers with a large number of mutations corresponding to APOBEC-driven mutational signatures13. The APOBEC mutational signature has been associated with neoantigen burden and immune activation15–17. This suggests that germline A3B deletion may generate tumour-specific antigens, which in turn activate the immune system in breast cancer patients. This may be mediated by the hybrid A3A-B isoform, which is associated with APOBEC mutational signatures18 and may be more highly expressed in carriers of the germline A3B deletion17.

Intriguingly, there are population-specific differences in the APOBEC genomic locus. The deletion polymorphism is significantly more common in individuals of East Asian than of European ancestry (minor allele frequency of 37% compared to 6%)19. This A3B deletion polymorphism is also associated with a modest increase in breast cancer risk in women of Asian ancestry and in some studies of women of European ancestry, but not in other studies of European women14,15,20–23. These results raise the possibility that population-specific differences in genetic, lifestyle and environmental factors may modulate the impact of the APOBEC3B deletion polymorphism in different
populations. To investigate the biological consequences of germline A3B deletion on APOBEC mutagenesis and immune activation in the Asian population, we examined the effect of germline A3B deletion in a cohort of 527 breast tumours from Malaysia for which whole exome and whole transcriptome sequencing data are available (Pan et al., in revision)24. The high prevalence of the polymorphism provided sufficient power to analyse homozygous and heterozygous carriers separately, and also enabled subtype-specific analyses.

Results

Germline APOBEC3B deletion results in reduced APOBEC3B expression and increased expression of an APOBEC3A-B hybrid isoform

Using the transcriptomic data of 527 breast tumour samples from the MyBrCa cohort, we first investigated the relationship between germline A3B deletion and the expression of AID/APOBEC family members, particularly those located in the same genomic region (Figure 1a). As expected, homozygous A3B germline deletion carriers had 10-fold lower A3B expression than non-carriers (ANOVA, P < 0.001; Figure 1b), but there was no association between germline status and expression of other members of the AID/APOBEC family. We also found significantly higher A3A-B hybrid expression in germline A3B deletion carriers than in non-carriers, with a reciprocal loss of expression of the normal A3A-1 transcript (Kruskal-Wallis rank-sum test, P < 0.001; Figure 1c-d). In addition, the overall expression of A3A, A3B, and the A3A-1 and A3A-B hybrid isoforms differed significantly between subtypes (highest in the basal subtype, lowest in the luminal A subtype) (Supp. Fig. 1). In all breast cancer subtypes analysed, germline A3B deletion led to loss of expression of A3B and the A3A-1 isoform and to increased expression of the A3A-B hybrid isoform, while overall A3A expression remained similar (Supp. Fig. 1).
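
For readers unfamiliar with the Kruskal-Wallis rank-sum test used above to compare isoform expression across the three germline genotypes, the following is a minimal Python sketch of the tie-corrected H statistic. It is an illustration only, not the authors' analysis code (the software used is not stated in this section), and the function name kruskal_wallis_3groups and the toy values are hypothetical. With three groups the chi-squared reference distribution has 2 degrees of freedom, whose upper-tail probability is exactly exp(-H/2), so the sketch needs only the standard library.

```python
import math
from typing import Sequence


def kruskal_wallis_3groups(groups: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H statistic and p-value for exactly three groups.

    Hypothetical helper for illustration. With k = 3 groups the chi-squared
    reference distribution has k - 1 = 2 degrees of freedom, whose survival
    function is exp(-H / 2).
    """
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected exactly three non-empty groups")
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        # Extend j over a run of tied values; each gets the average 1-based rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for _, g in pooled[i:j + 1]:
            rank_sums[g] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(grp) for r, grp in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # correction for tied ranks
    return h, math.exp(-h / 2)


# Hypothetical toy expression values for A3Bwt/wt, A3Bwt/del and A3Bdel/del tumours.
h, p = kruskal_wallis_3groups([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3), round(p, 4))  # prints 7.2 0.0273
```

The closed-form p-value only holds for three groups; with more genotype or subtype groups a general chi-squared survival function (for example from SciPy) would be needed. The statistic is undefined if every value is identical.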

Germline APOBEC3B deletion increases the risk for signature 2 and 13 hypermutation

Given that germline A3B deletion affects the expression of A3B and A3A isoforms, and that these isoforms have been proposed to have different mutational activities25, we examined the relationship between germline A3B deletion status and the mutational signatures of the breast tumours, focusing on signatures previously detected in breast tumours26–28. Of the mutational signatures analysed, tumours from germline A3Bdel/del carriers were more likely to carry the APOBEC-associated Signatures 2 and 13 than those from non-carriers, while carriers of pathogenic variants in BRCA1, BRCA2,
PALB2, ATM and CHEK2 were more likely to have high levels of Signature 3 (homologous recombination deficiency, HRD) than non-carriers (Fig. 2a-b). Although there was no significant difference in the total number of somatic mutations (single nucleotide variants (SNVs) + indels) across A3B deletion status (ANOVA, P = 0.347; Figure 2c), tumours in germline A3Bdel carriers had a higher proportion of mutations attributed to Signatures 2 and 13, as well as a higher overall rate (mutational proportion multiplied by total somatic mutations) of Signature 2 and 13 mutations (Figure 2d-e). Notably, the level of mutational signatures 2 and 13 was highest in HER2-enriched breast tumours, but there was no association between germline APOBEC3B status and mutational burden in this subtype (Supp. Fig. 2). In contrast, germline A3B deletion carriers with luminal B breast cancers had a significantly higher rate of mutational signatures 2 and 13 than non-carriers (Supp. Fig. 2).

We also found that Signature 2 and 13 (APOBEC) somatic hypermutation was significantly more common in germline A3B deletion carriers (Cochran-Armitage test for trend, P < 2.2 × 10^-16; Supp. Table 1). Overall, individuals with a one-copy or two-copy deletion had relative risks of 1.89 (95% confidence interval [CI]: [1.16, 3.06]) and 2.52 (95% CI: [1.42, 4.45]), respectively, of bearing APOBEC-hypermutated cancers (Table 1). No association was observed with three other mutational signatures found in breast cancer (Signatures 1, 3 and 8; Chi-squared test, P > 0.05; Supp. Table 2).

Next, we asked whether signature 2 and 13 mutations were associated with A3A or A3B expression in breast cancer overall or within its subtypes. Overall, we found little association between A3B gene expression and total somatic mutations, the proportion of signature 2 and 13 mutations, or the overall rate of signature 2 and 13 mutations (Supp. Fig. 3a).
However, all three mutational measures were significantly associated with expression of both the A3A-1 and A3A-B hybrid isoforms (Supp. Fig. 3b-c). These associations appear to be strongest in the luminal B subtype and weaker in the basal subtype (Supp. Fig. 4). Together, these results suggest that A3A may be a more important driver of signature 2 and 13 mutagenesis than A3B, and that the drivers of APOBEC mutagenesis may differ slightly across molecular subtypes.

To test the role of A3A in signature 2 and 13 mutagenesis, we compared the relative prevalence of mutations associated with A3A and A3B by examining the sequence context of all C-to-T transitions identified in our tumour samples through whole-exome sequencing (YTCA and RTCA for A3A and A3B, respectively29). We found that A3A-associated YTCA mutations were more common than A3B-associated RTCA mutations in the majority of our tumour samples, regardless of A3B deletion status
or hypermutation (Supp. Fig. 5, Supp. Table 3). However, when the analysis was stratified by molecular subtype, this effect was observed in ER-positive and HER2-positive breast cancers but appeared weaker in the basal subtype (Supp. Fig. 6).

Finally, we conducted a multiple regression analysis using a linear model to determine the factors associated with signature 2 and 13 mutational burden. Using backward stepwise elimination (see Methods), we obtained a minimal model in which only the A3A-B hybrid isoform, A3C, A3H, and three A3A-interacting proteins (COX6C, COX7A1 and STAMPBP) were significantly associated with the total rate of signature 2 and 13 mutation (Supp. Fig. 7; Supp. Table 4). The minimal model also corresponded to the best model selected using the Akaike information criterion (AIC), with an adjusted R-squared of 12.4%.


Germline APOBEC3B deletion does not affect the molecular profile of breast cancers

To understand the impact of germline A3B deletion status on the molecular profile of breast cancer, we first examined the frequency of the Integrative Cluster (IntClust) molecular subtypes across A3B germline statuses. We found no significant differences in the frequencies of molecular subtypes between A3B germline deletion carriers and non-carriers (Figure 3a), suggesting that germline A3B deletion does not significantly affect molecular subtype. We then examined the mutational profiles of these tumours, comparing non-synonymous mutations in the 10 most frequently affected genes in the MyBrCa cohort (Pan et al. 2020, submitted) between patients with different germline A3B copy numbers, but again found no significant differences (Figure 3b). Germline A3B deletion was marginally associated with a higher number of C-to-T or G-to-A transitions in PIK3CA (Supp. Fig. 8a), but not in the other nine genes.
Comparison of PIK3CA hotspot mutations between samples with different A3B copy numbers revealed that the PIK3CA E545K hotspot mutation is more common in carriers of the A3B deletion, while the PIK3CA H1047R/L hotspot mutation is less common (Supp. Fig. 8b). Of note, the PIK3CA E545K hotspot mutation arises in the YTCA context favoured by A3A mutagenesis.

As several recent reports have linked germline A3B deletion to the presence of tumour-infiltrating immune cells15,16, we set out to explore whether similar associations could be observed in our cohort. However, we found no significant difference in HLA-A neoantigen count across germline A3B deletion status (ANOVA, P = 0.1965; Figure 3c). Next, we characterised the immune scores of breast cancers across germline A3B copy numbers using four different algorithms, namely
ESTIMATE30, GSVA31 using gene sets for immune cells from Bindea et al. (2013)32, GSVA using the expanded interferon-gamma (IFN-γ) gene set from Ayers et al. (2017)33, and the IMPRES method34. The IFN-γ and IMPRES methods have been used to predict response to checkpoint inhibitor immunotherapy in various cancer types. We did not find any significant difference across A3B copy numbers using the IFN-γ method (Figure 3d) or any of the other three algorithms (Supp. Fig. 9b). Immunohistochemistry (IHC) data for four markers (anti-CD3, anti-CD4, anti-CD8, and anti-PD-L1) also revealed no significant difference across germline A3B deletion status in staining for anti-CD8 (Figure 3e), anti-CD3, anti-CD4, or anti-PD-L1 (Supp. Fig. 10). We further compared breast tumour heterogeneity, quantified using PyClone35, across A3B copy numbers, but observed no significant difference by germline A3B deletion status (ANOVA, P = 0.3613; Figure 3f). Finally, A3B deletion status did not influence overall survival (Figure 3g), even though homozygous deletion carriers were significantly more likely to be Stage III and significantly less likely to be Stage I (Supp. Table 5).


APOBEC somatic hypermutation is associated with Her2 subtype, PIK3CA mutations, immune activation, and increased survival

Given our finding that germline A3B deletion increases the odds of developing APOBEC somatic hypermutation, we examined the relationship between APOBEC somatic hypermutation and the molecular profile of breast cancer in our cohort. To do so, we repeated the previous analyses, first comparing the distribution of molecular subtypes and somatic mutations between samples with and without APOBEC hypermutation.
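
The increased risk of hypermutation in deletion carriers was summarised earlier as relative risks of 1.89 and 2.52 for one- and two-copy carriers (Table 1), i.e. the proportion of hypermutated tumours in each carrier group divided by that in non-carriers. As a minimal sketch, assuming the common large-sample (Wald) interval computed on the log scale (this section does not state how the published intervals were derived), a relative risk and its 95% CI can be obtained from 2×2 counts as follows. The function name relative_risk and the counts in the example are hypothetical and are not cohort data.

```python
import math


def relative_risk(exposed_cases: int, exposed_total: int,
                  ref_cases: int, ref_total: int,
                  z: float = 1.959964) -> tuple[float, float, float]:
    """Relative risk with a Wald confidence interval on the log scale (hypothetical helper).

    exposed_*: e.g. A3B deletion carriers (cases = hypermutated tumours)
    ref_*:     e.g. non-carriers
    z:         standard normal quantile; 1.959964 gives a 95% interval
    """
    if min(exposed_cases, ref_cases) == 0:
        raise ValueError("log-scale interval is undefined with zero cases")
    rr = (exposed_cases / exposed_total) / (ref_cases / ref_total)
    # Large-sample standard error of log(RR).
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / ref_cases - 1 / ref_total)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical counts: 20/100 hypermutated among carriers vs 10/100 among non-carriers.
rr, lower, upper = relative_risk(20, 100, 10, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is built on log(RR), it is symmetric on the log scale but asymmetric around the point estimate, which matches the shape of the intervals reported in Table 1.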

Interestingly, when we compared IntClust subtypes between APOBEC hypermutators and non-hypermutators, we found a higher prevalence of the Her2-associated IntClust 5 in the hypermutated tumours, with a correspondingly lower prevalence of IntClust 4 and IntClust 6 (Figure 4a). Tumours with APOBEC somatic hypermutation also had significantly more (3-fold) PIK3CA mutations, marginally more KMT2C, CDH1, and NF1 mutations, and fewer GATA3 mutations (Figure 4b).

Next, when we compared neoantigen burden between APOBEC-hypermutated and non-hypermutated tumours, we found that hypermutated tumours had a four-fold higher HLA-A neoantigen count (ANOVA, P < 0.0001; Figure 4c). We found a weak association between HLA-A neoantigen count and A3A-B hybrid expression (Pearson’s
correlation, r = 0.125, P = 0.0479; Supp. Fig. 9a), and a stronger association with the mutational rate of signatures 2 and 13 (Pearson’s correlation, r = 0.441, P < 0.0001; Supp. Fig. 9a). Furthermore, the association between APOBEC hypermutation and neoantigen burden was observed across all molecular subtypes (Supp. Fig. 11).

Hypermutated tumours had slightly higher immune scores than non-hypermutated tumours. This difference was most prominent for IFN-γ immune scores (Figure 4d), although a similar trend was also seen for Bindea and ESTIMATE immune scores, but not for IMPRES scores (Supp. Fig. 9c). This finding was confirmed by IHC data, in which hypermutators showed more staining than non-hypermutators for anti-CD8 (Figure 4e), as well as anti-CD3, anti-CD4, and anti-PD-L1 (Supp. Fig. 10). Additionally, comparisons of mutation rates with immune markers showed a modest correlation between signature 2 and 13 mutations and anti-CD3, anti-CD4, and anti-CD8 staining, supporting an association between APOBEC mutagenesis and the prevalence of tumour-infiltrating lymphocytes in Asian breast cancer (Supp. Fig. 12). Notably, the association between immune scores and APOBEC hypermutation appears to be strongest in the luminal subtypes (luminal B in particular), with little to no association in ER-negative subtypes (Supp. Fig. 13).

PyClone predicted two-fold more subclonal clusters in hypermutated tumours than in non-hypermutated tumours (Figure 4f). We also found a strong correlation between the rate of signature 2 and 13 mutation and tumour heterogeneity (Supp. Fig. 14), and the association between hypermutation and tumour heterogeneity was seen in all subtypes, albeit somewhat weaker in the basal subtype (Supp. Fig. 15).

Finally, we examined the association between APOBEC hypermutation and overall survival.
Whilst there was no association between APOBEC hypermutation and tumour stage (Supp. Table 5), patients with hypermutated tumours had better unadjusted overall survival than patients with non-hypermutated cancers (Figure 4g). Similarly, in a multivariable Cox proportional hazards model for overall survival adjusting for germline A3B copy number, tumour stage, and subtype, APOBEC hypermutation had a hazard ratio of 0.4 [0.1-1.86], although this was not statistically significant (Supp. Fig. 16). Overall, the data suggest that APOBEC hypermutation may be associated with better overall survival.
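
Unadjusted overall survival comparisons such as the one in Figure 4g are commonly drawn as Kaplan-Meier curves, one per group (here, hypermutated versus non-hypermutated tumours). This section does not name the estimator or software behind Figure 4g, so the following is only a minimal standard-library sketch of the Kaplan-Meier product-limit estimator under that assumption; the function name kaplan_meier and the toy follow-up data are hypothetical, and the multivariable Cox model is not reproduced here.

```python
from typing import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) steps of the Kaplan-Meier estimator.

    Hypothetical helper for illustration.
    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group every observation that shares this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up times (months) and death indicators for one group.
print(kaplan_meier([12, 30, 30, 41, 60], [1, 1, 0, 1, 0]))
```

Running the function separately for each group yields the step functions that are plotted against each other; a formal comparison would additionally need a test such as the log-rank test.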

Germline A3B deletion does not affect the molecular profile of tumours with APOBEC somatic hypermutation

Lastly, we asked whether APOBEC somatic hypermutation associated with germline A3B deletion leads to different outcomes than APOBEC somatic hypermutation driven by other sources. To test this, we focused only on tumours with APOBEC hypermutation and examined their molecular profiles in different A3B backgrounds. The distributions of molecular subtypes and somatic mutations in APOBEC hypermutators were similar regardless of A3B background (Figure 5a-b). Additionally, we found no significant difference in neoantigen burden (Figure 5c), immune scores (Figure 5d, Supp. Fig. 17), immune IHC markers (Figure 5e, Supp. Fig. 18), or tumour heterogeneity (Figure 5f, Supp. Fig. 19) between APOBEC-hypermutated tumours with different A3B backgrounds. Similar results were obtained in tumours without APOBEC hypermutation (Supp. Figs. 17-19). Overall survival of patients with hypermutated tumours did not differ when they were further stratified by germline A3B deletion, nor did survival of patients with non-hypermutated cancers differ by germline A3B status (Fig. 5g). Together, these results suggest that, regardless of whether the mutations arise from germline A3B deletion, APOBEC hypermutation is associated with increased immune activation and better survival.


Discussion

In this study, we set out to investigate the biological consequences of germline A3B deletion for breast cancer in Asian women, in whom the deletion is more common than in women of European descent. The high prevalence of the polymorphism provided sufficient power to analyse homozygous and heterozygous carriers separately, and to conduct subtype-specific analyses.
We showed that germline A3B deletion results in reduced expression of A3B and the wild-type A3A isoform, with a reciprocal increase in expression of the A3A-B hybrid isoform. Heterozygous A3B deletion carriers have an increased risk of APOBEC somatic hypermutation, particularly in the luminal B subtype. Importantly, we further showed that APOBEC somatic hypermutation is associated with Her2-enriched molecular subtypes, PIK3CA mutations, higher neoantigen burden, immune cell infiltration, tumour heterogeneity, and potentially better overall survival in Asian breast cancer patients. Furthermore, these associations appear to be independent of the source of mutation, as they were the same regardless of A3B background.

The association we found between APOBEC somatic hypermutation and immune infiltration may have implications for breast cancer immuno-oncology in populations where germline A3B deletion is common. Notably, our results are consistent with recent studies suggesting that APOBEC mutagenesis may identify individuals who benefit from immunotherapy36–38. On the other hand, our results are not entirely consistent with previous tumour analyses of breast cancers in women of European descent, which described a direct association between germline A3B deletion and immune activation15–17. One possibility is that the deletion affects the immune microenvironment only indirectly, via APOBEC hypermutation, which is itself only weakly associated with expression of the A3A-B hybrid isoform; this would produce a very weak overall association between germline A3B copy number and the immune microenvironment.

Our results raise the intriguing question of whether the association between APOBEC hypermutation and immune infiltration holds true for other cancer types in populations where the A3B deletion is common. An analysis of predominantly European TCGA pan-cancer data suggests that the role of APOBEC mutagenesis in modulating the tumour immune microenvironment may be significant only in breast cancers17; on the other hand, whole-exome sequencing of 4,000 Japanese tumours revealed a weak association between APOBEC mutational signatures and tumour mutational burden (itself a potential response biomarker for checkpoint inhibitors39,40) across several cancer types41.

Our finding of a lack of association between germline A3B deletion and any specific molecular subtype is consistent with a previous report that the association between germline A3B deletion and increased breast cancer risk is independent of ER status20.
Likewise, our finding that APOBEC somatic hypermutation is highly associated with Her2-enriched molecular subtypes is reminiscent of previous reports linking Her2-enriched subtypes with higher A3B expression42. In addition, the association between APOBEC mutagenesis and certain PIK3CA hotspot YTCA mutations has been noted previously43–45, and may be related to the mutagenic predilections of the hybrid A3A-B isoform and to PIK3CA’s role as an oncogene. Our results extend this association to germline A3B deletion, which may have important implications for therapies that target the PI3K pathway.

To our knowledge, this is the first study to suggest that APOBEC hypermutation modulates tumour heterogeneity and survival in Asian breast cancer. Some recent reports have indicated that APOBEC mutagenesis contributes to tumour heterogeneity in late-stage non-small cell lung carcinoma (NSCLC) and metastatic thoracic tumours46,47, but its role in breast cancer had not previously been analysed. The
link between APOBEC hypermutation and tumour heterogeneity is unsurprising, given that mutagenesis would be expected to increase cellular diversity. Likewise, the positive survival effect we found is consistent with previous studies of Asian oral cancer48, and may result from hypermutators having higher numbers of tumour-infiltrating immune cells38, which have been linked to better overall survival in some subtypes of breast cancer49. On the other hand, our results contradict previous studies suggesting that APOBEC mutagenesis is associated with poor prognosis in other cancers, such as multiple myeloma50. Our study also confirms previous reports that, in contrast with APOBEC mutagenesis or A3A/A3B mRNA expression, germline A3B copy number does not have prognostic value23,51.

To our knowledge, this is also the first study to elucidate the subtype-specific associations of germline A3B deletion and APOBEC mutagenesis in breast cancer. Our finding that germline A3B deletion is a stronger driver of APOBEC mutagenesis in the luminal B subtype, but a weaker driver in the basal subtype, suggests that further research in this area will need to account for breast cancer subtype as a significant variable. Similarly, our finding that APOBEC hypermutation is more strongly associated with immune activation in the luminal B subtype may help to focus future research efforts.

Multiple lines of evidence from our analysis suggest that in A3B deletion carriers, increased expression of the A3A-B hybrid isoform drives mutagenesis, implying that the A3B 3’-UTR may play an important role in modulating APOBEC mutagenesis.
Together with previous studies25,29, this finding helps to resolve the paradox that loss of A3B, an endogenous mutator, is associated with an increased risk of breast cancer in the Asian population, by supporting the hypothesis that the A3A-B hybrid isoform is a more potent mutator than either of the normal A3A or A3B isoforms. It is also interesting to note that the three A3 members significantly associated with APOBEC mutagenesis in our linear model each contain only one zinc-dependent cytidine deaminase domain (ZD-CDA) and exhibit both cytoplasmic and nuclear localisation52. While it is unsurprising that A3D, A3F and A3G did not appear in the model (they are cytoplasmically localised and are excluded from chromatin throughout mitosis, and hence have no contact with nuclear DNA53), it is not immediately clear why A3B, which has access to nuclear DNA and is widely regarded as the endogenous source of mutation5–7,42,54, did not emerge as a significant predictor in our model. Whether this is related to the recent finding that APOBEC3 mutagenesis in vitro is episodic, occurring in intermittent and irregular bursts55, remains unclear.

1 As a whole, our study in Asian women validates the results from previous analyses on the biological 2 consequences of germline A3B deletion on breast cancer that were conducted predominantly in 3 women of European descent. In particular, our study highlights the significance of APOBEC 4 hypermutation as an important predictor of the breast tumour microenvironment, and suggests that 5 Asian breast cancer patients with APOBEC hypermutation may be more amenable to 6 immunotherapy. 7 8 9 Acknowledgements 10 This project was funded by a research grant from the Newton-Ungku Omar Fund (MRC Ref: 11 MR/P012442/1) by the British Council and the Malaysian Industry-Government Group for High 12 Technology (MIGHT) to CC and SHT. Cancer Research Malaysia also receives charitable funding from 13 the Scientex Foundation, Estée Lauder Companies, Yayasan Petronas, and Yayasan Sime Darby which 14 contributed to the funding of this study. OMR, CC, and SFC also receive funding from Cancer 15 Research UK. The authors would like to thank Dr. Tan Min Min and Nadia Rajaram for help with data 16 curation, nurses and staff who helped with sample collection, as well as Tan Wee Lin, and all staff at 17 the Subang Jaya Medical Centre Tissue Diagnostics laboratory for assistance with histopathological 18 sample retrieval and processing. All genomics work was undertaken by the Genomics Core Facility 19 CRUK Cambridge Institute. 20 21 Author Contributions 22 JWP led the data analysis and wrote the manuscript. MMAZ and BKC contributed to data analysis 23 and helped to draft the manuscript. MYM, PSN, SB, SNH, and CHY contributed to sample collection 24 and processing and data collection, while BS, OMR, and SFC generated and collected data. PR 25 provided histopathology expertise, and together with CHY collected clinical data. OMR, CC, SFC, and 26 ST designed experiments, interpreted results, and drafted the manuscript. 
The project was directed and co-supervised by OMR, CC, SFC, and ST, who were also responsible for final editing.

Competing Interests
The authors declare no competing interests.

Methods

Data Description
Data for this project were taken primarily from the MyBrCa cohort tumour sequencing project. In brief, shallow whole-genome sequencing (sWGS), whole-exome sequencing (WES), and RNA-sequencing were conducted on 560 sequential female breast cancer patients from a single hospital in Subang Jaya, Malaysia, and analysed together with available clinical and overall survival data. The cohort data and sequencing methods are described in full in Pan et al. (in revision)²⁴. Only the methods important or unique to this study are described below.

Determining the Presence of the APOBEC3B Deletion
To determine the presence of the APOBEC3B (A3B) germline deletion in each individual, we calculated, for each matched normal WES sample, the ratio r = d_in/d_out, where d_in is the mean depth of coverage at 35 loci in exons within the deletion and d_out is the mean depth of coverage at 70 loci in exons immediately adjacent to the deletion. The loci used are identical to those used by Nik-Zainal et al. (2014)¹³. We then designated each sample as homozygous for the deletion, heterozygous, or wild type (A3B^del/del, A3B^wt/del, and A3B^wt/wt, respectively) by fitting a three-component mixture model to the distribution of r using the expectation-maximization algorithm implemented in the mixtools (v. 1.1) R package. The gene diagram of the APOBEC locus in Figure 1 is adapted from the UCSC Genome Browser.

Molecular Subtyping
We transformed gene-level count matrices for the MyBrCa cohort into log2 counts-per-million (logCPM) using the voom function from the limma (v. 3.34.9) R package. We then performed quantile normalization on each transformed matrix, followed by subtyping according to the PAM50 and SCMgene designations using the genefu R package (v. 2.14.0), and according to integrative clusters using the iC10 R package (v. 1.5).

Mutational profiling
For SNVs, we used positions called by Mutect2 that passed the following filters: a minimum of 10 reads in the tumour and 5 reads in the normal sample, an OxoG metric of less than 0.8, a variant allele frequency (VAF) of 0.075 or more, a p-value of 0.05 or more for Fisher's exact test on read strandedness, and an SAF of more than 0.75. Among positions present in 5 or more samples, we removed two positions that were not in COSMIC and fell in single tandem repeats. We also removed variants with an allele frequency of at least 0.01 in gnomAD, and considered only variants supported by at least 4 alternate reads, with at least 2 reads per strand. For indels, we additionally required the positions to be called by Strelka2. Variants were annotated using Oncotator version 1.9.9.0. PIK3CA lollipop plots were generated using the MutationMapper tool from cBioPortal⁵⁶.

Mutational Signatures
To determine the prevalence of previously reported breast cancer mutational signatures from the COSMIC signature matrices (Signatures 1, 2, 13, 3, 8, 6, 15, 20, 26, 5, 17, 18, and 30), we used deconstructSigs⁵⁷, restricted to samples with at least 15 SNVs. To test for differences in mutational signature weights between A3B^del/del and A3B^wt/wt germline carriers, we performed two-sided Wilcoxon rank-sum tests between the two groups. Rates of mutational signatures 1, 2, 3, 8, and 13 were calculated by multiplying the total number of somatic SNVs and indels by the proportion attributed to each signature.

Identification of Hypermutators
Using the combined rate of mutational signatures 2 and 13 (the simple sum of the two signature rates), we identified hypermutators following Nik-Zainal et al.'s definition: samples whose signature 2 + 13 mutation rate exceeded the 75th percentile by more than 1.5 times the interquartile range¹³. Outliers for other mutational signatures were identified using a similar approach.

Profiling the Tumor-Immune Microenvironment
We assessed overall immune cell infiltration in the bulk tumour samples from RNA-seq TPM gene expression values using ESTIMATE (v. 1.0.13)³⁰, as well as with GSVA (v. 1.26) using the combined immune cell gene sets from Bindea et al. (2013)³². We also scored each sample for immune features predictive of response to checkpoint inhibitor immunotherapy using IMPRES scores⁵⁸, and with GSVA using the expanded IFN-γ gene set from Ayers et al. (2017)³³.
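The hypermutator definition given under Identification of Hypermutators is a Tukey-style upper-fence outlier test applied to per-sample signature rates. The Python sketch below illustrates it under stated assumptions; it is not the authors' code. The function names are hypothetical, per-sample signature proportions are assumed to come from a tool such as deconstructSigs, and quartiles use NumPy's default linear interpolation (the same rule as R's default `quantile` type 7).

```python
import numpy as np


def signature_rate(total_mutations, proportion):
    """Rate of one signature: total somatic SNVs + indels times that signature's proportion."""
    return np.asarray(total_mutations, dtype=float) * np.asarray(proportion, dtype=float)


def upper_outliers(values):
    """Flag values exceeding the 75th percentile by more than 1.5 x IQR (Tukey upper fence)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # linear interpolation, like R's quantile(type = 7)
    return values > q3 + 1.5 * (q3 - q1)


def identify_hypermutators(total_mutations, prop_sig2, prop_sig13):
    """Hypothetical helper: sum the signature 2 and 13 rates, then apply the upper-fence test."""
    rate_2_13 = signature_rate(total_mutations, prop_sig2) + signature_rate(total_mutations, prop_sig13)
    return upper_outliers(rate_2_13)


if __name__ == "__main__":
    # Toy data (not from the study): nine ordinary tumours and one with a very large mutation load.
    totals = [50, 60, 70, 80, 90, 100, 110, 120, 130, 2000]
    flags = identify_hypermutators(totals, [0.1] * 10, [0.1] * 10)
    print(flags.tolist())  # [False, False, False, False, False, False, False, False, False, True]
```

In the toy data the combined rates are 10 to 26 for the first nine tumours and 400 for the last, giving quartiles of 14.5 and 23.5 and an upper fence of 37, so only the last tumour is flagged. The same `upper_outliers` function can be reused for the other signatures, mirroring the "similar approach" mentioned in the text.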

Validation of Immune Scores with Immunohistochemistry
To quantify tumour-infiltrating lymphocytes, FFPE blocks from 154 patients with sequencing data were sectioned and stained with anti-CD3 (clone 2GV6, predilute; Ventana Medical Systems), anti-CD4 (clone SP35, predilute; Ventana Medical Systems), anti-CD8 (clone SP57, predilute; Ventana Medical Systems), and anti-PD-L1 (clone SP263, predilute; Ventana Medical Systems) antibodies using an automated immunostainer (Ventana BenchMark ULTRA; Ventana Medical Systems, Tucson, AZ). Stained slides were digitized using an Aperio AT2 whole slide scanner. CD3, CD4, and CD8 staining was quantified using the Aperio Positive Pixel digital pathology tool (v9 algorithm at 0.16 colour saturation). PD-L1 expression was determined using the Combined Positive Score system.

Quantification of Tumour Heterogeneity
Tumour heterogeneity was determined using PyClone (v. 0.13.1)³⁵ with default options to estimate the number of subclonal clusters within each tumour sample. Allele counts used as PyClone input were extracted from the GATK output MAF files, while copy number input data were generated by ASCAT (v. 2.5.2) from WES allele counts generated by alleleCounter (v. 4.0.1).

Survival analysis
For each patient, overall survival data were obtained by querying their names and identity card numbers against the Malaysian National Registry records of deaths. Patients who did not return any matches against the database were assumed to still be alive, and patients with a match were recorded as deceased. Length of survival was defined as the period from the date of recruitment into the study until the date of death recorded by the Malaysian National Registry for patients who had died, or until the date the Malaysian National Registry was last queried for patients assumed to still be alive. For all survival analyses in this study, only patients with at least two years of survival data were included (n = 367). Unadjusted Kaplan-Meier analyses and log-rank tests were conducted using the survival R package (v. 2.44) and plotted using the "ggsurvplot" function from the survminer R package (v. 0.4.4). Cox proportional hazards models were built using the "coxph" function from the survival package and plotted using the "ggforest" function from survminer.

Neoantigen Analysis
Sample HLA types were determined using Polysolver⁵⁹ from tumour and normal DNA WES data. Only HLA alleles that were concordant between tumour and normal WES data were considered. In this analysis, we focused only on HLA-A. This choice was made because most studies identifying peptides that bind to HLA molecules have focused on those recognised by cytotoxic T lymphocytes; hence, prediction of antigen binding to MHC class I molecules is the most studied⁶⁰. Within MHC class I, HLA-A ranks among the genes with the fastest-evolving coding sequences as a result of ever-changing antigen selection pressures, and thus serves as a good reflection of neoantigen burden. Somatic mutations were annotated using the Ensembl Variant Effect Predictor. All possible neoantigen peptides (9- to 11-mers) encompassing the nonsynonymous mutations were predicted using a combination of NetMHCpan and NetMHC on the pVAC-seq platform. Only neoantigens with a predicted binding affinity of less than 500 nM were considered.

Extraction of Isoform Data
Raw RNA-seq reads were mapped to the GRCh38 reference, and isoform-level expression was quantified as transcripts per million (TPM) using RSEM (v. 1.2.31) with the Homo sapiens GRCh38.97 genome annotation model.

Gene Expression Statistical Analyses
Differences in TPM expression of all APOBEC family members across A3B^del/del, A3B^wt/del, and A3B^wt/wt breast cancers were compared using analysis of variance (ANOVA), whereas differences in isoform TPM and percentages were compared using the Kruskal-Wallis rank sum test (Figure 1), as these data were not normally distributed. For total somatic mutations (SNVs and indels) and the rates of mutational signatures 2 and 13 across A3B deletion status, differences were tested using ANOVA, while the Kruskal-Wallis test was used for the total proportion of mutational signatures 2 and 13 (Figure 2). The association of somatic mutation counts, and of the proportions and rates of signatures 2 and 13, with A3B and A3A-B hybrid expression was tested using Spearman's rank correlation. For HLA-A neoantigen counts and tumour heterogeneity, the log-transformed data were normally distributed, hence Pearson's correlation was applied in these cases. Differences across A3B deletion and hypermutation status for these data were compared using ANOVA, whereas differences in immune scores were tested using Kruskal-Wallis and Wilcoxon rank sum tests for three- and two-group comparisons, respectively.

Backward-stepwise elimination for linear modelling and Akaike Information Criterion
The minimal model was built from the original global model using the backward-stepwise elimination method implemented in the MuMIn R package. In our first linear model, the predictors consisted of all APOBEC family members. Subsequent models added known A3A-interacting proteins as predictors, together with interaction terms between A3A and these proteins, in an attempt to improve the explanatory power of the model. The minimal models were then compared with all possible alternative models using the Akaike Information Criterion (AIC) to select a final model.
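The selection step described above can be illustrated with a minimal greedy backward elimination over ordinary least squares fits scored by AIC. This Python/NumPy sketch is a stand-in under stated assumptions, not the MuMIn implementation the authors used: AIC is computed for a Gaussian linear model up to an additive constant, n·log(RSS/n) + 2k, which yields the same differences between models as R's `AIC()` on `lm` fits to the same data; interaction terms are not handled; and the predictor names in the usage example are hypothetical labels on simulated data.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of an OLS fit with intercept, up to an additive constant: n*log(RSS/n) + 2k."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # X may have zero columns (intercept-only model)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_eliminate(X, y, names):
    """Greedy backward-stepwise elimination: repeatedly drop the predictor whose
    removal lowers AIC the most; stop when no single removal lowers it."""
    keep = list(range(X.shape[1]))
    current = gaussian_aic(X[:, keep], y)
    while keep:
        candidates = []
        for j in keep:
            reduced = [i for i in keep if i != j]
            candidates.append((gaussian_aic(X[:, reduced], y), j))
        best_aic, drop = min(candidates)
        if best_aic >= current:
            break  # no removal improves AIC: this is the minimal model
        keep.remove(drop)
        current = best_aic
    return [names[i] for i in keep], current


if __name__ == "__main__":
    # Simulated data; the names are hypothetical stand-ins for expression predictors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
    kept, aic = backward_eliminate(X, y, ["A3A", "A3B", "A3C", "A3G", "A3H"])
    print(kept, round(aic, 1))
```

Because greedy elimination evaluates one removal at a time, it can stop at a local optimum; comparing the resulting minimal model against alternative subsets by AIC, as described in the text, is a sensible final check.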

References

1. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
2. Refsland, E. W. & Harris, R. S. The APOBEC3 family of retroelement restriction factors. Curr. Top. Microbiol. Immunol. 371, 1–27 (2013).
3. Cullen, B. R. Role and Mechanism of Action of the APOBEC3 Family of Antiretroviral Resistance Factors. J. Virol. 80, 1067–1076 (2006).
4. Burns, M. B. et al. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature (2013) doi:10.1038/nature11881. Erratum in Nature 502, 580 (2013).
5. Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet. (2013) doi:10.1038/ng.2701.
6. Sasaki, H. et al. APOBEC3B gene overexpression in non-small-cell lung cancer. Biomed. Rep. 2, 392–395 (2014).
7. Leonard, B. et al. APOBEC3B upregulation and genomic mutation patterns in serous ovarian carcinoma. Cancer Res. (2013) doi:10.1158/0008-5472.CAN-13-1753.
8. Landry, S., Narvaiza, I., Linfesty, D. C. & Weitzman, M. D. APOBEC3A can activate the DNA damage response and cause cell-cycle arrest. EMBO Rep. (2011) doi:10.1038/embor.2011.46.
9. Mussil, B. et al. Human APOBEC3A Isoforms Translocate to the Nucleus and Induce DNA Double Strand Breaks Leading to Cell Stress and Death. PLoS One (2013) doi:10.1371/journal.pone.0073641.
10. Caval, V., Suspène, R., Vartanian, J. P. & Wain-Hobson, S. Orthologous mammalian APOBEC3A cytidine deaminases hypermutate nuclear DNA. Mol. Biol. Evol. (2014) doi:10.1093/molbev/mst195.
11. Starrett, G. J. et al. The DNA cytosine deaminase APOBEC3H haplotype I likely contributes to breast and lung cancer mutagenesis. Nat. Commun. (2016) doi:10.1038/ncomms12918.
12. Zhu, M. et al. The eQTL-missense polymorphisms of APOBEC3H are associated with lung cancer risk in a Han Chinese population. Sci. Rep. (2015) doi:10.1038/srep14969.
13. Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat. Genet. (2014) doi:10.1038/ng.2955.
14. Klonowska, K. et al. The 30 kb deletion in the APOBEC3 cluster decreases APOBEC3A and APOBEC3B expression and creates a transcriptionally active hybrid gene but does not associate with breast cancer in the European population. Oncotarget (2017) doi:10.18632/oncotarget.19400.
15. Wen, W. X. et al. Germline APOBEC3B deletion is associated with breast cancer risk in an Asian multi-ethnic cohort and with immune cell presentation. Breast Cancer Res. (2016) doi:10.1186/s13058-016-0717-1.
16. Cescon, D. W., Haibe-Kains, B. & Mak, T. W. APOBEC3B expression in breast cancer reflects cellular proliferation, while a deletion polymorphism is associated with immune activation. Proc. Natl. Acad. Sci. U. S. A. (2015) doi:10.1073/pnas.1424869112.
17. Chen, Z. et al. Integrative genomic analyses of APOBEC-mutational signature, expression and germline deletion of APOBEC3 genes, and immunogenicity in multiple cancer types. BMC Med. Genomics 12, 131 (2019).
18. Middlebrooks, C. D. et al. Association of germline variants in the APOBEC3 region with cancer risk and enrichment with APOBEC-signature mutations in tumors. Nat. Genet. 48, 1330–1338 (2016).
19. Kidd, J. M., Newman, T. L., Tuzun, E., Kaul, R. & Eichler, E. E. Population stratification of a common APOBEC gene deletion polymorphism. PLoS Genet. (2007) doi:10.1371/journal.pgen.0030063.
20. Long, J. et al. A common deletion in the APOBEC3 genes and breast cancer risk. J. Natl. Cancer Inst. (2013) doi:10.1093/jnci/djt018.
21. Rezaei, M., Hashemi, M., Hashemi, S. M., Mashhadi, M. A. & Taheri, M. APOBEC3 Deletion is Associated with Breast Cancer Risk in a Sample of Southeast Iranian Population. Int. J. Mol. Cell. Med. (2015).
22. Gansmo, L. B. et al. APOBEC3A/B deletion polymorphism and cancer risk. Carcinogenesis 39, 118–124 (2018).
23. Göhler, S. et al. Impact of functional germline variants and a deletion polymorphism in APOBEC3A and APOBEC3B on breast cancer risk and survival in a Swedish study population. J. Cancer Res. Clin. Oncol. (2016) doi:10.1007/s00432-015-2038-7.
24. Pan, J.-W. et al. The Molecular Landscape of Asian Breast Cancers Reveals Clinically Relevant Population-Specific Differences. bioRxiv (2020) doi:10.1101/2020.04.09.035055.
25. Caval, V., Suspène, R., Shapira, M., Vartanian, J. P. & Wain-Hobson, S. A prevalent cancer susceptibility APOBEC3A hybrid allele bearing APOBEC3B 3′UTR enhances chromosomal DNA damage. Nat. Commun. 5, 1–7 (2014).
26. Drost, J. et al. Use of CRISPR-modified human stem cell organoids to study the origin of mutational signatures in cancer. Science 358, 234–238 (2017).
27. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
28. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
29. Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).
30. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4 (2013).
31. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013).
32. Bindea, G. et al. Spatiotemporal dynamics of intratumoral immune cells reveal the immune landscape in human cancer. Immunity 39, 782–795 (2013).
33. Ayers, M. et al. IFN-γ-related mRNA profile predicts clinical response to PD-1 blockade. J. Clin. Invest. 127, 2930–2940 (2017).
34. Auslander, N. et al. Robust prediction of response to immune checkpoint blockade therapy in metastatic melanoma. Nat. Med. 24, 1545–1549 (2018).
35. Roth, A. et al. PyClone: Statistical inference of clonal population structure in cancer. Nat. Methods 11, 396–398 (2014).
36. Boichard, A. et al. APOBEC-related mutagenesis and neo-peptide hydrophobicity: implications for response to immunotherapy. Oncoimmunology 8, 1550341 (2019).
37. Wang, S., Jia, M., He, Z. & Liu, X. S. APOBEC3B and APOBEC mutational signature as potential predictive markers for immunotherapy response in non-small cell lung cancer. Oncogene 37, 3924–3936 (2018).
38. Glaser, A. P. et al. APOBEC-mediated mutagenesis in urothelial carcinoma is associated with improved survival, mutations in DNA damage response genes, and immune response. Oncotarget 9, 4537–4548 (2018).
39. Rizvi, N. A. et al. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124–128 (2015).
40. Goodman, A. M. et al. Tumor mutational burden as an independent predictor of response to immunotherapy in diverse cancers. Mol. Cancer Ther. 16, 2598–2608 (2017).
41. Hatakeyama, K. et al. Mutational burden and signatures in 4000 Japanese cancers provide insights into tumorigenesis and response to therapy. Cancer Sci. 110, 2620–2628 (2019).
42. Roberts, S. A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. (2013) doi:10.1038/ng.2702.
43. Temko, D., Tomlinson, I. P. M., Severini, S., Schuster-Böckler, B. & Graham, T. A. The effects of mutational processes and selection on driver mutations across cancer types. Nat. Commun. 9 (2018).
44. Kosumi, K. et al. APOBEC3B is an enzymatic source of molecular alterations in esophageal squamous cell carcinoma. Med. Oncol. 33, 1–9 (2016).
45. McGranahan, N. et al. Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci. Transl. Med. 7 (2015).
46. de Bruin, E. C., McGranahan, N. & Swanton, C. Analysis of intratumor heterogeneity unravels lung cancer evolution. Mol. Cell. Oncol. (2015) doi:10.4161/23723556.2014.985549.
47. Roper, N. et al. APOBEC Mutagenesis and Copy-Number Alterations Are Drivers of Proteogenomic Tumor Evolution and Heterogeneity in Metastatic Thoracic Tumors. Cell Rep. (2019) doi:10.1016/j.celrep.2019.02.028.
48. Chen, T. W. et al. APOBEC3A is an oral cancer prognostic biomarker in Taiwanese carriers of an APOBEC deletion polymorphism. Nat. Commun. 8 (2017).
49. Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
50. Walker, B. A. et al. APOBEC family mutational signatures are associated with poor prognosis translocations in multiple myeloma. Nat. Commun. 6 (2015).
51. Liu, J. et al. The 29.5 kb APOBEC3B Deletion Polymorphism Is Not Associated with Clinical Outcome of Breast Cancer. PLoS One 11, e0161731 (2016).
52. LaRue, R. S. et al. The artiodactyl APOBEC3 innate immune repertoire shows evidence for a multi-functional domain organization that existed in the ancestor of placental mammals. BMC Mol. Biol. (2008) doi:10.1186/1471-2199-9-104.
53. Lackey, L., Law, E. K., Brown, W. L. & Harris, R. S. Subcellular localization of the APOBEC3 proteins during mitosis and implications for genomic DNA deamination. Cell Cycle (2013) doi:10.4161/cc.23713.
54. Burns, M. B. et al. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature (2013) doi:10.1038/nature11881.
55. Petljak, M. et al. Characterizing Mutational Signatures in Human Cancer Cell Lines Reveals Episodic APOBEC Mutagenesis. Cell 176, 1282–1294.e20 (2019).
56. Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6 (2013).
57. Rosenthal, R., McGranahan, N., Herrero, J., Taylor, B. S. & Swanton, C. deconstructSigs: Delineating mutational processes in single tumors distinguishes DNA repair deficiencies and patterns of carcinoma evolution. Genome Biol. 17 (2016).
58. Auslander, N. et al. Robust prediction of response to immune checkpoint blockade therapy in metastatic melanoma. Nat. Med. 24, 1545–1549 (2018).
59. Shukla, S. A. et al. Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat. Biotechnol. 33, 1152–1158 (2015).
60. Hutchison, S. & Pritchard, A. L. Identifying neoantigens for use in immunotherapy. Mamm. Genome 29, 714–730 (2018).

Figure legends

Figure 1. The relationship between germline APOBEC3B deletion and expression of APOBEC family members. (a) Diagram of the APOBEC3 locus on chromosome 22, showing the transcript variants of each gene, modified from the UCSC Genome Browser. The location of a common germline deletion in the locus is also shown; the deletion fuses the A3A coding sequence to the A3B 3′ UTR. (b) Gene expression of all APOBEC family members across A3B copy numbers, quantified as log transcripts per million (TPM). (c) Expression of the A3A-B hybrid isoform and the A3A-1 isoform in log TPM across A3B copy numbers. (d) Prevalence of the A3A-B hybrid isoform and the A3A-1 isoform across A3B copy numbers, measured as the percentage of total A3A gene expression. P-values indicated are for one-way ANOVA.

Figure 2. Germline APOBEC3B deletion and mutational signatures. (a) Comparison of mutational signatures between samples with homozygous germline A3B deletion and non-carriers and, for contrast, (b) between samples with and without germline mutations in homologous recombination genes. (c-e) The log-normalized total mutational burden (c), the proportion of mutations attributed to signatures 2 and 13 (d), and the log-normalized total rate of signature 2 and 13 mutations (e) in tumour samples with different germline A3B copy numbers. The grey area in (e) represents somatic hypermutation as defined by Nik-Zainal et al. (2014). P-values are for Kruskal-Wallis rank sum tests (d) or one-way ANOVA (c, e).

Figure 3. Molecular profiles of breast tumours with different germline APOBEC3B copy numbers. (a) Frequency of Integrative Cluster molecular subtypes across A3B copy numbers. (b) Frequency of the ten most commonly mutated driver genes across A3B copy numbers. (a-b) Sample sizes are indicated in brackets in the figure legend. Numbers above the bars are p-values for Fisher's exact test. (c) Log-normalized counts of predicted HLA-A neoantigens in samples with different A3B copy numbers. (d) Gene set expression scores, computed with GSVA, for the expanded IFN-γ gene set from Ayers et al. (2017) that is predictive of response to immunotherapy, across A3B copy numbers. (e) Anti-CD8 IHC staining, measured as the percentage of stained area, of FFPE tumour samples with different A3B copy numbers. (f) Tumour heterogeneity, measured as log-normalized counts of PyClone clusters, across germline A3B copy numbers. (c-f) P-values indicated are for one-way ANOVA. (g) Kaplan-Meier plot of overall survival for patients with different A3B copy numbers. The p-value indicated is for an unadjusted log-rank test.

Figure 4. Molecular profiles of breast tumours with signature 2 and 13 (APOBEC) hypermutation. (a) Frequency of Integrative Cluster molecular subtypes in samples with and without APOBEC hypermutation. (b) Frequency of the ten most commonly mutated driver genes in samples with and without APOBEC hypermutation. (a-b) Sample sizes are indicated in brackets in the figure legend. Numbers above the bars are p-values for Fisher's exact test. (c) Log-normalized counts of predicted HLA-A neoantigens in samples with and without APOBEC hypermutation. (d) Gene set expression scores, computed with GSVA, for the expanded IFN-γ gene set from Ayers et al. (2017) that is predictive of response to immunotherapy, in samples with and without APOBEC hypermutation. (e) Anti-CD8 IHC staining, measured as the percentage of stained area, of FFPE tumour samples with and without APOBEC hypermutation. (f) Tumour heterogeneity, measured as log-normalized counts of PyClone clusters, in samples with and without APOBEC hypermutation. (c-f) P-values indicated are for one-way ANOVA. (g) Kaplan-Meier plot of overall survival for patients with and without APOBEC hypermutation. The p-value indicated is for an unadjusted log-rank test.

Figure 5. Comparison of molecular profiles of breast tumours with APOBEC hypermutation in different germline APOBEC3B copy number backgrounds. (a) Frequency of Integrative Cluster molecular subtypes in samples with APOBEC hypermutation across A3B copy number backgrounds. (b) Frequency of the ten most commonly mutated driver genes in samples with APOBEC hypermutation across A3B copy number backgrounds. (a-b) Sample sizes are indicated in brackets in the figure legend. Numbers above the bars are p-values for Fisher's exact test. (c) Log-normalized counts of predicted HLA-A neoantigens in samples with APOBEC hypermutation across A3B copy number backgrounds. (d) Gene set expression scores, computed with GSVA, for the expanded IFN-γ gene set from Ayers et al. (2017) that is predictive of response to immunotherapy, in samples with APOBEC hypermutation across A3B copy number backgrounds. (e) Anti-CD8 IHC staining, measured as the percentage of stained area, of FFPE tumour samples with APOBEC hypermutation across A3B copy number backgrounds. (f) Tumour heterogeneity, measured as log-normalized counts of PyClone clusters, in samples with APOBEC hypermutation across A3B copy number backgrounds. (c-f) P-values indicated are for one-way ANOVA. (g) Kaplan-Meier plot of overall survival for patients stratified by APOBEC hypermutation and A3B copy number. The p-value indicated is for an unadjusted log-rank test.

Figure 1

Figure 2

Figure 3

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.04.135251; this version posted June 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

(Continued from previous page)

Figure 3. Molecular profiles of breast tumours with different germline APOBEC3B copy number. (a) Frequency of Integrative Cluster molecular subtypes across different A3B copy number. (b) Frequency of the top ten most commonly mutated driver genes across A3B copy number. (a-b) Sample sizes are indicated in brackets in the figure legend. Numbers above the bars are p-values for Fisher’s exact test. (c) Log-normalized counts of predicted HLA-A neoantigens in samples with different A3B copy number. (d) Gene set expression scores for the expanded IFN-γ gene set that is predictive of response to immunotherapy from Ayers et al. (2017), using the GSVA method, across A3B copy number. (e) Anti- CD8 IHC staining, measured as the percentage of area, of FFPE tumour samples with different A3B copy number. (f) Tumour heterogeneity, measured as the log-normalized counts of PyClone clusters, across different germline A3B copy number. (c-f) P-values indicated are for one-way ANOVA. (g) Kaplan-Meier plot of overall survival for patients with different A3B copy number. P-value indicated is for an unadjusted log-rank test.

Figure 4

Figure 4. Molecular profiles of breast tumours with signature 2 and 13 (APOBEC) hypermutation. (a) Frequency of Integrative Cluster molecular subtypes in samples with and without APOBEC hypermutation. (b) Frequency of the ten most commonly mutated driver genes in samples with and without APOBEC hypermutation. (a-b) Sample sizes are indicated in brackets in the figure legend. Numbers above the bars are p-values for Fisher's exact test. (c) Log-normalized counts of predicted HLA-A neoantigens in samples with and without APOBEC hypermutation. (d) GSVA scores for the expanded IFN-γ gene set from Ayers et al. (2017), which is predictive of response to immunotherapy, in samples with and without APOBEC hypermutation. (e) Anti-CD8 IHC staining, measured as the percentage of area, of FFPE tumour samples with and without APOBEC hypermutation. (f) Tumour heterogeneity, measured as the log-normalized counts of PyClone clusters, in samples with and without APOBEC hypermutation. (c-f) P-values indicated are for one-way ANOVA. (g) Kaplan-Meier plot of overall survival for patients with and without APOBEC hypermutation. The p-value indicated is for an unadjusted log-rank test.
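
Panel (g) compares two groups with an unadjusted log-rank test. A minimal pure-Python sketch, assuming per-patient follow-up times, a death indicator (1 = death, 0 = censored) and a 0/1 hypermutation label (all names hypothetical), accumulates observed minus expected deaths in group 1 at each event time and refers the statistic to a chi-squared distribution with one degree of freedom.

```python
import math
from typing import List, Tuple


def logrank_two_group(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float]:
    """Unadjusted two-group log-rank test; returns (chi-squared with 1 df, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        deaths = [g for ti, e, g in zip(times, events, groups) if ti == t and e == 1]
        d = len(deaths)
        d1 = sum(1 for g in deaths if g == 1)
        # Observed deaths in group 1 minus those expected under equal hazards
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Maintained implementations include R's `survival::survdiff` and `lifelines.statistics.logrank_test`.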

Figure 5

Figure 5. Comparison of molecular profiles of breast tumours with APOBEC hypermutation in different germline APOBEC3B copy number backgrounds. (a) Frequency of Integrative Cluster molecular subtypes in samples with APOBEC hypermutation across A3B copy number backgrounds. (b) Frequency of the ten most commonly mutated driver genes in samples with APOBEC hypermutation across A3B copy number backgrounds. (a-b) Sample sizes are indicated in brackets in the figure legend. Numbers above the bars are p-values for Fisher's exact test. (c) Log-normalized counts of predicted HLA-A neoantigens in samples with APOBEC hypermutation across A3B copy number backgrounds. (d) GSVA scores for the expanded IFN-γ gene set from Ayers et al. (2017), which is predictive of response to immunotherapy, in samples with APOBEC hypermutation across A3B copy number backgrounds. (e) Anti-CD8 IHC staining, measured as the percentage of area, of FFPE tumour samples with APOBEC hypermutation across A3B copy number backgrounds. (f) Tumour heterogeneity, measured as the log-normalized counts of PyClone clusters, in samples with APOBEC hypermutation across A3B copy number backgrounds. (c-f) P-values indicated are for one-way ANOVA. (g) Kaplan-Meier plot of overall survival for patients stratified by APOBEC hypermutation and A3B copy number. The p-value indicated is for an unadjusted log-rank test.
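
The Kaplan-Meier curves in panel (g) are product-limit estimates computed separately for each stratum (here, each combination of hypermutation status and A3B copy number). A minimal sketch of the estimator, with hypothetical names and the same time/event inputs as a log-rank test, is:

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of overall survival.

    Returns (time, S(t)) at each distinct event time. Censored patients
    (event == 0) leave the risk set without lowering S(t).
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Calling it once per stratum gives one curve per group; confidence bands (for example from Greenwood's formula) are omitted.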

Supplemental Figures

Supp. Fig. 1

Supp. Fig. 1. APOBEC expression across breast cancer subtypes. Expression of the APOBEC3A and APOBEC3B genes, as well as the A3A-1 and A3A-B hybrid isoforms of A3A, across PAM50 breast cancer molecular subtypes and germline A3B copy numbers, quantified as log transcripts-per-million (TPM). P-values indicated are for one-way ANOVA.
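
Expression here is reported as log TPM. As a sketch of how TPM is defined, assuming read counts and effective transcript lengths are already available (isoform-level values such as A3A-1 and A3A-B would normally come from a transcript quantifier, which this code does not replace), counts are divided by length and then scaled so that each sample sums to one million. The log2 pseudocount of 1 and the function names are assumptions, since the legend does not state them.

```python
import math
from typing import Dict


def tpm(counts: Dict[str, float], lengths_kb: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from read counts and effective lengths (kb)."""
    per_kb = {tx: counts[tx] / lengths_kb[tx] for tx in counts}  # length-normalized rate
    scale = sum(per_kb.values()) / 1e6  # makes the sample sum to one million
    return {tx: rate / scale for tx, rate in per_kb.items()}


def log2_tpm(values: Dict[str, float], pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(TPM + pseudocount); the default pseudocount of 1 is an assumption."""
    return {tx: math.log2(v + pseudocount) for tx, v in values.items()}
```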

Supp. Fig. 2

Supp. Fig. 2. Germline APOBEC3B deletion and mutational signatures 2 and 13 across subtypes. Comparison of the proportion and total rate of mutational signatures 2 and 13 between samples with different germline A3B copy number, stratified by PAM50 molecular subtype. P-values shown are for one-way ANOVA.

Supp. Fig. 3

Supp. Fig. 3. Relationship between APOBEC expression and signature 2/13 mutations. (a) Total mutational burden (left), the proportion of mutations with signatures 2 and 13 (middle), and the total rate of signature 2 and 13 mutations (right) compared with expression of the APOBEC3B gene. (b-c) The same three measures compared with expression of the (b) A3A-1 and (c) A3A-B hybrid isoforms. In the right-hand panels, the grey area represents APOBEC somatic hypermutation as defined by Nik-Zainal et al. (2014).

Supp. Fig. 4

Supp. Fig. 4. Relationship between APOBEC expression and signature 2/13 mutations in different molecular subtypes. Comparison of log-normalized total mutational burden (left), the proportion of mutations with signatures 2 and 13 (middle), and the log-normalized total rate (proportion of mutations multiplied by total mutational burden) of signature 2 and 13 mutations (right) with expression of the APOBEC3A (top) and APOBEC3B (second from top) genes and of the A3A-1 (second from bottom) and A3A-B hybrid (bottom) isoforms, for each main PAM50 molecular subtype.

Gene/isoform expression is quantified as log2-normalized transcripts-per-million. Also indicated are the Pearson’s correlation coefficient (r) for each comparison, and the corresponding p-value. Comparisons with a p-value below 0.05 are highlighted in red.
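
Each panel reports Pearson's r. The sketch below, with hypothetical names, computes r between a log2 expression vector and a signature 2 and 13 rate vector, plus the statistic t = r * sqrt((n - 2) / (1 - r^2)) on which its p-value is based. Converting t into a p-value needs the t distribution with n - 2 degrees of freedom (for example via `scipy.stats.pearsonr`), which the Python standard library does not provide.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r: float, n: int) -> float:
    """t statistic (n - 2 df) for H0: r = 0; undefined when |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```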

Supp. Fig. 5

Supp. Fig. 5. Comparison of RTCA and YTCA mutations. (a) Comparison of the number of YTCA mutations to the number of RTCA mutations in each MyBrCa sample. Each sample is colored according to germline A3B status (left) or hypermutation status (right). Black line indicates a 1:1 ratio. (b) Zoomed-in version of (a), for samples with mutation counts of less than 50. (c) Prevalence of YTCA mutations across A3B copy number (left) and hypermutation status (right).
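
Counting YTCA and RTCA mutations requires classifying each substitution at the C of a TCA motif by the base two positions upstream: a pyrimidine (Y = C or T) gives YTCA and a purine (R = A or G) gives RTCA. A minimal sketch, assuming each variant comes with a five-base forward-strand reference context centred on the mutated base (an input format chosen for illustration) and restricting to C>T and C>G changes (the substitution types of signatures 2 and 13), reverse-complements G-centred variants so that every mutation is read from the C-bearing strand. The function names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_tca_context(context: str, ref: str, alt: str) -> Optional[str]:
    """Return 'YTCA', 'RTCA' or None for one single-base substitution.

    context: 5-mer on the + strand with the mutated base at index 2.
    """
    context, ref, alt = context.upper(), ref.upper(), alt.upper()
    if len(context) != 5 or context[2] != ref:
        raise ValueError("context must be a 5-mer centred on the reference base")
    if ref == "G":  # re-express G>A / G>C on the opposite (C-bearing) strand
        context, ref, alt = revcomp(context), "C", alt.translate(COMPLEMENT)
    if ref != "C" or alt not in ("T", "G"):
        return None
    # index 0 = -2 position, 1 = -1 (must be T), 2 = mutated C, 3 = +1 (must be A)
    if context[1] != "T" or context[3] != "A":
        return None
    if context[0] in ("C", "T"):
        return "YTCA"
    if context[0] in ("A", "G"):
        return "RTCA"
    return None
```

Tallying the labels per sample and comparing the two counts gives the kind of per-sample majority call summarised in Supp. Table 3.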

Supp. Fig. 6

Supp. Fig. 6. Comparison of RTCA and YTCA mutations by molecular subtype. (a) Comparison of the number of YTCA mutations to the number of RTCA mutations in each MyBrCa sample, stratified by molecular subtype. Black line indicates a 1:1 ratio. (b) Zoomed-in version of (a), for samples with mutation counts of less than 30.

Supp. Fig. 7

Supp. Fig. 7. Multivariable linear regression of the rate of signature 2 and 13 mutations. The figure shows significant predictors from a multivariable linear regression of the total rate of signature 2 and 13 mutations, using expression of all APOBEC family members and of known or putative A3A-interacting proteins (from UniProt, highlighted in red) as predictors. Bars indicate the level of significance of each predictor; those highlighted in cyan remain significant in the minimal model after backward-stepwise elimination. The marked line indicates Pr(>|t|) = 0.05. The adjusted R-squared of the model is 0.124.
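
The minimal model was obtained by backward-stepwise elimination. The legend does not state the elimination criterion, so the sketch below assumes the AIC, computed as n*log(RSS/n) + 2k as in R's `step()` for linear models, with a pure-Python ordinary least-squares fit that includes an intercept. At each round it drops the predictor whose removal lowers the AIC most and stops when no removal helps. All function and variable names are hypothetical, and the original analysis may have used p-value thresholds instead.

```python
import math
from typing import Dict, List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_aic(y: Sequence[float], predictors: Dict[str, Sequence[float]], keep: List[str]) -> float:
    """AIC = n*log(RSS/n) + 2k for OLS of y on an intercept plus the `keep` columns."""
    n = len(y)
    design = [[1.0] + [predictors[name][i] for name in keep] for i in range(n)]
    k = len(design[0])  # number of coefficients, including the intercept
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(design, y))
    return n * math.log(rss / n) + 2 * k


def backward_eliminate(y: Sequence[float], predictors: Dict[str, Sequence[float]]) -> List[str]:
    """Drop predictors one at a time while doing so lowers the AIC."""
    keep = list(predictors)
    current = ols_aic(y, predictors, keep)
    while keep:
        trials = {name: ols_aic(y, predictors, [p for p in keep if p != name]) for name in keep}
        drop, best = min(trials.items(), key=lambda item: item[1])
        if best >= current:
            break
        keep.remove(drop)
        current = best
    return keep
```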

Supp. Fig. 8

Supp. Fig. 8. Germline APOBEC3B deletion is associated with specific PIK3CA mutations. (a) Frequency of C>T/G>A mutations in the ten most commonly mutated breast cancer driver genes across samples with different germline A3B copy numbers. Numbers above the bars indicate p-values for Fisher's exact test (NS indicates p > 0.1). (b) Lollipop plots of PIK3CA mutations for samples with different germline A3B copy numbers.

Supp. Fig. 9

Supp. Fig. 9. Relationship of germline APOBEC3B deletion to neoantigen burden and immune scores. (a) Comparison of neoantigen burden (quantified as the number of predicted HLA-A neoantigens from MyBrCa tumours) with germline A3B copy number (left), expression of the A3A-B hybrid isoform (middle-left), total rate of signature 2 and 13 mutations (middle-right), and signature 2 and 13 hypermutation status (right). In the middle-right panel, the grey area represents APOBEC somatic hypermutation as defined by Nik-Zainal et al. (2014), and the two lines represent linear regression for all samples (orange) or for hypermutators only (red). P-values shown are for one-way ANOVA, Pearson's correlation, or t-test. (b-c) Comparison of MyBrCa tumour immune scores, quantified using four different methods (see Methods), across A3B copy number (b) and hypermutation status (c). P-values shown are for one-way ANOVA or t-tests.

Supp. Fig. 10

Supp. Fig. 10. Validation of the relationship between germline A3B deletion and the tumour immune microenvironment using IHC. (a-c) Comparison of anti-CD3 (a), anti-CD8 (b), and anti-CD4 (c) staining between samples with different A3B copy numbers (left) and between hypermutators and non-hypermutators (right). Antibody staining was quantified as a percentage of total intratumoural area. P-values indicated are for one-way ANOVA or Student's t-tests. (d) Prevalence of anti-PD-L1 staining in samples with different A3B copy numbers (left) and in hypermutators and non-hypermutators (right). Anti-PD-L1 staining was quantified using the Combined Positive Score (CPS) system: samples with a CPS of 1 or more are considered PD-L1 positive, and samples with a CPS below 1 PD-L1 negative. P-values indicated are for Fisher's exact test.
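
The Combined Positive Score is the number of PD-L1-stained cells (tumour cells, lymphocytes and macrophages) divided by the number of viable tumour cells, multiplied by 100, with scores above 100 recorded as 100. The sketch below, with hypothetical function names and cell counts assumed to come from the pathologist's scoring, computes the score and applies the cutoff of 1 used in this legend (CPS ≥ 1 as positive).

```python
def combined_positive_score(pdl1_tumour_cells: int, pdl1_lymphocytes: int,
                            pdl1_macrophages: int, viable_tumour_cells: int) -> float:
    """CPS = 100 x PD-L1-stained cells / viable tumour cells, capped at 100."""
    if viable_tumour_cells <= 0:
        raise ValueError("CPS is undefined without viable tumour cells")
    stained = pdl1_tumour_cells + pdl1_lymphocytes + pdl1_macrophages
    return min(100 * stained / viable_tumour_cells, 100.0)


def is_pdl1_positive(cps: float, cutoff: float = 1.0) -> bool:
    """PD-L1 positive when CPS is at or above the cutoff."""
    return cps >= cutoff
```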

Supp. Fig. 11

Supp. Fig. 11. Relationship between neoantigen burden and germline A3B deletion across subtypes. Comparison of neoantigen burden to germline A3B copy number (left) and signature 2 and 13 (APOBEC) hypermutation (right), across different PAM50 molecular subtypes. Numbers above the bars are p-values for one-way ANOVA.

Supp. Fig. 12

Supp. Fig. 12. Relationship between signature 2 and 13 mutations and tumour-infiltrating lymphocytes. (a-c) Comparison of anti-CD3 (a), anti-CD8 (b), and anti-CD4 (c) staining with the log-normalized rate of signature 2 and 13 mutations. Samples are colored according to whether they are homozygous (gold) or heterozygous (yellow) carriers, or non-carriers (blue), of the germline A3B deletion. The grey area represents APOBEC somatic hypermutation as defined by Nik-Zainal et al. (2014).

Supp. Fig. 13

Supp. Fig. 13. Relationship between germline A3B deletion and immune scores across subtypes. Comparison of germline A3B copy number (left) and signature 2 and 13 hypermutation (right) with (from left to right) ESTIMATE immune score, GSVA using immune gene sets from Bindea et al. 2013, GSVA using the expanded IFN-gamma gene set from Ayers et al. 2017, and IMPRES score, for samples of different PAM50 molecular subtypes. P-values on top of the figures are for one-way ANOVA.

Supp. Fig. 14

Supp. Fig. 14. Relationship of germline APOBEC3B deletion to tumour heterogeneity. Comparison of tumour heterogeneity, quantified as the log-normalized number of predicted PyClone clusters from MyBrCa tumours, with germline A3B copy number (left), expression of the A3A-B hybrid isoform (middle-left), total rate of signature 2 and 13 mutations (middle-right), and signature 2 and 13 hypermutation status (right). In the middle-right panel, the grey area represents APOBEC somatic hypermutation as defined by Nik-Zainal et al. (2014), and the two lines represent linear regression for all samples (orange) or for hypermutators only (red). P-values shown are for one-way ANOVA, Pearson's correlation, and t-test.

Supp. Fig. 15

Supp. Fig. 15. Relationship between tumour heterogeneity and germline A3B deletion across subtypes. Comparison of tumour heterogeneity, quantified as log-transformed counts of PyClone clusters, to germline A3B copy number (left) and signature 2 and 13 (APOBEC) hypermutation (right), across different PAM50 molecular subtypes. Numbers above the bars are p-values for one-way ANOVA.

Supp. Fig. 16

Supp. Fig. 16. Cox proportional hazards model for APOBEC hypermutation. Forest plot of hazard ratios for overall survival in MyBrCa patients from a Cox proportional hazards model with germline A3B copy number and signature 2 and 13 somatic hypermutation as variables, adjusted for tumour stage, PAM50 molecular subtype, and IMPRES score.

Supp. Fig. 17

Supp. Fig. 17. Relationship of APOBEC3B hypermutation to neoantigen burden and immune scores, stratified by APOBEC3B deletion. Dot plots (left) and violin plots (right) of neoantigen burden and four different immune scores compared to APOBEC hypermutation, stratified by presence of the APOBEC3B deletion.

Supp. Fig. 18

Supp. Fig. 18. Relationship of APOBEC3B hypermutation to immune markers in breast tumours, stratified by APOBEC3B deletion. Dot plots (left, middle) and violin plots (right) of tumour immune markers, measured as the percentage of tumour area with anti-CD3 (top), anti-CD8 (middle), or anti-CD4 (bottom) IHC staining, compared with APOBEC hypermutation and stratified by presence of the APOBEC3B deletion (middle, right).

Supp. Fig. 19

Supp. Fig. 19. Relationship of APOBEC3B hypermutation to tumour heterogeneity, stratified by APOBEC3B deletion. Dot plots (left, middle) and violin plots (right) of tumour heterogeneity, as measured by the log-normalized counts of PyClone clusters, compared to APOBEC hypermutation, stratified by presence of the APOBEC3B deletion (middle, right).

Supplemental Tables

Supp. Table 1

Supp. Table 1. APOBEC-associated hypermutation in carriers of germline APOBEC3B deletion. Comparison of the prevalence of germline APOBEC3B deletion and signature 2- and 13-associated somatic hypermutation in the MyBrCa and TCGA cohorts (TCGA data from Nik-Zainal et al. 2014). The Cochran-Armitage test for trend assesses the association between A3B copy number and hypermutation.

Cohort | Deletion allele status | N | Allele distribution | Hypermutated, n (column %) | Non-hypermutated, n | % Hypermutated
MyBrCa | Homozygous | 79 | 0.158 | 20 (26) | 59 | 25.32
MyBrCa | Heterozygous | 232 | 0.464 | 39 (50) | 193 | 16.81
MyBrCa | Non-carriers | 189 | 0.378 | 19 (24) | 170 | 10.05
MyBrCa | Total | 500 | - | 78 (100) | 422 | 15.60
TCGA | Homozygous | 14 | 0.015 | 4 (4) | 10 | 28.57
TCGA | Heterozygous | 128 | 0.139 | 28 (26) | 100 | 21.88
TCGA | Non-carriers | 781 | 0.846 | 74 (70) | 707 | 9.48
TCGA | Total | 923 | - | 106 (100) | 817 | 11.48

Cochran-Armitage test for trend: MyBrCa p < 2.2e-16; TCGA p = 6.3e-6
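
The Cochran-Armitage test asks whether the hypermutation rate changes linearly with the number of deletion alleles. A minimal sketch using the normal approximation is shown below; the scores 0, 1 and 2 for non-carriers, heterozygotes and homozygotes are an assumption, as is the function name.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage(cases: Sequence[int], totals: Sequence[int],
                     scores: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Two-sided Cochran-Armitage test for trend; returns (z, p-value).

    cases[i] / totals[i] is the hypermutation rate in ordered group i.
    """
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # pooled hypermutation rate
    # Score-weighted deviation of observed from expected hypermutators
    t_stat = sum(s * (x - n * p_bar) for s, x, n in zip(scores, cases, totals))
    s_n = sum(s * n for s, n in zip(scores, totals))
    s2_n = sum(s * s * n for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2_n - s_n ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
```

For example, `cochran_armitage([74, 28, 4], [781, 128, 14])` applies it to the TCGA counts in the table (non-carriers, heterozygotes, homozygotes) and gives a p-value close to the reported 6.3e-6; small differences can arise from whether an implementation uses an N or N - 1 variance convention.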

Supp. Table 2

Supp. Table 2. Signature 1, 3, and 8 hypermutation in carriers of germline APOBEC3B deletion. The table shows hypermutation for 3 other mutational signatures common in breast cancer across APOBEC3B deletion status in the MyBrCa Asian cohort.

Signature 1

Deletion allele status | Hypermutators | Non-hypermutators | Total | Hypermutators/Total
Homozygous | 5 | 74 | 79 | 0.063
Heterozygous | 6 | 226 | 232 | 0.026
Non-carrier | 6 | 183 | 189 | 0.032

Fisher's exact test: p = 0.2603

Signature 3

Deletion allele status | Hypermutators | Non-hypermutators | Total | Hypermutators/Total
Homozygous | 15 | 64 | 79 | 0.190
Heterozygous | 31 | 201 | 232 | 0.134
Non-carrier | 30 | 159 | 189 | 0.159

Chi-squared test: χ² = 1.5536, p = 0.4599

Signature 8

Deletion allele status | Hypermutators | Non-hypermutators | Total | Hypermutators/Total
Homozygous | 4 | 75 | 79 | 0.051
Heterozygous | 20 | 212 | 232 | 0.086
Non-carrier | 11 | 178 | 189 | 0.058

Chi-squared test: χ² = 1.7954, p = 0.4075

Supp. Table 3

Supp. Table 3: Analysis of Y/RTCA mutations stratified by hypermutation and germline A3B deletion status. Y/RTCA mutations were compared on a sample-by-sample basis according to which mutation type was more common in each sample.

Mutation status | Samples with > 50% YTCA | Samples with > 50% RTCA | Total | Proportion of samples with > 50% YTCA
Hypermutators | 70 | 4 | 74 | 0.946
Non-hypermutators | 136 | 75 | 211 | 0.645
Fisher's exact test: P = 6.582 × 10^-8; Cochran-Armitage test for trend: P < 2.2 × 10^-16

Germline A3B status | Samples with > 50% YTCA | Samples with > 50% RTCA | Total | Proportion of samples with > 50% YTCA
Homozygous deletion | 40 | 11 | 51 | 0.784
Heterozygous deletion | 101 | 26 | 127 | 0.795
Homozygous normal | 66 | 42 | 108 | 0.611
Pearson's chi-squared test: P = 0.004; Cochran-Armitage test for trend: P = 0.0042
Deletion carriers | 141 | 37 | 178 | 0.792
Non-carriers | 66 | 42 | 108 | 0.611
Pearson's chi-squared test: P = 0.0014; Cochran-Armitage test for trend: P = 9 × 10^-4

Supp. Table 4

Supp. Table 4: Regression analysis of signature 2 and 13 mutations. Minimal model (after backward-stepwise elimination analysis) for a multivariable linear regression analysis of the rate of signature 2 and 13 mutations in the MyBrCa cohort (n=527) using gene expression data (log2 TPM) for APOBEC family members and known APOBEC3A-interacting proteins (from UniProt). Asterisks indicate the level of significance (Pr(>|t|)) for each variable (*< 0.05; **<0.01; ***<0.001).

Variable | Est. coefficient | Std. error | t-value | P-value | Sig.
APOBEC family members:
A3A-B hybrid isoform | 0.96344 | 0.20663 | 4.663 | 4.02E-06 | ***
A3C | -0.46377 | 0.15651 | -2.963 | 0.003191 | **
A3H | 0.53222 | 0.22619 | 2.353 | 0.019011 | *
APOBEC3A-interacting proteins:
STAMBP | 0.4562 | 0.22245 | 2.051 | 0.040816 | *
COX7A1 | -0.29855 | 0.10417 | -2.866 | 0.004334 | **
COX6C | -0.27144 | 0.07402 | -3.667 | 0.000272 | ***
Adjusted R-squared: 0.124; model p-value: 1.56e-13

Supp. Table 5

Supp. Table 5: Relationship between germline A3B deletion or hypermutation with tumour stage. Numbers in brackets are column percentages. P-values shown are for Fisher’s exact test.

Tumour stage, n (%) | A3Bdel/del | A3Bdel/wt | A3Bwt/wt | p-value
0 | 2 (2.4) | 9 (3.8) | 8 (4.0) | > 0.1
I | 4 (4.8) | 50 (21.3) | 34 (17.2) | 0.0011
II | 37 (44.6) | 110 (46.8) | 92 (46.5) | > 0.1
III | 35 (42.2) | 61 (26.0) | 59 (29.8) | 0.023
IV | 5 (6.0) | 5 (2.1) | 5 (2.5) | > 0.1

Tumour stage, n (%) | Hypermutators | Non-hypermutators | p-value
0 | 5 (6.4) | 13 (3.1) | > 0.1
I | 14 (17.9) | 72 (17.2) | > 0.1
II | 37 (47.4) | 191 (45.6) | > 0.1
III | 20 (25.6) | 131 (31.3) | > 0.1
IV | 2 (2.6) | 12 (2.9) | > 0.1
