Cancer Genome and Epigenome Research
Sex Differences in Cancer Driver Genes and Biomarkers
Constance H. Li¹,², Syed Haider¹, Yu-Jia Shiah¹,², Kevin Thai¹, and Paul C. Boutros¹,²,³
Abstract
Cancer differs significantly between men and women; even after adjusting for known epidemiologic risk factors, the sexes differ in incidence, outcome, and response to therapy. These differences occur in many but not all tumor types, and their origins remain largely unknown. Here, we compare somatic mutation profiles between tumors arising in men and in women. We discovered large differences in mutation density and sex biases in the frequency of mutation of specific genes; these differences may be associated with sex biases in DNA mismatch repair genes or microsatellite instability. Sex-biased genes include well-known drivers of cancer such as β-catenin and BAP1. Sex influenced biomarkers of patient outcome, where different genes were associated with tumor aggression in each sex. These data call for increased study and consideration of the molecular role of sex in cancer etiology, progression, treatment, and personalized therapy.

Significance: This study provides a comprehensive catalog of sex differences in somatic alterations, including in cancer driver genes, which influence prognostic biomarkers that predict patient outcome after definitive local therapy.

Cancer Res; 78(19); 5527–37. ©2018 AACR.
¹Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. ²Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. ³Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Corresponding Author: Paul C. Boutros, Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada. Phone: 647-258-4321; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-18-0362

©2018 American Association for Cancer Research.

Introduction
Sex differences in cancer have been known since at least 1949 (1). Studies using North American databases (e.g., SEER; ref. 2) and international databases (e.g., IARC; ref. 3) have repeatedly shown that males have higher cancer risk. Most, but not all, tumor types show increased incidence in men; thyroid cancer, for example, occurs 2.5 times more frequently in women. These differences remain after controlling for known epidemiologic risk factors (3). At most tumor sites, cancers arising in men cause higher mortality (4). For example, lethality from urinary bladder carcinoma is 3-fold higher in men than in women (4).

There are also significant differences in response to treatment. Female patients with non–small cell lung cancer respond better to both surgery (5, 6) and chemotherapy (7, 8), even after accounting for differences in variables such as subtype. Female patients with colorectal cancer respond better to surgery, a difference driven by improved female survival in the rectal cancer subgroup (9). Similarly, female patients with colorectal cancer also respond better to chemotherapy, which is partially attributed to differences in tumor site and microsatellite instability (10). Finally, a propensity-matched study of nasopharyngeal carcinoma found that females have a survival advantage regardless of tumor stage, radiation technique, and chemotherapy regimen, but that this advantage declines and disappears during menopause (11). Some of these differences in treatment response may be attributable to differences in driver mutations between the sexes, and others to differences in epigenetics or chromatin conformation.

The origins and mechanisms of these sex differences remain a major unresolved question in cancer biology. They may be caused by differences in the expression of genes on the sex chromosomes, in hormone levels, in developmental biology, or in lifestyle features not reflected in current epidemiologic studies. Most likely, a mixture of all these components contributes to sex differences in patient outcomes.

We hypothesized that, independent of their mechanism, sex differences in cancer would be reflected in somatic mutation profiles. That is, male and female tumors would acquire mutations at different rates and of different types. Recent intriguing data on missense mutations in melanoma support this hypothesis (12).

We therefore undertook a systematic evaluation of sex-associated biases in cancer mutations across a broad range of tumor types. Our study provides a comprehensive pan-cancer catalog of sex-biased mutations and a perspective on sex-specific prognostic biomarkers.

Materials and Methods
Data acquisition and processing
Profiles of mRNA abundance, DNA genome-wide somatic copy number, and somatic mutation for The Cancer Genome Atlas (TCGA) datasets were downloaded from Broad GDAC Firehose (https://gdac.broadinstitute.org/), release 2016-01-28. For mRNA abundance, Illumina HiSeq rnaseqv2 level 3 RSEM-normalized profiles were used. Genes with zero reads in >75% of samples were removed from the respective dataset. GISTIC v2 (13) level 4 data were used for somatic copy-number analysis. mRNA abundance data were converted to log2 scale for subsequent analyses. Mutational profiles were based on TCGA-reported MutSig v2.0 calls. All preprocessing was performed in the R statistical environment (v3.1.3).
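The expression filter and log transform described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code. It assumes a pandas DataFrame of RSEM-normalized values with genes as rows and samples as columns, and it uses a hypothetical helper name (`preprocess_rsem`). It also adds a pseudocount of 1 before taking log2. That pseudocount is our assumption, because the text does not say how zero values were handled.

```python
import numpy as np
import pandas as pd


def preprocess_rsem(expr: pd.DataFrame, max_zero_frac: float = 0.75) -> pd.DataFrame:
    """Drop sparsely expressed genes and convert RSEM values to log2 scale.

    expr: genes x samples matrix of RSEM-normalized abundances (assumed layout).
    Genes with zero reads in MORE than `max_zero_frac` of samples are removed.
    """
    # Fraction of samples in which each gene has zero reads.
    zero_frac = (expr == 0).mean(axis=1)
    # Keep genes at or below the threshold; only ">75% zeros" triggers removal.
    kept = expr.loc[zero_frac <= max_zero_frac]
    # log2 with a pseudocount of 1 (assumption) so that zero values stay finite.
    return np.log2(kept + 1)


if __name__ == "__main__":
    toy = pd.DataFrame(
        {
            "s1": [0, 10, 3, 0],
            "s2": [0, 0, 7, 0],
            "s3": [0, 5, 1, 0],
            "s4": [0, 0, 0, 2],
        },
        index=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    )
    # GENE_A (zeros in 100% of samples) is dropped; GENE_D (75%) is kept.
    print(preprocess_rsem(toy))
```

The comparison is deliberately `<=`. A gene with zeros in exactly 75% of samples is therefore retained, which matches the strict ">75%" wording of the removal rule.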
www.aacrjournals.org 5527
Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2018 American Association for Cancer Research. Li et al.
Patients younger than 18 or older than 85, or lacking sex annotation, were excluded from analysis. This resulted in a sample size of 7,131 across all tumor types for copy-number alterations (CNA; 1.5% excluded; Supplementary Table S1) and 6,073 for somatic single-nucleotide variants (SNV; 1.5% excluded; Supplementary Table S1). Genes were excluded if they were mutated in fewer than 20 patients (for CNAs) or in fewer than 5% of patients (for SNVs). Gene filters were applied independently to the pan-cancer dataset and to each individual tumor type dataset. All analyses excluded genes on the X and Y chromosomes.

Mutation load
Mutation load per patient was calculated as the sum of SNVs across all genes on the autosomes. Mutation load was Box–Cox transformed, and the transformed values were compared between the sexes using unpaired two-sided t tests, for both pan-cancer and tumor type–specific analyses. For the pan-cancer comparison, a linear regression model was used to adjust mutation load for tumor type. Tumor type–specific P values were adjusted using the Benjamini–Hochberg false discovery rate (FDR) procedure. Tumor types with q values meeting an FDR threshold of 10% were further analyzed using linear regression to adjust for the tumor type–specific variables described in Supplementary Table S1. A multivariate q value threshold of 0.05 was then used to determine statistical significance. Full results are in Supplementary Table S2.

Genome instability
Genome instability was calculated as the percentage of the genome affected by copy-number alterations. The number of base pairs in each CNA segment was summed to obtain the total number of base pairs altered per patient. This total was divided by the number of assayed bases excluding the sex chromosomes (~7.8 million bp) to obtain the percentage of the genome altered (PGA). Box–Cox transformed PGA was treated as a continuous variable and compared by sex using two-sided unpaired t tests. These tests were run for all tumor types combined (pan-cancer) and for each tumor type separately (tumor type–specific). For the pan-cancer comparison, linear regression models were used to adjust PGA for tumor type, age, and race. Tumor types where univariate testing indicated putative sex biases in PGA (FDR threshold of 10%) were also adjusted for tumor type–specific variables (Supplementary Table S1). A q value threshold of 0.05 was used to determine statistical significance for multivariate results. Full results are presented in Supplementary Table S2.

Genome-spanning CNA analysis
Adjacent genes whose copy-number profiles across patients were highly correlated (Pearson r > 0.95) were binned. The copy-number call for each patient was taken to be the majority call across all genes in each bin. Copy-number calls were collapsed to a ternary (loss, neutral, gain) representation by combining the loss groups (monoallelic and biallelic) and the gain groups (low and high). The numbers of loss, neutral, and gain calls were summed per bin and sex, then assessed using univariate and multivariate techniques.

For univariate analysis, proportional differences between the sexes for gains and losses were tested for each bin using proportions tests. To account for multiple testing, FDR correction was performed, and an FDR threshold of 10% was used to select bins for further multivariate analysis. Candidate pan-cancer significant bins from univariate proportions testing were then analyzed with generalized linear modeling. This step reduced false positives that may arise from unbalanced tumor type subsets of the pan-cancer data. Multivariate logistic regression (MLR) was used to adjust the ternary CNA data for sex, age, race, and tumor type. The MLR sex term was tested for significance and FDR corrected to identify bins with pan-cancer sex biases (q < 0.05).

The same approach was applied to each tumor type individually. Proportions tests were used to select bins for multivariate analysis (q value < 0.1). MLR was again used to adjust the ternary copy-number calls for clinical variables. The covariates in each tumor type's MLR model depended on the clinical data available. Tumor type–specific models were fit independently for each univariately significant bin, and the significance of each variable for each bin was extracted from the fitted models. P values were FDR corrected, and an FDR threshold of 0.05 was applied. Descriptions of the pan-cancer and tumor type–specific models, along with a breakdown of the data for each group, are in Supplementary Table S1. Results are in Supplementary Tables S3–S5.

CNA-mRNA functional analysis
In kidney clear cell and kidney papillary cell cancers, genes in bins altered by sex-biased CNAs after multivariate adjustment were further investigated to determine sex-biased functional effects. Available mRNA samples were matched to those used in the CNA analysis. For each gene affected by a sex-biased loss, mRNA abundance was modeled against sex, copy-number loss status, and a sex–copy-number loss interaction term. The interaction term was used to identify genes with sex-biased mRNA changes. FDR-adjusted P values and fold changes were extracted for visualization. A q value threshold of 0.05 was used for statistical significance. For genes affected by sex-biased gains, the same procedure was applied using copy-number gains.

CNA-mRNA survival analysis
Genes with significant or trending sex biases (FDR threshold of 10%) in the CNA-mRNA functional analysis were further analyzed using Cox proportional hazards modeling. That is, we focused on genes that met two criteria. First, they were altered by sex-biased CNAs (MLR q value < 0.05). Second, they showed mRNA abundance differences between the copy-number neutral and loss/gain groups in either sex (sex–loss interaction q < 0.1). For each gene, mRNA abundance was median dichotomized over all samples to define low- and high-expression groups. After checking the proportional hazards assumption, Cox proportional hazards regression models for overall survival were fit, incorporating sex, mRNA group, and a sex–mRNA group interaction. FDR-adjusted interaction P values and log2 hazard ratios were extracted for visualization. A q value threshold of 0.1 was used to identify genes with sex-influenced survival.

Genome-spanning SNV analysis
We focused on genes mutated in at least 5% of patients. All genes tested are listed in Supplementary Table S6. Mutation data were binarized to indicate the presence or absence of an SNV in each gene for each patient. For univariate analysis, the proportion of patients with a mutation in each gene was compared between the sexes using proportions tests. FDR correction was used to adjust P values, and a q value threshold of 0.1 was used to select genes for multivariate analysis. Pan-cancer genes that were univariately significant in proportions testing were then analyzed with binary logistic regression (LR) to reduce false positives that may arise from unbalanced tumor type
5528 Cancer Res; 78(19) October 1, 2018 Cancer Research
subsets of the pan-cancer data. Age and race were also included in the pan-cancer model. FDR correction was again applied, and genes with significant pan-cancer sex terms (q value < 0.05) were extracted from the models.

LR was also used for multivariate analysis of each individual tumor type, adjusting for clinical variables. These models used the same variables as the CNA MLR models. Tumor type–specific models were fit independently for each univariately selected gene, and the significance of each variable was extracted from the fitted models as P values. FDR correction was used to adjust P values, and an LR q value threshold of 0.05 was used. Descriptions of the pan-cancer and tumor type–specific models are in Supplementary Table S1. A summary of results is in Supplementary Table S5.

Validation of sex biases
Copy-number data for tumor types with sex-biased CNAs were downloaded from the Progenetix database (14) as a meta-analysis dataset. Matching genomic regions were analyzed using proportions tests to validate genes in sex-biased CNAs. Similarly, somatic SNV data were obtained from cBioPortal and the ICGC Data Portal. These data were used to validate sex-biased somatic SNV load and genes with sex-biased mutation frequencies. Descriptions of the validation data, data sources, and results are available in Supplementary Table S7.

Multigene prognostic models
Computationally purified tumor mRNA profiles for the Director's Challenge data were downloaded (15). The training and validation cohorts were processed and split as previously described and were checked for balance between male and female samples. Colon transcriptomic data were downloaded (16, 17), reprocessed, and normalized. The colon training and validation cohorts were balanced for data source, sex, and survival status. For both datasets, survival modeling used overall survival as the clinical endpoint.

To identify genes univariately associated with survival, purified mRNA abundance for each gene was median dichotomized to define low- and high-expression groups. Cox proportional hazards regression models included variables for sex, mRNA group, and the sex–mRNA group interaction. P values and log2 hazard ratios were extracted for visualization. A P value threshold of 0.01 was used to determine statistical significance.

Ridge regression models were used to train 50,000 randomly generated 100-gene prognostic signatures. The glmnet package (v2.0-5) was used to run 10-fold cross-validation with cv.glmnet (α = 0.1), using AUC as the type.measure. Signatures were trained on the training cohort and validated on the validation cohort. Signatures were then applied separately to male-only and female-only validation patients, and Cox proportional hazards modeling was performed. Signatures that failed the proportional hazards assumption were removed from the analysis. The same approach was used to train a signature from the top 100 univariately significant genes.

Statistical analysis and data visualization
All statistical analyses and data visualization were performed in the R statistical environment (v3.2.1) using the BPG (v5.3.4; ref. 18), mlogit (v0.2-4), glmnet (v2.0-5), and pROC (v1.8) packages.

Results
Sex biases in mutation burden
We leveraged data from TCGA studies comprising 7,131 matched tumor–normal pairs of 18 tumor types: 4,265 from males and 2,866 from females (Supplementary Table S1). We focused on somatic CNAs and SNVs in protein-coding genes because they are well-established driver events. These data are well powered to detect differences in driver-gene mutation frequencies between tumors arising in men and those arising in women (Supplementary Fig. S1). We excluded genes and regions of the X and Y chromosomes and analyzed autosomal differences (19).

We first compared pan-cancer mutational burden between tumors arising in men and those arising in women. In univariate analysis, male-derived tumors had a higher density of somatic coding SNVs than female-derived tumors (difference in means = 0.17; 95% CI, 0.14–0.20; P = 1.0 × 10^−29, unpaired Welch t test on Box–Cox transformed mutation load; Supplementary Fig. S2). This sex bias persisted in multivariate analysis adjusting for imbalances in sample numbers across tumor type, race, and age (linear regression P = 4.5 × 10^−6; Supplementary Table S2).

Having found sex differences at the pan-cancer level, we asked whether such differences also existed within individual tumor types. Six tumor types showed univariate sex biases in mutation density (10% FDR threshold; Supplementary Fig. S2), and these were further investigated with tumor type–specific multivariate modeling. Again, we used Box–Cox transformation and linear modeling to test whether sex remained a significant variable after adjusting for possible confounders. Linear regression q values are given in Fig. 1A, and model-specific variables are described in Supplementary Table S1. Finally, the association between sex and mutation load might be biased by male-derived tumors being diagnosed at later stages (Supplementary Table S1). We therefore built a sub–pan-cancer model using only tumor types with stage data and found that the higher mutation prevalence in male-derived samples persisted after accounting for stage. A summary of univariate and multivariate results is in Supplementary Table S2.

Of the six tumor types with univariate sex differences (Fig. 1A), males had more somatic coding SNVs in four:
- bladder urothelial cancer (BLCA: difference in Box–Cox means = 0.55; 95% CI, 0.20–0.90; multivariate q = 3.6 × 10^−3);
- melanoma (SKCM: difference in Box–Cox means = 0.78; 95% CI, 0.29–1.3; multivariate q = 0.037);
- renal papillary cell cancer (KIRP: difference in Box–Cox means = 2.2; 95% CI, 0.81–3.6; multivariate q = 0.019);
- liver hepatocellular cancer (LIHC: difference in Box–Cox means = 0.16; 95% CI, 0.049–0.27; multivariate q = 0.019).

There was an opposite trend in glioblastoma, where female-derived samples had higher mutation burden (GBM: difference in Box–Cox means = 1.6; 95% CI, 0.14–3.0; multivariate q = 0.094). Using independent sequencing datasets, we validated the male biases seen in bladder, liver, lung adenocarcinoma, and skin cancers (Supplementary Table S7).

To see whether these sex differences affected multiple mutation types, we also compared the load of CNAs across tumor types using the percentage of genome altered, which is a prognostic marker in several tumor types (20–22). A putative univariate sex bias in pan-cancer PGA was not significant after multivariate adjustment (Supplementary Fig. S2). However, 4/18 individual tumor types showed univariate sex differences in PGA (Supplementary
Fig. S2). These were further analyzed with multivariate modeling to examine the influence of sex (Fig. 1B; Supplementary Table S2). Males showed elevated genomic instability in three tumor types:
- stomach and esophageal cancer (STES: difference in Box–Cox means = 1.7; 95% CI, 0.92–2.4; multivariate q = 9.7 × 10^−3);
- head and neck cancer (HNSC: difference in Box–Cox means = 1.9; 95% CI, 1.0–2.8; multivariate q = 0.016);
- kidney clear cell cancer (KIRC: difference in Box–Cox means = 0.40; 95% CI, 0.14–0.67; multivariate q = 0.019).

A strong opposite trend was seen in sarcoma, where PGA was higher in female-derived tumors (SARC: difference in Box–Cox means = 1.5; 95% CI, 0.41–2.7; multivariate q = 0.021).

Figure 1. Mutation burden is sex biased. We found sex differences in somatic mutation load (A) and genome instability (B). Each point represents a sample (male-derived, blue; female-derived, pink). We focused on tumor types with univariately significant sex differences in mutation and show q values from multivariate modeling here. Red lines show mean mutation burden for each group. C, Mosaic map showing the relationship between microsatellite instability and sex in stomach and esophageal cancer. D, Higher male mutation prevalence emerges after adjusting for microsatellite instability. Adjusted Box–Cox transformed data are shown.

Measures of mutation burden such as SNV load and PGA may be correlated with defects in DNA mismatch repair (MMR). For example, microsatellite instability (MSI), a marker of defective DNA MMR, is more common in some tumor types (23) and could confound the relationship between mutation burden and sex. We therefore examined three tumor types with available MSI-monodinucleotide assay data: colorectal, pancreatic, and stomach and esophageal cancers.

In samples with MSI data (Supplementary Table S1), MSI was associated with sex in two of these tumor types:
- stomach and esophageal cancer (Pearson χ2 P = 1.4 × 10^−5; 40% of female-derived samples vs. 26% of male-derived samples; Fig. 1C);
- colorectal cancer (Pearson χ2 P = 0.025; 33% of female-derived samples vs. 25% of male-derived samples; Supplementary Fig. S3).

By contrast, MSI status was not associated with sex in pancreatic cancer (Pearson χ2 P = 0.63). When we incorporated MSI into our analyses of SNV burden and PGA, we first noted that MSI was associated with increased SNV burden, but not with PGA, in all three tumor types.

We then used multivariate models including MSI to examine the interplay between sex, MSI, and mutation burden. Intriguingly, there was no univariate relationship between sex and SNV burden in stomach and esophageal cancer, yet a novel sex bias emerged after adjusting for MSI (multivariate P = 0.023; Fig. 1D). We observed the same effect in an independent dataset (Supplementary Table S7). The association between sex and PGA persisted in this new model, reinforcing the sex bias in PGA for this tumor type.

Because MSI is thought to result from defective DNA MMR, we also looked for sex biases specifically in a set of seven MMR genes (24). We did not find sex biases in the mutation rates of DNA MMR genes. However, mRNA abundance was significantly lower in female-derived tumors for MLH1 (male mean = 8.89, female mean = 8.5; 95% CI, 0.19–0.62; t test q = 0.0011) and PMS2 (male mean = 9.0, female mean = 8.87; 95% CI, 0.05–0.21; t test q = 0.0060). Together, these results suggest that differential mRNA abundance may link MMR to sex biases in mutation load in stomach and esophageal cancer. We did not find novel sex biases in colorectal or pancreatic mutation burden after accounting for MSI (Supplementary Fig. S3).

To investigate whether sex-biased mutation load is generally associated with DNA MMR, we also examined MMR genes in all tumor types with sex-biased mutation load. In male kidney papillary tumors, mRNA abundance was decreased for four MMR genes, consistent with the higher male mutation prevalence in this tumor type:
- MSH2 (male mean = 8.45, female mean = 8.83; 95% CI, 0.22–0.53; t test q = 3.98 × 10^−6);
- MSH3 (male mean = 8.50, female mean = 8.71; 95% CI, 0.082–0.34; t test q = 1.51 × 10^−3);
- MSH6 (male mean = 9.12, female mean = 9.65; 95% CI, 0.37–0.67; t test q = 4.57 × 10^−10);
- PMS1 (male mean = 7.71, female mean = 8.01; 95% CI, 0.14–0.46; t test q = 2.26 × 10^−4).

Similarly, in liver cancer, mRNA abundance was lower in male-derived than in female-derived tumors for PMS2 (male mean = 8.84, female mean = 8.97; 95% CI, 0.025–0.24; t test q = 0.055) and MLH3 (male mean = 8.70, female mean = 8.87; 95% CI, 0.039–0.30; t test q = 0.055). This suggests that, for some tumor types, differences in mutation load may be explained by sex biases in the efficiency of MMR.

Taken together, this analysis of mutation burden identified sex biases across several tumor types, even after adjusting for race, tumor stage, and smoking history, among other variables. Indeed, the sex bias in stomach and esophageal cancer was discovered only after adjusting for MSI status, which highlights the importance of that adjustment. Finally, changes in the mRNA abundance of DNA MMR genes are a candidate mechanism for sex biases in mutation density.

Sex biases in somatic CNAs
Differences in mutation density might reflect changes in specific driver genes. Alternatively, they might reflect global changes, such as those induced by differences in DNA damage or repair. To distinguish these possibilities, we compared male- and female-derived tumors in