bioRxiv preprint doi: https://doi.org/10.1101/507939; this version posted December 28, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

1 Sex-specific regulatory mechanisms underlying Hepatocellular

2 Carcinoma

3

4 Heini M Natri1,*, Melissa Wilson Sayres1, Kenneth Buetow1

5

6 1 Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe,

7 Arizona, USA

8

9 Corresponding author:

10 E-mail: [email protected] (H.M.N)

11 Abstract

12 Sex differences in cancer occurrence and mortality are evident across tumor types; men exhibit

13 higher rates of incidence and often poorer responses to treatment. Targeted approaches to the

14 treatment of tumors that account for these sex differences require the characterization and

15 understanding of the fundamental biological mechanisms that differentiate them. Hepatocellular

16 Carcinoma (HCC) is the second leading cause of cancer death worldwide, with the incidence

17 rapidly rising. HCC exhibits a male-bias in occurrence and mortality. Most HCC studies, to date,

18 have failed to explore the sex-specific effects in regulatory functions. Here we have

19 characterized the regulatory functions underlying HCC tumors in sex-stratified and combined

20 analyses. By sex-specific analyses of differential expression of tumor and tumor adjacent

21 samples, we uncovered etiologically relevant genes, pathways and canonical networks

22 differentiating male and female HCC. While both sexes exhibited activation of pathways related

23 to apoptosis and p53 signaling, pathways involved in innate and adaptive immunity showed

24 differential activation between the sexes. Using eQTL analyses, we discovered germline

25 regulatory variants with differential effects on tumor gene expression between the sexes. We

26 discovered eQTLs overlapping HCC GWAS loci, providing regulatory mechanisms connecting

27 these loci to HCC risk. Furthermore, we discovered genes under germline regulatory control that

28 alter survival in patients with HCC, including some that exhibited differential effects on survival

29 between the sexes. Overall, our results provide new insight into the role of genetic regulation of

30 transcription in modulating sex differences in HCC occurrence and outcome and provide a

31 framework for future studies on sex-biased cancers.

32 Author Summary

33 Sex differences in cancer occurrence and mortality are evident across tumor types, and targeted

34 approaches to the treatment of male and female tumors require the characterization and

35 understanding of the fundamental biological mechanisms that differentiate them. Hepatocellular

36 Carcinoma (HCC) is the second leading cause of cancer death worldwide, with the incidence

37 rapidly rising. HCC exhibits a male-bias in occurrence and mortality. Here, we have characterized

38 the regulatory functions underlying male and female HCC. We show that while HCC shares

39 commonalities across the sexes, it shows demonstrable regulatory differences that could be

40 critical in terms of prevention and treatment: we detected differential activation of pathways

41 involved in innate and adaptive immunity, as well as differential genetic effects on gene

42 expression, between the sexes. Furthermore, we have detected regulatory variants overlapping

43 known HCC GWAS risk loci, providing regulatory mechanisms connecting these risk variants to

44 HCC. Finally, we discovered genes under germline regulatory control altering the overall survival

45 of patients with HCC, including some that showed differential effects between the sexes. Overall,

46 our findings create a paradigm for future studies on sex-biased cancers.

47 Introduction

48 Differences in cancer occurrence and mortality between sexes are evident across tumor types;

49 males exhibit higher rates of cancer incidence and often poorer response to treatment, including

50 some forms of chemotherapy and immunotherapy [1,2]. While differences in risk behaviors and

51 environmental exposure may explain some portion of the sex-bias, cellular and molecular

52 differences are also likely to be important. The sexes differ in their endocrinological profile,

53 immunological functions and genetic makeup [3–5]. However, sex differences are rarely

54 considered in the development of cancer therapies, and the contribution from sex

55 variation to tumor etiology remains poorly understood. Sex-biased gene expression and

56 regulatory functions may underlie differences between the sexes in disease prevalence and

57 severity [6,7]. Analyses of sex-specific regulation of gene expression are essential for

58 understanding sources of sex difference in cancer incidence, as well as sex-specific mechanisms

59 affecting tumor etiology, disease progression and outcome.

60 Hepatocellular carcinoma (HCC) exhibits sex-bias in occurrence, with a male-to-female

61 incidence ratio between 1.3:1 and 5.5:1 across populations [8,9]. HCC is the second leading cause

62 of cancer mortality worldwide, accounting for 8.2% of all cancer deaths [10]. HCC risk is

63 influenced by genetic susceptibility, environmental factors such as oncogenic viral infections,

64 metabolic syndrome, and alcohol use, and shaped by numerous biological processes, resulting in

65 a high degree of genetic and transcriptional heterogeneity [11]. HCC incidence in the US has

66 doubled in the last 3 decades, attributable to increased rates of obesity [9].

67 While sex-bias in HCC is partly attributed to sex-specific differences in risk behaviors and

68 environmental exposure, the relationship of these factors and sex as a biological variable has not

69 been systematically investigated. Previously, sex differences in HCC were explored in

70 conjunction with 12 additional Cancer Genome Atlas (TCGA) tumor types by Yuan et al. [12].

71 They discovered extensive sex-biased signatures in gene expression in HCC and other strongly

72 sex-biased cancers.

73 HCC shows evidence of molecular subtypes, though their role in sex-bias has yet to be explored.

74 More specifically, clustering analyses of gene expression data have produced a characterization

75 of four distinct HCC subtypes that differ in terms of gene expression, pathway enrichment, and

76 median survival [13]. A study utilizing a weighted co-expression network analysis method

77 discovered hub genes associated with HCC outcome [14]. On the other hand, genetic variants

78 affecting the expression of TNFRSF10 [15], PAX8 [16], and PVT1 [17] have been found to alter

79 HCC disease progression and outcome, highlighting the role of regulatory variants in HCC. How

80 these progression altering subtypes or variants are related to HCC sex-bias has not been

81 examined.

82 Sex-specific analyses can reveal genetic and regulatory mechanisms of sexual dimorphism in

83 HCC susceptibility, progression, and mortality. Targeted approaches to the treatment of male and

84 female HCC will require the characterization and understanding of the fundamental biological

85 mechanisms that differentiate them. Here, we analyzed data from The Genotype-Tissue

86 Expression (GTEx) project and The Cancer Genome Atlas (TCGA) to examine the sex-specific

87 patterns of gene expression and regulation in HCC and healthy liver tissue. We find etiologically

88 meaningful differences in regulatory functions in HCC between males and females, providing

89 insight into the mechanisms underlying sex-bias in cancer. Additionally, we observed genes

90 under germline regulatory control that alter survival in a sex-biased way in patients with HCC.

91 Results

92 Sex-specific patterns of gene expression in HCC

93 We found sex-differences in gene expression in normal liver, tumor adjacent, and tumor tissues

94 (Fig 1). We found 13 genes showing a consistent sex-bias in expression in liver, HCC adjacent

95 tissue and HCC. Notably, we find that X-inactivation and X-linked dosage compensation appear

96 to be functioning in the healthy liver, HCC adjacent and HCC tissues. This is concluded from the

97 bulk RNAseq measurements where we observe XIST expression only in female samples. As XIST

98 is primarily involved in X-inactivation [18], this is further supported by the observation of very

99 little or no sex differences in expression of other X-linked genes. We find Y-linked genes

100 expressed in male samples in all three tissues. Many of these Y-linked genes have functional X-

101 linked homologs. When examining the combined expression levels of these homologous gene

102 pairs, we did not detect notable differences between the sexes (Fig S1).

103 Interestingly, we identify 9 genes, including 6 protein-coding genes (CYP4F22, HRCT1,

104 ZCCHC16, DTX1, ATF5, CD24), 2 non-coding RNAs (CTD-2325A15.3, RP11-495P10.9) and

105 one pseudogene (ASB9P1) that show sex-differences in expression in HCC, but not in HCC

106 adjacent or healthy liver. Notably, Notch-regulating DTX1, activating transcription factor 5

107 (ATF5) and signal transducer CD24 were downregulated in male HCC.

108 To further investigate the regulatory mechanisms underlying HCC of males and females, we

109 detected differentially expressed genes (DEGs) between tumor and tumor adjacent normal

110 samples in males and females, as well as in a joint analysis of both sexes (Fig 2A). Substantially

111 more DEGs were detected in sex-specific analyses than in the unstratified analysis (Fig 2B).

112 Specifically, DEGs that showed different magnitudes in fold change between the sexes were

113 detected in the sex-specific analyses, while DEGs with similar fold changes across all

114 comparisons were detected in the unstratified analysis as well as the sex-specific analyses (Fig

115 S2). DEGs that were only detected in the unstratified analysis, and not in sex-specific analyses,

116 showed a large variance in expression and were thus not detected as statistically significant DEGs

117 in sex-specific analyses (Fig S2). Tumor-infiltrating immune cells may produce spurious signals

118 in DEG analyses, which is evident from the detection of various immunoglobulin genes in tumor

119 vs. tumor adjacent comparisons (Table S4-6). However, cell content analyses based on gene

120 expression data did not exhibit sex differences in the prevalence of tumor-infiltrating immune

121 cells (Fig S3), and thus such spurious signals are unlikely to affect male-female comparisons.
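The effect of stratification on fold-change estimates can be illustrated with a minimal sketch. This is not the pipeline used in this study (the statistical framework for DEG detection is not described in this section); the function names, the pseudocount, and the expression values are hypothetical and chosen only for illustration. The sketch computes a tumor vs. tumor adjacent log2 fold change for each sex and for the pooled samples, showing how a gene whose fold change differs in magnitude between the sexes yields an intermediate, diluted estimate in the unstratified comparison.

```python
import math
from statistics import mean


def log2_fold_change(tumor, adjacent, pseudocount=1.0):
    """Difference in mean log2 expression (tumor minus tumor adjacent)."""
    t = mean(math.log2(x + pseudocount) for x in tumor)
    a = mean(math.log2(x + pseudocount) for x in adjacent)
    return t - a


# Hypothetical expression values for a single gene (not data from this study)
samples = {
    "male": {"tumor": [63, 127, 63, 127], "adjacent": [7, 15, 7, 15]},
    "female": {"tumor": [15, 31, 15, 31], "adjacent": [7, 15, 7, 15]},
}


def stratified_and_joint(samples):
    """Fold change per sex, plus the fold change after pooling both sexes."""
    result = {
        sex: log2_fold_change(group["tumor"], group["adjacent"])
        for sex, group in samples.items()
    }
    pooled_tumor = [x for g in samples.values() for x in g["tumor"]]
    pooled_adjacent = [x for g in samples.values() for x in g["adjacent"]]
    result["joint"] = log2_fold_change(pooled_tumor, pooled_adjacent)
    return result


lfc = stratified_and_joint(samples)
for label, value in lfc.items():
    print(label, round(value, 2))
```

In a real analysis the fold changes would be accompanied by a variance estimate and a multiple-testing correction, which is where differences in within-group variance between stratified and unstratified analyses come into play.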

122 In the joint analysis of male and female samples, we detected 581 DEGs, 23 of which were X-

123 linked (Table S4). In male and female specific analyses, we detected 606 and 416 DEGs,

124 respectively (Supplementary Tables 5 and 6). In both sex-specific analyses, 25 X-linked genes

125 were detected. Some of these X-linked genes had similar expression patterns in male and female

126 comparisons: the MAGE (Melanoma-associated antigen) gene family members

127 MAGEA1, MAGEA12, MAGEA6, MAGEC2, and MAGEA3, the XAGE (X Antigen) family members

128 XAGEA1 and XAGEB1, the PAGE (Prostate-associated) family member PAGE2, and the SSX

129 (synovial sarcoma X) family member SSX1 were overexpressed in both male and female HCC.

130 Interestingly, prostate-associated antigen family member PAGE4 was only overexpressed in male

131 HCC.

132 To further investigate the sex-specific functions of HCC tumors, we analyzed the lists of male-

133 and female-specific DEGs for the enrichment of functional pathways and canonical networks (Fig

134 2C and 2D). We found that sex-specific DEGs were enriched in pathways relevant to oncogenesis

135 and cancer progression (Supplementary Tables 7 and 8). Most of the top pathways were non-

136 overlapping, strongly indicating that male and female HCC are driven by different mechanisms

137 and processes.
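Pathway enrichment of a DEG list is commonly framed as an over-representation test. The sketch below assumes a one-sided hypergeometric test, which may differ from the enrichment method actually used here; the function name and all counts in the example are hypothetical. It returns the probability of observing at least the given number of pathway genes among the DEGs if DEGs were drawn at random from the background.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_deg, n_overlap):
    """P(X >= n_overlap) when n_deg genes are drawn without replacement from
    n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_deg)
    upper = min(n_pathway, n_deg)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy example: 20 background genes, 5 in the pathway,
# 5 DEGs, 3 of which fall in the pathway
p_toy = hypergeom_enrichment_p(20, 5, 5, 3)
print(p_toy)
```

Running the same test separately on the male and female DEG lists, against the same background, is one simple way to compare which pathways are over-represented in each sex.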

138 Differential cis-eQTL effects in male and female HCC

139 We used eQTL analyses to detect germline genetic effects on tumor gene expression in males,

140 females, and in both sexes (Fig 3A). Due to the small number of samples from females, we were

141 unable to reliably detect female-specific eQTL associations. However, we detected 24 male-

142 specific eGenes (Fig 3B), which were not detected in the female-specific or in the unstratified

143 analysis. Since these associations were not detected in the unstratified analysis, they are likely not

144 a result of differential power to detect associations due to different sample sizes or allele

145 frequencies between the sexes, but exhibit differential effects between the sexes. None of the

146 male-specific eGenes were differentially expressed between male and female HCC, indicating

147 that the male-specific associations are not driven by differences in overall gene expression levels

148 between males and females, but are likely to arise from factors such as chromatin accessibility,

149 transcription factor activity, and hormone receptors [19,20]. Genomic annotations show that most

150 of the detected regulatory variants were located in non-coding regions (Fig 3C).
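At its core, a cis-eQTL association relates genotype dosage at a variant to the expression of a nearby gene. The following sketch assumes a plain least-squares regression without covariates, which is a simplification of any real eQTL pipeline; the function name and the dosage and expression values are hypothetical. It illustrates how a variant with an effect in one sex only produces a sex-specific slope, while the pooled analysis yields an attenuated effect.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Ordinary least-squares slope of expression on allele dosage (0, 1, 2)."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


# Hypothetical data: the variant affects expression in males only
dosage = [0, 0, 1, 1, 2, 2]
male_expr = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
female_expr = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

slope_male = eqtl_slope(dosage, male_expr)
slope_female = eqtl_slope(dosage, female_expr)
slope_joint = eqtl_slope(dosage + dosage, male_expr + female_expr)
print(slope_male, slope_female, slope_joint)
```

A formal test of a differential effect would typically add a genotype-by-sex interaction term rather than compare stratified slopes by eye.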

151 Overlap of eQTLs and HCC risk GWAS loci

152 eQTL mapping can be used to find a regulatory mechanism explaining GWAS risk loci by

153 identifying associations between genotypes and intermediate molecular phenotypes (gene

154 expression levels). A number of variants in the HLA region on chr6, as well as variants near

155 Signal Transducer and Activator of Transcription 4 (STAT4) on chr2 and Aspartylglucosaminidase

156 (AGA) on chr4, have been identified as HCC risk loci [21–25], but the regulatory mechanisms

157 connecting these risk variants to the HCC phenotype are not clear. We discovered sex-shared

158 regulatory variants tightly linked with known HCC risk loci altering gene expression levels in

159 HCC tumors (Table 1).
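Whether a regulatory variant is tightly linked to a risk variant is usually judged by the linkage disequilibrium statistic r2. The sketch below assumes unphased genotype dosages and uses the squared Pearson correlation between them, a common approximation to haplotype-based r2; it is not necessarily how LD was computed for this study. The function name and the genotype vectors are hypothetical.

```python
from statistics import mean


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two vectors of allele dosages."""
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    return cov * cov / (var1 * var2)


# Hypothetical dosages for a GWAS variant and a candidate eVariant
gwas_variant = [0, 1, 2, 1, 0, 2, 1, 0]
e_variant = [0, 1, 2, 1, 0, 2, 1, 1]

r2 = genotype_r2(gwas_variant, e_variant)
tightly_linked = r2 > 0.8  # threshold used in Table 1
print(round(r2, 3), tightly_linked)
```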

160 Table 1. Tightly linked (r2 > 0.8) HCC risk loci and sex-shared regulatory variants.

GWAS variant and study | Position | Adjacent gene(s) | Context | eVariant position | eVariant distance to GWAS variant | eVariant annotation | Target eGene

rs7574865 [22] | chr2:191099907 | STAT4 | Intron | 190603780 | 496127 | Intergenic, inter-CpG-island | ORMDL1

rs117413665 [24] | chr4:177496993 | LOC285500 | Intron | 177438565 | 58428 | CpG-shelf | AGA

rs2523961 [25] | chr6:29971803 | MICD | Non-coding transcript exon | 29985849, 29659172*, 29626949* | 14046, 312631, 344854 | Intergenic, inter-CpG-island; Intron, intron-exon boundary, inter-CpG-island; 1 to 5kb, intron, CpG-shore | MCCD1P2, MICD, MICD

rs1110446 [25] | chr6:30103160 | TRIM31 | 3’UTR | 30035457, 30010508 | 67703, 92652 | Promoter, exon, intron; Intron, CpG-shelf, lncRNA | ZNRD1-AS1_2, ZNRD1-AS1_3

rs3094137 [25] | chr6:30233096 | TRIM26 - HCG17 | Intergenic | 29828972* | 404124 | Intron, CpG-shore | HCG17

rs2596542 [21] | chr6:31398818 | LOC101929072 - MICA | Intergenic | 31411200, 31411843 | 12382, 13025 | Intron, CpG-shelf, lncRNA; Intron, CpG-shelf, lncRNA | XXbac-BPG181B23.7, MICA

rs9272105 [23] | chr6:32632222 | HLA-DRB1 - LOC107986589 | Intron | 32429758*, 32473499* | 202464, 158723 | Intergenic, inter-CpG-island; Intergenic, inter-CpG-island | HLA-DRB1

rs9275319 [22] | chr6:32698518 | HLA-DQB1 - MTCO3P1 | Intergenic | 32604372*, 32662070* | 94146, 36448 | Intergenic, inter-CpG-island; 1 to 5kb, CDS, exon, intron, intron-exon boundary, CpG-shelf | HLA-DQB1

rs2856723 [99] | chr6:32699985 | HLA-DQB1 - MTCO3P1 | Intergenic | 32604372*, 32662070* | 95613, 37915 | Intergenic, inter-CpG-island; 1 to 5kb, CDS, exon, intron, intron-exon boundary, CpG-shelf | HLA-DQB1

rs9275572 [21] | chr6:32711222 | MTCO3P1 - LOC102725019 | Intergenic | 32637497, 32662070, 32637430, 32662025 | 73725, 49152, 73792, 49197 | CDS, exon, first exon, intron-exon boundary, inter-CpG-island; 1 to 5kb, CDS, exon, intron, intron-exon boundary, CpG-shelf; Promoter, 5'UTR, exon, first exon, intron-exon boundary, inter-CpG-island; 1 to 5kb, exon, intron, intron-exon boundary, 3'UTR, CpG-shelf | HLA-DQA1, HLA-DQB, HLA-DQB1-AS1, HLA-DQA2, HLA-DQB2

161 GWAS study citation is denoted in the risk variant column. Adjacent genes: If the GWAS

162 variant is located within a gene, that gene is listed. If the variant is intergenic, the upstream

163 and downstream genes are listed. Asterisk denotes instances where eVariants outside the LD

164 block associated with reported or adjacent genes were found. eVariant sites have been

165 annotated for genes and putatively regulatory regions (Methods, Table S12). Variant

166 positions and distance are denoted as base pairs.
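The eVariant distances reported in Table 1 are the absolute base-pair differences between the GWAS variant position and the eVariant position on the same chromosome. The short sketch below recomputes this for the first eVariant of three rows, with positions as read from Table 1; the function and variable names are illustrative assumptions.

```python
def evariant_distance(gwas_pos, evariant_pos):
    """Absolute distance in base pairs between two positions on one chromosome."""
    return abs(gwas_pos - evariant_pos)


# (chromosome, GWAS variant position, eVariant position, distance in Table 1)
table1_rows = [
    ("chr2", 191099907, 190603780, 496127),
    ("chr4", 177496993, 177438565, 58428),
    ("chr6", 29971803, 29985849, 14046),
]

checks = [
    evariant_distance(gwas, evar) == reported
    for _, gwas, evar, reported in table1_rows
]
print(checks)
```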

167 Germline genetic effects on HCC tumor gene expression and survival

168 We used a Cox proportional hazards model to test for effects of eGene expression on patient

169 survival in the TCGA Liver Hepatocellular Carcinoma (LIHC) cohort. Out of the 1204 sex-shared

170 eGenes, 50 were associated with joint survival (FDR-adjusted p-value <0.05, Table 2). No

171 female-specific survival associations were detected, likely due to the low number of female

172 samples affecting statistical power to detect survival effects. However, we detected 65 sex-shared

173 eGenes associated with male survival (Table 3). 15 of these were not associated with joint

174 survival, clearly indicating differential effects on overall survival between the sexes.
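The likelihood ratio χ2 statistics reported in Tables 2 and 3 compare a Cox model that includes the expression of an eGene against a null model without it. The sketch below is a minimal single-covariate illustration under stated assumptions: it uses the Breslow partial likelihood, Newton-Raphson with step halving, no additional covariates, and hypothetical survival data. An established survival analysis package would be used in practice, and this is not a reconstruction of the exact models fitted here.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Breslow partial log-likelihood with its first and second derivatives."""
    ll = grad = hess = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        ll += beta * x[i] - math.log(s0)
        grad += x[i] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return ll, grad, hess


def fit_cox(times, events, x, n_iter=50, tol=1e-10):
    """Maximize the partial likelihood for one covariate."""
    beta = 0.0
    ll, grad, hess = cox_partial_loglik(beta, times, events, x)
    for _ in range(n_iter):
        if abs(grad) < tol or hess == 0.0:
            break
        step = -grad / hess
        while True:  # step halving keeps the log-likelihood from decreasing
            new = cox_partial_loglik(beta + step, times, events, x)
            if new[0] >= ll or abs(step) < 1e-12:
                break
            step /= 2.0
        beta += step
        ll, grad, hess = new
    return beta, ll


def cox_lr_chi2(times, events, x):
    """Coefficient and likelihood ratio chi-square (1 df) against beta = 0."""
    beta, ll_hat = fit_cox(times, events, x)
    ll_null = cox_partial_loglik(0.0, times, events, x)[0]
    return beta, 2.0 * (ll_hat - ll_null)


def chi2_1df_pvalue(chi2):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical follow-up times, event indicators (1 = death), and expression
times = [5, 8, 12, 3, 9, 15, 6, 11]
events = [1, 1, 0, 1, 1, 0, 1, 1]
expression = [2.1, 1.0, 0.5, 2.8, 1.6, 0.2, 1.2, 1.9]

beta_hat, lr_chi2 = cox_lr_chi2(times, events, expression)
p_value = chi2_1df_pvalue(lr_chi2)
print(beta_hat, lr_chi2, p_value)
```

Fitting the same model to all patients and then to males only mirrors the joint and male-specific survival analyses described above; the resulting p-values would still need to be adjusted for the number of eGenes tested.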

175 Table 2. Sex-shared eGenes associated with overall survival in joint analysis of both sexes.

Gene name Likelihood ratio χ2 FDR-adjusted p-value

PSRC1 27.30 2.09E-04

MMP1 22.24 1.45E-03

CFHR3 19.15 3.63E-03

PON1 19.45 3.63E-03

RP11-274H2.3 17.26 7.83E-03

FAM110D 16.49 9.83E-03

CYP3A5 15.94 1.12E-02

TAF1B 15.43 1.15E-02

BAATP1 15.49 1.15E-02

RP4-608O15.3 14.64 1.46E-02

AC018641.7 14.43 1.46E-02

RP11-459O16.8 14.54 1.46E-02

CCDC163P 14.27 1.47E-02

RP11-108O10.2 14.00 1.57E-02

TMPRSS11D 12.78 2.08E-02

RP11-650L12.2 12.88 2.08E-02

TMEM147-AS1 12.76 2.08E-02

LPA 12.90 2.08E-02

RRM1-AS1 13.12 2.08E-02

RP11-105C19.2 12.72 2.08E-02

RP11-669E14.4 13.13 2.08E-02

CYP4F60P 12.02 2.88E-02

RP11-711K1.8 11.79 3.10E-02

RGN 11.72 3.10E-02

RP11-738E22.3 11.41 3.15E-02

LINC01252 11.43 3.15E-02

TMEM147 11.52 3.15E-02

CCDC146 11.51 3.15E-02

RP1-239B22.5 11.29 3.20E-02

ANKRD20A7P 11.25 3.20E-02

GLMP 11.16 3.25E-02

MASTL 10.92 3.58E-02

CFHR1 10.75 3.80E-02

CTD-2270P14.1 10.66 3.88E-02

TRG-AS1 10.53 3.93E-02

CHRNA5 10.58 3.93E-02

FCRL3 10.33 4.00E-02

DDT 10.30 4.00E-02

RP4-736L20.3 10.32 4.00E-02

IKBIP 10.39 4.00E-02

TMEM79 10.05 4.38E-02

GABPB1-AS1 10.07 4.38E-02

LDAH 9.63 4.77E-02

FAM99B 9.71 4.77E-02

RP11-109L13.1 9.72 4.77E-02

SPSB2 9.59 4.77E-02

DDX11 9.62 4.77E-02

DDTL 9.69 4.77E-02

CR848007.2 9.65 4.77E-02

TECTB 9.57 4.77E-02

176 Cox proportional hazards model: likelihood ratio χ2 statistic; FDR-adjusted p-value threshold 0.05.
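The FDR adjustment applied to the survival p-values can be sketched as follows, assuming the Benjamini-Hochberg step-up procedure; the specific FDR method is not stated in this section, so this choice, the function name, and the example p-values are assumptions. Because each adjusted value is capped by the adjusted values of all larger p-values, a procedure of this kind tends to produce runs of identical adjusted p-values, similar to those visible in Tables 2 and 3.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p, significant)
```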

177 Table 3. Sex-shared eGenes associated with overall survival in males.

Gene name Likelihood ratio χ2 FDR-adjusted p-value

PSRC1 30.89944 3.27E-05

MMP1 21.751568 1.25E-03

RGN 21.758012 1.25E-03

RP11-650L12.2 20.488249 1.81E-03

PON1 18.294246 2.74E-03

FAM99A* 18.308255 2.74E-03

FAM99B 17.940412 2.74E-03

RP1-239B22.5 18.05079 2.74E-03

CHRNA5 19.089358 2.74E-03

TMEM147 18.899008 2.74E-03

RP11-108O10.2 17.018204 4.05E-03

LPA 16.509603 4.48E-03

RRM1-AS1 16.518558 4.48E-03

CYP3A5 16.170768 4.98E-03

AADACP1* 15.893269 5.04E-03

TMEM147-AS1 15.899306 5.04E-03

RP11-274H2.3 15.100292 7.22E-03

CCDC163P 14.133518 1.03E-02

CENPQ* 14.243691 1.03E-02

RP4-736L20.3 14.142785 1.03E-02

F12* 13.823228 1.05E-02

GRK6* 13.92507 1.05E-02

LRRC23* 13.902588 1.05E-02

RP11-105C19.2 13.577894 1.15E-02

CFHR3 13.16418 1.37E-02

RP4-608O15.3 13.044288 1.41E-02

CMA1* 11.929339 2.29E-02

RP13-753N3.3* 12.001046 2.29E-02

RP11-669E14.4 11.937499 2.29E-02

WHAMM* 11.580894 2.59E-02

RRP12* 11.538845 2.59E-02

CPXM2* 11.518611 2.59E-02

RP11-80A15.1* 11.223818 2.70E-02

BAATP1 11.314561 2.70E-02

FRMPD2B* 11.377566 2.70E-02

RP11-153K11.3* 11.258503 2.70E-02

AC114730.7* 11.069887 2.71E-02

RP11-459O16.8 11.068885 2.71E-02

ULK4P2* 11.108146 2.71E-02

TRG-AS1 10.96346 2.80E-02

RP11-421M1.8* 10.769889 2.97E-02

RP11-434C1.2* 10.739028 2.97E-02

FAM110D 10.717104 2.97E-02

TMPRSS11D 10.580786 3.13E-02

DDX11 10.472235 3.24E-02

CTD-2270P14.1 10.390323 3.32E-02

AGMO* 10.305048 3.40E-02

SPSB2 9.904202 4.05E-02

LARS* 9.936718 4.05E-02

ZG16B* 9.800817 4.10E-02

DCTN5* 9.801029 4.10E-02

CCDC146 9.774758 4.10E-02

FCRL3 9.62148 4.37E-02

CFHR1 9.357763 4.43E-02

LDAH 9.444584 4.43E-02

RPL23AP7* 9.502224 4.43E-02

LINC01252 9.43441 4.43E-02

RP11-1260E13.2* 9.441852 4.43E-02

NDUFA6-AS1* 9.337829 4.43E-02

CYP2D6* 9.363136 4.43E-02

LINC01277* 9.411621 4.43E-02

SLC25A1P5* 9.211757 4.67E-02

RP11-738E22.3 9.142993 4.77E-02

RP11-711K1.8 9.071118 4.88E-02

CYP2D7 9.043546 4.88E-02

178 Cox proportional hazards model: likelihood ratio χ2 statistic; FDR-adjusted p-value threshold 0.05.

179 Genes that were not associated with joint survival are indicated with an asterisk.

180 Discussion

181 Sex-specific patterns of gene expression in liver and HCC

182 It is well established that patterns of gene expression vary between the sexes. Previous studies

183 have confounded these differences with those which may be etiologically important in cancer. For

184 example, Yuan et al. previously reported extensive sex-biased signatures in gene expression in

185 HCC and other strongly sex-biased cancers [12]. From the results presented here, it is possible to

186 distinguish the differences detected in comparisons of male and female HCC from those

187 reflecting normal sex-differences.

188 We report etiologically meaningful differences in gene expression between male and female HCC

189 cases. In addition to X- and Y-linked genes, 8 autosomal genes were expressed in a sex-biased

190 way in HCC tumors. Three of these genes - all of which were underexpressed in male HCC

191 compared to female - are of particular interest in the context of HCC: Notch-regulating DTX1 has

192 been identified as a putative tumor suppressor gene in head and neck squamous cell carcinoma

193 [26]. ATF5 is highly expressed in the liver but downregulated in HCC [27]. ATF5 is known to

194 inhibit hepatocyte proliferation [28] and re-expression of ATF5 in HCC inhibits proliferation and

195 induces G2/M arrest of the cell cycle [28]. ATF5 also may act as a negative regulator of IL-1β

196 signaling pathway in hepatocytes and thus act as a regulator of IL-1β-mediated immune response

197 [29]. CD24 has a crucial role in T cell homeostasis and autoimmunity [30]. In breast cancer, high

198 CD44/CD24 ratio is an indicator for malignancy [31]. The opposing roles of CD24 expression in

199 cancer and autoimmune diseases raise interesting questions on the role of sex differences in

200 immunity underlying female prevalence in autoimmunity [32] and male prevalence in cancer.

201 More studies are needed to better understand the differential regulation of immune functions

202 between the sexes, and how these differences contribute to the observed biases in disease

203 occurrence.

204 In both sex-specific analyses, 25 X-linked genes were detected including the MAGE family

205 (Melanoma-associated antigen), the XAGE (X Antigen) family members, the PAGE (Prostate-

206 associated) family, and the SSX (synovial sarcoma X) family. While most of the 25 X-linked

207 genes exhibited similar patterns between the sexes, PAGE4 was overexpressed only in male

208 HCC. PAGE4 is expressed in the human fetal prostate during development as well as malignant

209 prostate and prostate cancer, but not in healthy adult prostate [33]. The role of PAGE4 in prostate

210 cancer has been studied in considerable detail, and it has been indicated as a promising target for

211 therapies [33]. However, its role in other cancers has not been thoroughly explored. MAGE,

212 PAGE, and SSX genes are members of a large family of cancer testis antigens (CTAs). CTAs are

213 expressed in tumors, but not in normal tissue, with the exception of testis and placenta, and

214 recognized by cytotoxic T lymphocytes, making them ideal targets for immunotherapeutic

215 approaches [34]. MAGE expression has been associated with response to checkpoint inhibition

216 therapy in metastatic melanoma [35]. Given the observed differences here, CTA expression may

217 prove valuable in determining the potential response for checkpoint inhibition therapy in HCC

218 [36].

219 Sex-differences in the pathway and network activation strongly indicate that male and female

220 HCC are driven by distinct functional pathways. Males and females differed in oncogenic

221 processes with females showing RAS pathway enrichment while males showed PI3K/AKT

222 pathway enrichment. Additionally, female DEGs were highly enriched in genes involved in

223 innate immunity Toll-like receptor (TLR) signaling (Table S9), while male DEGs exhibited high

224 enrichment of genes involved in adaptive immunity and Notch signaling (Table S8). TLR

225 signaling has a crucial role in the regulation of the immune system by evoking an inflammatory

226 response [37]. Interestingly, TLR signaling may activate Notch signaling [38], and Notch

227 signaling may in turn feed back on the TLR signaling pathway to modulate the inflammatory response

228 through extracellular signal-regulated kinase 1/2-mediated nuclear factor κB activation [39]. Both

229 of these processes regulate macrophage functions and are likely to have a major role in cancer

230 progression by modulating the inflammatory tumor microenvironment [37–39]. TLR and Notch

231 signaling pathways are notable targets for anti-cancer and anti-metastasis therapies [40,41].

232 Furthermore, while both sexes exhibited an enrichment of genes involved in cell cycle, apoptosis

233 and p53 signaling, only males showed an activation of pathways regulating natural killer cell

234 cytotoxicity, and females showed an activation of pathways involved in NF-κB (nuclear factor

235 kappa-light-chain-enhancer of activated B cells) activation (Fig 2).

236 Sex-specific regulatory effects of germline variants

237 We detected 24 genes under germline regulatory control in male HCC only (Fig 3B). Functional

238 annotations of these male HCC specific eGenes provide insight into possible regulatory

239 mechanisms contributing to the observed male-bias in HCC. Protein O-glucosyltransferase 1

240 (POGLUT1) was found to be under germline regulation in male HCC, but not in female HCC or

241 in joint analysis of both sexes (Fig 3D). The eVariant associated with POGLUT1 is located in the

242 promoter region of its target (Table S14). POGLUT1 is an enzyme that is responsible for O-

243 linked glycosylation of proteins. Glycosylation is a major type of post-translational modification,

244 during which glycans are attached to oxygen (O) or nitrogen (N) atoms of amino acid residues. These

245 modifications may critically alter the proteins’ folding, localization, and function [42]. Altered

246 glycosylation of proteins has been observed in many cancers [43,44], including liver cancer

247 [45,46]. Cell surface receptor-mediated growth and apoptosis have been proposed as mechanisms

248 explaining the role of O-glycosylation in cancer [47]. Interestingly, altered glycosylation of tumor

249 proteins may also allow the tumor cells to evade natural killer cell immunity [48].

250 POGLUT1 is an essential regulator of Notch signaling and is likely involved in cell fate and

251 tissue formation during development. Genes involved in Notch signaling were also found to be

252 expressed in a sex-biased way in HCC tumors and highly enriched in DEGs detected in tumor vs.

253 tumor adjacent normal comparison of male samples, pointing to a possible role of Notch in the

254 development of HCC in males. The role of Notch signaling in various biological processes has

255 been thoroughly studied in a variety of organisms and tissues, particularly in Drosophila [49].

256 Notch signaling is of particular interest in the context of HCC, as it is involved in liver

257 development [50,51], the development of sexually dimorphic traits [52], and tumorigenesis [49].

258 The role of Notch signaling in HCC may differ between early and late-stage tumors and among

259 molecular subtypes, and further studies are necessary to understand the possible oncogenic

260 properties of Notch among HCC subtypes and between the sexes.

261 Chromatin states are a major determinant of sex-biased gene expression and sex-specific

262 regulatory functions in mouse liver [53] and in human peripheral blood mononuclear cells [19].

263 Other plausible mechanisms underlying sex-specific regulatory effects are differential

264 transcription factor activity and hormonal regulation. Further studies are needed to explore the

265 role of these mechanisms in male and female HCC.

266 Regulatory mechanisms underlying HCC risk variants

267 Genetic variants in the Human Leukocyte Antigen (HLA) region have been found to be

268 associated with hepatitis B related HCC in Asian populations [22,25]. While some of these risk

269 variants were found to be associated with HLA expression in normal liver tissue [25] or

270 differentially expressed between HCC tumors and tumor adjacent tissue [22], direct evidence of

271 regulatory mechanisms connecting the risk variants to HCC was lacking. We discovered

272 regulatory variants that are tightly linked with known HCC risk loci and associated with HLA expression in

273 HCC tumors (Table 1). Polymorphisms in the HLA region may influence immune responses and

274 oncogenesis by altering the epitope binding properties and/or expression levels of HLA

275 molecules. Loss of HLA expression, associated with an immune escape by avoiding T-cell

276 recognition, is commonly found in malignant cells [54]. Tumor immune escape is one of the

277 hallmarks of cancer, and it has been shown to have negative effects on the clinical outcome of

278 cancer immunotherapies [55]. HLA expression is critical in tumor rejection, and HLA re-

279 expression has been recognized as a major potential target in the development of

280 immunotherapies [54]. HLA allele frequencies and LD relationships among these variants vary

281 between populations, and HLA polymorphisms are likely partially responsible for the

282 disparities in HBV persistence among ethnicities [56], and thus likely contribute to the observed

283 biases in HCC occurrence between populations. Interestingly, sex modulates the degree to which

284 HLA molecules propagate the selection and expansion of T cells [57]. Sex-specific regulatory

285 functions discovered in this study may partially underlie the sex-biases in diseases with immune

286 system involvement. HLA associated shaping of the immune system points to a promising target

287 for future studies to develop more targeted therapies accounting for patients’ germline genetic

288 makeup.

289 Genes under germline regulatory control alter patient survival in HCC

290 In addition to the discovery of HCC risk variants, characterizations of patterns and levels of gene

291 expression in HCC tumors have led to the discovery of biomarkers associated with HCC

292 etiologies, disease progression and outcome [58–60]. We discovered genes under germline

293 regulatory control altering overall survival in the TCGA LIHC cohort, 15 of which showed

294 differential survival effects between the sexes (Tables 2 and 3). Some of the 15 male-specific

295 survival associated eGenes have been found to affect disease progression and outcome in other

296 cancer types, indicating a pan-cancer role of these genes. In breast cancer, lower Coagulation

297 Factor XII (F12) expression is a favorable prognostic marker [61]. Interestingly, low F12

298 expression was associated with poor survival among males in the TCGA LIHC cohort (Fig 4). In

299 some population studies, variants in Cytochrome P450 2D6 (CYP2D6) have been found to be

300 associated with recurrence in breast cancer patients treated with Tamoxifen ([62] but see [63] and

301 [64]), and it is highly expressed in the liver, where its expression may affect drug clearance,

302 efficacy, and safety [65]. G protein-coupled receptor kinase 6 (GRK6) deficiency promotes

303 angiogenesis, tumor progression, and metastasis in murine models of human lung cancer [66].

304 Variations in the miRNA binding sites of the Zymogen Granule Protein 16B (ZG16B) gene are

305 associated with prognosis in patients with colorectal cancer [67]. Variants at the Dynactin Subunit

306 5 (DCTN5) gene are associated with ovarian cancer mortality [68].

307 Furthermore, several of the male-specific survival associated eGenes have been shown to be

308 involved in HCC and other liver disorders. Family With Sequence Similarity 99 Member A

309 (FAM99A) has previously been shown to be downregulated in HCC [69]. In our study, low

310 FAM99A expression indicated poor survival in males (Table 3). Carboxypeptidase X, M14

311 Family Member 2 (CPXM2) has been identified as a driver gene in HCC based on patterns of

312 somatic alterations [70]. Alkylglycerol Monooxygenase (AGMO) has been shown to be

313 hypermethylated and transcriptionally repressed in HCC in comparison to cancer-free liver tissue

314 [71]. Mutations in the Leucyl-tRNA Synthetase (LARS) gene have been identified as a cause of

315 infantile hepatopathy [72].

316 In summary, we discovered differential genetic and regulatory functions in HCC tumors between

317 the sexes. By integrating genotype and gene expression data, we identified putative regulatory

318 mechanisms underlying known HCC risk loci and discovered genes under germline regulatory

319 control modulating the overall survival of patients with HCC. Our results highlight the role of

320 differential immune functions in cancer sex-bias and germline regulatory variants associated with

321 these functions. This work provides a framework for future studies on sex-biased cancers.

322 Materials and Methods

323 Data

324 GTEx (release V6p) whole transcriptome (RNAseq) data (dbGaP accession #8834) were

325 downloaded from dbGaP. TCGA LIHC Affymetrix Genome-Wide Human SNP Array 6.0 genotype data, whole

326 exome sequence (WES) and RNAseq data (dbGaP accession #11368) were downloaded from

327 NCI Genomic Data Commons [73]. FASTQ read files were extracted from the TCGA LIHC

328 WES BAM files using XYAlign [74]. We used FastQC [75] to assess the WES and RNAseq

329 FASTQ quality. Reads were trimmed using Trimmomatic ILLUMINACLIP [76], with the

330 following parameters: seed mismatches 2, palindrome clip threshold 30, simple clip threshold 10,

331 leading quality value 3, trailing quality value 3, sliding window size 4, minimum window quality

332 30 and minimum read length of 50.
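
As an illustration, the trimming parameters listed above map onto Trimmomatic's step syntax roughly as follows. This is a sketch rather than the command actually run: the adapter file name `adapters.fa` and the helper name `trimmomatic_steps` are hypothetical, and the text does not state whether reads were trimmed in single-end or paired-end mode, so only the step arguments are assembled.

```python
def trimmomatic_steps(adapters="adapters.fa"):
    """Return Trimmomatic step arguments for the parameters given in the text.

    `adapters` is a hypothetical adapter FASTA path; the source does not name one.
    """
    return [
        # ILLUMINACLIP:<fasta>:<seed mismatches>:<palindrome clip>:<simple clip>
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3",           # leading quality value 3
        "TRAILING:3",          # trailing quality value 3
        "SLIDINGWINDOW:4:30",  # window size 4, minimum window quality 30
        "MINLEN:50",           # minimum read length 50
    ]


print(" ".join(trimmomatic_steps()))
```

These arguments would be appended to a Trimmomatic `SE` or `PE` invocation after the input and output file names.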

333 Read mapping and read count quantification

334 Reads were mapped to custom sex-specific reference genomes using HISAT2 [77]. To avoid

335 mismapping of reads between the sex chromosomes, female samples were mapped to the human

336 reference genome GRCh38 with the Y-chromosome hard-masked. Male samples were mapped to

337 the human reference genome with Y-chromosomal pseudoautosomal regions hard-masked. Gene-

338 level counts from RNAseq were quantified using Subread featureCounts [78]. Reads overlapping

339 multiple features (genes or RNA families with conserved secondary structures) were counted for

340 each feature.
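
The masking scheme described above can be sketched on a toy genome. The code below is illustrative only: the function names are hypothetical, the sequences are made up, and the toy pseudoautosomal intervals are NOT the real GRCh38 coordinates. It shows the logic of hard-masking (replacing bases with `N`) the whole Y chromosome for female samples and only the Y-chromosomal pseudoautosomal regions for male samples.

```python
def hard_mask(sequence, intervals):
    """Replace bases in 0-based, half-open intervals with 'N'."""
    bases = list(sequence)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


def sex_specific_reference(genome, sex, y_par_intervals):
    """Return a masked copy of `genome` (dict: chromosome name -> sequence).

    female: the entire Y chromosome is masked.
    male:   only the Y-chromosomal pseudoautosomal regions are masked.
    """
    masked = dict(genome)
    y = genome["chrY"]
    if sex == "female":
        masked["chrY"] = "N" * len(y)
    elif sex == "male":
        masked["chrY"] = hard_mask(y, y_par_intervals)
    else:
        raise ValueError("sex must be 'female' or 'male'")
    return masked


# Toy genome and toy PAR intervals (hypothetical values for illustration).
toy_genome = {"chrX": "ACGTACGTAC", "chrY": "TTGACCGTAA"}
toy_pars = [(0, 3), (8, 10)]

print(sex_specific_reference(toy_genome, "female", toy_pars)["chrY"])
print(sex_specific_reference(toy_genome, "male", toy_pars)["chrY"])
```

Masking the Y-linked copy of the pseudoautosomal regions leaves a single mappable copy of those sequences on the X chromosome, so reads from these regions are not split between two identical targets.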

341 Germline variant calling

342 BAM files were processed according to Broad Institute GATK (Genome Analysis Toolkit) best

343 practices [79–81]: Read groups were added with Picard Toolkit’s AddOrReplaceReadGroups and

344 optical duplicates marked with Picard Toolkit’s MarkDuplicates (v.2.18.1,

345 http://broadinstitute.github.io/picard/). Base quality scores were recalibrated with GATK

346 (v.4.0.3.0) BaseRecalibrator. Germline genotypes were called from whole blood Whole Exome

347 Sequence samples from 248 male and 119 female HCC cases using the scatter-gather method

348 with GATK HaplotypeCaller and GenotypeGVCFs [79]. Affymetrix 6.0 array genotypes of

349 matching samples were lifted to GRCh38 using the UCSC LiftOver tool [82] and converted to

350 VCF. Filters were applied to retain variants with a minimum quality score >30, minor allele

351 frequency >10%, minor allele count >10, and no call rate <10% across all samples.
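
A minimal sketch of the variant filters listed above, assuming biallelic diploid genotypes coded as alternate-allele counts (0, 1, 2) with `None` for a no-call. The function name and data layout are hypothetical; the thresholds are the ones stated in the text.

```python
def passes_filters(qual, genotypes, min_qual=30.0, min_maf=0.10,
                   min_mac=10, max_missing=0.10):
    """Apply the quality, MAF, MAC and no-call-rate filters to one variant.

    genotypes: list of alternate-allele counts (0, 1, 2) or None (no call).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    alt_count = sum(called)
    total_alleles = 2 * len(called)
    mac = min(alt_count, total_alleles - alt_count)  # minor allele count
    maf = mac / total_alleles                        # minor allele frequency
    return (qual > min_qual and maf > min_maf
            and mac > min_mac and missing_rate < max_missing)


# Toy variant: 100 samples, 5 no-calls, 40 minor alleles among 190 called alleles.
common = [0] * 60 + [1] * 30 + [2] * 5 + [None] * 5
print(passes_filters(50.0, common))
```

Requiring both a minimum frequency and a minimum count guards against variants whose frequency estimate rests on very few called genotypes.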

352 Cellular content of tumor samples

353 To examine the cellular heterogeneity of tumor samples, we performed cell type enrichment

354 analysis for 64 immune and stromal cell types using xCell [83].

355 Filtering of gene expression data

356 FPKM (Fragments Per Kilobase of transcript per Million mapped reads) expression values for

357 each gene were obtained using edgeR [84]. Each expression dataset was filtered to retain genes

358 with mean FPKM>0.5 and read count of >6 in at least 10 samples across all samples under

359 investigation to avoid falsely inferring differential expression between two groups where both are

360 functionally not transcribed.
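
The expression filter can be illustrated with a short sketch. It assumes one reading of the criteria above (mean FPKM across samples above 0.5, and more than 6 reads in at least 10 samples) and computes FPKM from raw library sizes; edgeR's own implementation may differ in detail, for example in how library sizes are normalized. Function names and toy values are hypothetical.

```python
def fpkm(count, gene_length_bp, library_size):
    """Fragments per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


def keep_gene(counts, gene_length_bp, library_sizes,
              min_mean_fpkm=0.5, min_count=6, min_samples=10):
    """Return True if the gene passes both expression filters."""
    values = [fpkm(c, gene_length_bp, n) for c, n in zip(counts, library_sizes)]
    mean_fpkm = sum(values) / len(values)
    n_expressed = sum(1 for c in counts if c > min_count)
    return mean_fpkm > min_mean_fpkm and n_expressed >= min_samples


# Toy example: 12 samples, each with one million mapped reads.
libs = [1_000_000] * 12
print(keep_gene([20] * 12, 1000, libs))            # expressed in all samples
print(keep_gene([20] * 9 + [0] * 3, 1000, libs))   # too few samples with >6 reads
```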

361 Differential expression analysis

362 For differential expression (DE) analysis, filtered, untransformed read count data were quantile

363 normalized and logCPM transformed with voom [85]. From the TCGA LIHC dataset, paired

364 tumor and tumor adjacent samples were available for 22 females and 28 males. From the GTEx

365 liver dataset, 22 female and 28 male samples were randomly selected to be used in the DE

366 analysis. A multi-factor design with sex and tissue type as predictor variables was used to fit the

367 linear model. To make comparisons both between and within subjects in the paired tumor and

368 tumor adjacent samples, duplicateCorrelation was used to treat the individual as a random effect.

369 Differentially expressed genes (DEGs) between comparisons were identified using the

370 limma/voom pipeline [85]. Linear modeling was conducted using the lmFit and contrasts.fit

371 functions. DEGs between sexes and tissue types were identified using contrast designs (e.g., all

372 males for tumor versus all females for tumor, or all tumor samples versus all tumor adjacent

373 samples within the same sex). In all comparisons, DEGs were identified by computing empirical

374 Bayes statistics with eBayes, with an FDR adjusted p-value threshold of 0.01 and an absolute log2

375 fold-change (log2FC) threshold of 2.
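
The final selection step can be sketched independently of limma. The example below assumes that the FDR adjustment is the Benjamini-Hochberg procedure and that the fold-change threshold is inclusive; neither detail is stated in the text. Gene names and values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, fdr=0.01, min_abs_log2fc=2.0):
    """results: list of (gene, log2FC, p-value) tuples."""
    adj = benjamini_hochberg([p for _, _, p in results])
    return [gene for (gene, lfc, _), q in zip(results, adj)
            if q < fdr and abs(lfc) >= min_abs_log2fc]


toy = [("geneA", 3.1, 1e-6), ("geneB", -2.5, 2e-5),
       ("geneC", 0.8, 1e-7), ("geneD", 2.2, 0.2)]
print(select_degs(toy))
```

In this toy input, geneC is highly significant but fails the fold-change threshold, while geneD passes the fold-change threshold but not the FDR threshold.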

376 Enrichment of biological functions and canonical pathways

377 We utilized the Gene Ontology Consortium’s web tool [86] to analyze gene lists for enrichment

378 with regard to molecular function, biological processes, and cellular functions. Fisher’s exact test

379 was used to calculate enrichment scores. We also used NetworkAnalyst [87] for enrichment

380 and network-based analyses of gene lists with regard to pathway relationships (KEGG and

381 Reactome pathways). An FDR-adjusted p-value threshold of 0.01 was used to select significantly

382 enriched GO terms and canonical pathways.
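
For a single gene set, the enrichment test can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution. Whether the web tool uses a one-sided or two-sided test is not stated in the text, so the one-sided form here is an assumption; the function name and toy numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits, list_size, annotated, background):
    """One-sided Fisher's exact test for over-representation.

    hits:       genes in the list that carry the annotation
    list_size:  number of genes in the list
    annotated:  genes in the background that carry the annotation
    background: total number of background genes
    Returns P(X >= hits) under the hypergeometric null.
    """
    upper = min(list_size, annotated)
    total = comb(background, list_size)
    tail = sum(comb(annotated, k) * comb(background - annotated, list_size - k)
               for k in range(hits, upper + 1))
    return tail / total


# Toy example: 5 of 5 listed genes are annotated, out of 5 annotated among 20.
print(enrichment_pvalue(5, 5, 5, 20))
```

The resulting p-values across all tested terms would then be FDR-adjusted before applying the 0.01 threshold.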

383 Accounting for technical confounders and population structure

384 Gene expression values are affected by genetic, environmental, and technical factors, many of

385 which may be unknown or unmeasured. Technical confounding factors introduce sources of

386 variance that may greatly reduce the statistical power of association studies, and even cause false

387 signals [88]. Thus, it is necessary to account for known and unknown technical confounders. This

388 is often achieved by detecting a set of latent confounding factors with methods such as principal

389 component analysis (PCA), Probabilistic Estimation of Expression Residuals (PEER) [89] or

390 Surrogate Variable Analysis (SVA) [88]. These known and surrogate variables are then used as

391 covariates in downstream analyses, or expression data are adjusted using a regression model or

392 another approach, e.g., ComBat, which utilizes an Empirical Bayes method [90]. In eQTL studies, the

393 number of latent factors to adjust for is often selected to maximize the number of significant

394 associations. However, such an approach may remove biologically relevant sources of variation

395 and increase the number of false positive associations. Introducing too many covariates to linear

396 models may also cause a problem of overfitting. Furthermore, confounding factor methods may

397 model the effects of trans-acting broad impact eQTL as confounding variation [91,92].

398 Our goal was to remove the technical sources of variation while protecting the biologically

399 relevant variation. Because of missing values in technical covariate data, we were unable to adjust

400 the expression data directly for known covariates. We identified 50 PEER factors in the tumor

401 data. Some of these latent factors were associated with known clinical phenotypes or technical

402 covariates (Fig S4). Weights of the PEER factors that did not strongly correlate with biologically

403 relevant phenotypes were used as covariates in the eQTL analysis. To select these PEER factors,

404 an FDR-adjusted p-value threshold of 0.01, based on linear regression for continuous covariates and

405 ANOVA for categorical covariates, was used. To avoid overfitting, the number of PEER factors

406 to include was limited to 10.
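
The factor selection logic can be sketched as follows, assuming the association tests have already produced FDR-adjusted p-values for each PEER factor against each biologically relevant phenotype. The text does not say how the 10 retained factors were chosen among those that passed, so taking them in factor order is an assumption; all names and values are hypothetical.

```python
def select_peer_factors(assoc_qvalues, alpha=0.01, max_factors=10):
    """Keep factors with no significant association to biological phenotypes.

    assoc_qvalues: dict mapping factor name -> list of FDR-adjusted p-values,
    one per biologically relevant phenotype, in factor order.
    """
    selected = []
    for factor, qvalues in assoc_qvalues.items():
        if all(q >= alpha for q in qvalues):
            selected.append(factor)
        if len(selected) == max_factors:
            break
    return selected


# Toy input: 15 factors, two of which associate with a phenotype.
toy_q = {f"PEER{i}": [0.5, 0.3] for i in range(1, 16)}
toy_q["PEER2"] = [0.001, 0.4]
toy_q["PEER5"] = [0.2, 0.005]
print(select_peer_factors(toy_q))
```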

407 We used the R package SNPRelate [93] to perform PCA on the germline genotype data. We

408 accounted for population structure by applying the first three genotype PCs as covariates in the

409 eQTL analysis.

410 eQTL analysis

411 Germline genotypes and tumor gene expression data from 248 male and 119 female donors were

412 available for use in the eQTL analysis. Filtered count data were normalized by fitting the FPKM

413 values of each gene and sample to the quantiles of the normal distribution. cis-acting eQTLs were

414 detected with QTLtools v.1.1 [94]. We used the permutation pass with 10,000 permutations to get

415 adjusted p-values of associations between the phenotypes and the top WES and array variants in

416 cis. FDR adjusted p-values were calculated to correct for multiple phenotypes tested. An FDR-

417 adjusted p-value threshold of 0.01 was used to select significant associations.
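
The normalization to the quantiles of the normal distribution described above can be sketched as a rank-based inverse normal transformation. The sketch assumes the transformation is applied to each gene across samples, uses mean ranks for ties, and uses the (rank - 0.5)/n offset; these details are assumptions, as the text does not specify them.

```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map values to standard normal quantiles by rank (ties get the mean rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Toy FPKM values for one gene across three samples.
print(inverse_normal_transform([5.0, 1.0, 3.0]))
```

The transformation preserves the ordering of samples while removing the influence of outlying expression values on the linear association test.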

418 Annotating genomic locations of putative regulatory variants

419 We used the R package annotatr to annotate the genomic locations of eVariants [95]. Variant

420 sites were annotated for promoters, 5'UTRs, exons, introns, 3'UTRs, CpGs (CpG islands, CpG

421 shores, CpG shelves), and putative regulatory regions based on ChromHMM [96] annotations.

422 Overlap of eQTL loci and HCC GWAS loci

423 Due to the lack of genome-wide GWAS summary statistics, we were unable to test for

424 colocalization of GWAS and eVariants. To overcome this, we used top HCC GWAS variants and

425 top eVariants to detect high confidence colocalization events based on linkage. Locations of 32

426 GWAS loci associated with HCC were obtained from the NHGRI-EBI GWAS catalog [97]. To

427 find instances where known GWAS loci and eQTL loci are overlapping or tightly linked, we

428 searched for loci with r2>0.8 to the GWAS risk variants. Linked loci were obtained with

429 LDlink’s LDproxy tool [98].
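
To make the linkage threshold concrete, the sketch below computes r2 between two variants as the squared Pearson correlation of genotype dosages and applies the 0.8 cutoff. This is a stand-in for illustration: the LD values used in the study were obtained from an external tool based on reference population data, not computed this way, and the function names and toy genotypes are hypothetical.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return cov * cov / (var_a * var_b)


def tightly_linked(dosages_a, dosages_b, threshold=0.8):
    """Apply the r2 > 0.8 criterion used to call tightly linked loci."""
    return r_squared(dosages_a, dosages_b) > threshold


gwas_variant = [0, 1, 2, 0, 1, 2]
evariant = [2, 1, 0, 2, 1, 0]   # perfectly anti-correlated coding, r2 = 1
print(tightly_linked(gwas_variant, evariant))
```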

430 Survival analyses

431 We used the R package survival to test for differences in survival between different eVariant

432 genotypes. To detect possible differences in early and late survival, we used the surv_pvalue

433 function with log-rank, Tarone-Ware and Fleming-Harrington (p=1, q=1) tests. For genotype

434 comparisons, the pairwise_survdiff function from the R package survminer was used to calculate

435 p-values between all pairwise comparisons. To test for the effect of eGene expression on patient

436 survival, we utilized Cox’s proportional hazards regression model with the function coxph. FDR

437 adjusted p-values were calculated with the p.adjust function from the stats R package.
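
The weighted tests mentioned above differ only in how each event time is weighted. The following sketch implements a two-group weighted log-rank test in the Fleming-Harrington family, where p=0, q=0 gives the standard log-rank test and p=1, q=1 emphasizes differences in the middle of follow-up. It is an independent illustration and not the R code used in the study; the Tarone-Ware weighting is not covered, and the function name and toy data are hypothetical.

```python
from math import erfc, sqrt


def weighted_logrank(times1, events1, times2, events2, p=0.0, q=0.0):
    """Two-group Fleming-Harrington weighted log-rank test.

    events are 1 for an observed event and 0 for censoring.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = ([(t, e, 1) for t, e in zip(times1, events1)]
            + [(t, e, 2) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e})
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    u = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(1 for s, _, g in data if s >= t and g == 1)
        n2 = sum(1 for s, _, g in data if s >= t and g == 2)
        d1 = sum(1 for s, e, g in data if s == t and e and g == 1)
        d2 = sum(1 for s, e, g in data if s == t and e and g == 2)
        n, d = n1 + n2, d1 + d2
        w = surv ** p * (1 - surv) ** q
        u += w * (d1 - d * n1 / n)  # observed minus expected events in group 1
        if n > 1:
            var += w * w * d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
        surv *= 1 - d / n
    if var == 0:
        raise ValueError("test statistic is undefined for these data")
    chi2 = u * u / var
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


# Toy data: all events in group 1 precede all events in group 2.
chi2, pval = weighted_logrank([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 3), round(pval, 4))
```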

438 Acknowledgments

439 This study was supported by ASU Center for Evolution and Medicine postdoctoral fellowship for

440 HMN, ASU School of Life Sciences startup funds for MAW and ASU Center for Evolution and

441 Medicine Venture funds.

442 Citations

443 1. Clocchiatti A, Cora E, Zhang Y, Dotto GP. Sexual dimorphism in cancer. Nat Rev Cancer. 2016;16: 330.

445 2. Regitz-Zagrosek V. Sex and gender differences in health. EMBO Rep. 2012;13: 596–603.

447 3. Jaillon S, Berthenet K, Garlanda C. Sexual Dimorphism in Innate Immunity. Clin Rev Allergy Immunol. 2017; doi:10.1007/s12016-017-8648-x

449 4. Klein SL. The effects of hormones on sex differences in infection: from genes to behavior. Neurosci Biobehav Rev. 2000;24: 627–638.

451 5. Klein SL, Flanagan KL. Sex differences in immune responses. Nat Rev Immunol. 2016;16: 626–638.

453 6. Gilks WP, Abbott JK, Morrow EH. Sex differences in disease genetics: evidence, evolution, and detection. Trends Genet. 2014;30: 453–463.

455 7. Morrow EH. The evolution of sex differences in disease. Biol Sex Differ. 2015;6: 5.

456 8. Wands J. Hepatocellular carcinoma and sex. N Engl J Med. 2007;357: 1974–1976.

457 9. Petrick JL, Braunlin M, Laversanne M, Valery PC, Bray F, McGlynn KA. International trends in liver cancer incidence, overall and by histologic subtype, 1978-2007. Int J Cancer. 2016;139: 1534–1545.

460 10. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018; doi:10.3322/caac.21492

463 11. Tang R, Liu H, Yuan Y, Xie K, Xu P, Liu X, et al. Genetic factors associated with risk of metabolic syndrome and hepatocellular carcinoma. Oncotarget. 2017;8: 35403–35411.

465 12. Yuan Y, Liu L, Chen H, Wang Y, Xu Y, Mao H, et al. Comprehensive Characterization of Molecular Differences in Cancer between Male and Female Patients. Cancer Cell. 2016;29: 711–722.

468 13. Agarwal R, Cao Y, Hoffmeier K, Krezdorn N, Jost L, Meisel AR, et al. Precision medicine for hepatocellular carcinoma using molecular pattern diagnostics: results from a preclinical pilot study. Cell Death Dis. 2017;8: e2867.

471 14. Yin L, Cai Z, Zhu B, Xu C. Identification of Key Pathways and Genes in the Dynamic Progression of HCC Based on WGCNA. Genes. 2018;9. doi:10.3390/genes9020092

473 15. Wen J, Song C, Liu J, Chen J, Zhai X, Hu Z. Expression quantitative trait loci for TNFRSF10 influence both HBV infection and hepatocellular carcinoma development. J Med Virol. 2016;88: 474–480.

476 16. Ma S, Yang J, Song C, Ge Z, Zhou J, Zhang G, et al. Expression quantitative trait loci for PAX8 contributes to the prognosis of hepatocellular carcinoma. PLoS One. 2017;12: e0173700.

478 17. Tian T, Song C, Pu Z-N, Ge Z-J, Yu C-X, Liu J-B, et al. Expression quantitative trait loci for PVT1 contributes to the prognosis of hepatocellular carcinoma. HRMAGAZINE. 2018;4: 27.

480 18. Chow JC, Yen Z, Ziesche SM, Brown CJ. Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet. 2005;6: 69–92.

482 19. Kukurba KR, Parsana P, Balliu B, Smith KS, Zappala Z, Knowles DA, et al. Impact of the X Chromosome and sex on regulatory variation. Genome Res. 2016;26: 768–777.

484 20. Dimas AS, Nica AC, Montgomery SB, Stranger BE, Raj T, Buil A, et al. Sex-biased genetic effects on gene regulation in humans. Genome Res. 2012;22: 2368–2375.

486 21. Kumar V, Kato N, Urabe Y, Takahashi A, Muroyama R, Hosono N, et al. Genome-wide association study identifies a susceptibility locus for HCV-induced hepatocellular carcinoma. Nat Genet. 2011;43: 455–458.

489 22. Jiang D-K, Sun J, Cao G, Liu Y, Lin D, Gao Y-Z, et al. Genetic variants in STAT4 and HLA-DQ genes confer risk of hepatitis B virus-related hepatocellular carcinoma. Nat Genet. 2013;45: 72–75.

492 23. Li S, Qian J, Yang Y, Zhao W, Dai J, Bei J-X, et al. GWAS identifies novel susceptibility loci on 6p21.32 and 21q21.3 for hepatocellular carcinoma in chronic hepatitis B virus carriers. PLoS Genet. 2012;8: e1002791.

495 24. Lin Y-Y, Yu M-W, Lin S-M, Lee S-D, Chen C-L, Chen D-S, et al. Genome-wide association analysis identifies a GLUL haplotype for familial hepatitis B virus-related hepatocellular carcinoma. Cancer. 2017;123: 3966–3976.

498 25. Sawai H, Nishida N, Khor S-S, Honda M, Sugiyama M, Baba N, et al. Genome-wide association study identified new susceptible genetic variants in HLA class I region for hepatitis B virus-related hepatocellular carcinoma. Sci Rep. 2018;8: 7958.

501 26. Gaykalova DA, Zizkova V, Guo T, Tiscareno I, Wei Y, Vatapalli R, et al. Integrative computational analysis of transcriptional and epigenetic alterations implicates DTX1 as a putative tumor suppressor gene in HNSCC. Oncotarget. 2017;8: 15349–15363.

504 27. Pascual M, Gómez-Lechón MJ, Castell JV, Jover R. ATF5 is a highly abundant liver-enriched transcription factor that cooperates with constitutive androstane receptor in the transactivation of CYP2B6: implications in hepatic stress responses. Drug Metab Dispos. 2008;36: 1063–1072.

508 28. Gho JW-M, Ip W-K, Chan KY-Y, Law PT-Y, Lai PB-S, Wong N. Re-expression of transcription factor ATF5 in hepatocellular carcinoma induces G2-M arrest. Cancer Res. 2008;68: 6743–6751.

511 29. Abe T, Kojima M, Akanuma S, Iwashita H, Yamazaki T, Okuyama R, et al. N-terminal hydrophobic amino acids of activating transcription factor 5 (ATF5) protein confer interleukin 1β (IL-1β)-induced stabilization. J Biol Chem. 2014;289: 3888–3900.

514 30. Liu Y, Zheng P. CD24: a genetic checkpoint in T cell homeostasis and autoimmune diseases. Trends Immunol. 2007;28: 315–320.

516 31. Li W, Ma H, Zhang J, Zhu L, Wang C, Yang Y. Unraveling the roles of CD44/CD24 and ALDH1 as cancer stem cell markers in tumorigenesis and metastasis. Sci Rep. 2017;7: 13856.

518 32. Whitacre CC. Sex differences in autoimmune disease. Nat Immunol. 2001;2: 777–780.

519 33. Kulkarni P, Dunker AK, Weninger K, Orban J. Prostate-associated gene 4 (PAGE4), an intrinsically disordered cancer/testis antigen, is a novel therapeutic target for prostate cancer. Asian J Androl. 2016;18: 695–703.

522 34. Mazzone R, Zwergel C, Mai A, Valente S. Epi-drugs in combination with immunotherapy: a new avenue to improve anticancer efficacy. Clin Epigenetics. 2017;9: 59.

524 35. Shukla SA, Bachireddy P, Schilling B, Galonska C, Zhan Q, Bango C, et al. Cancer-Germline Antigen Expression Discriminates Clinical Outcome to CTLA-4 Blockade. Cell. 2018;173: 624–633.e8.

527 36. Iñarrairaegui M, Melero I, Sangro B. Immunotherapy of Hepatocellular Carcinoma: Facts and Hopes. Clin Cancer Res. 2018;24: 1518–1524.

529 37. Gu J, Liu Y, Xie B, Ye P, Huang J, Lu Z. Roles of toll-like receptors: From inflammation to lung cancer progression. Biomed Rep. 2018;8: 126–132.

531 38. Palaga T, Buranaruk C, Rengpipat S, Fauq AH, Golde TE, Kaufmann SHE, et al. Notch signaling is activated by TLR stimulation and regulates macrophage functions. Eur J Immunol. 2008;38: 174–183.

534 39. Zhang Q, Wang C, Liu Z, Liu X, Han C, Cao X, et al. Notch signal suppresses Toll-like receptor-triggered inflammatory responses in macrophages by inhibiting extracellular signal-regulated kinase 1/2-mediated nuclear factor κB activation. J Biol Chem. 2012;287: 6208–6217.

537 40. Li L, Tang P, Li S, Qin X, Yang H, Wu C, et al. Notch signaling pathway networks in cancer metastasis: a new target for cancer therapy. Med Oncol. 2017;34: 180.

539 41. Braunstein MJ, Kucharczyk J, Adams S. Targeting Toll-Like Receptors for Cancer Therapy. Target Oncol. 2018;13: 583–598.

541 42. Shental-Bechor D, Levy Y. Effect of glycosylation on protein folding: a close look at thermodynamic stabilization. Proc Natl Acad Sci U S A. 2008;105: 8256–8261.

543 43. Pinho SS, Reis CA. Glycosylation in cancer: mechanisms and clinical implications. Nat Rev Cancer. 2015;15: 540–555.

545 44. Stowell SR, Ju T, Cummings RD. Protein glycosylation in cancer. Annu Rev Pathol. 2015;10: 473–510.

547 45. Mehta A, Herrera H, Block T. Chapter Seven - Glycosylation and Liver Cancer. In: Drake RR, Ball LE, editors. Advances in Cancer Research. Academic Press; 2015. pp. 257–279.

549 46. Liang K-H, Yeh C-T. O-glycosylation in liver cancer: Clinical associations and potential mechanisms. Liver Research. 2017;1: 193–196.

551 47. Zhu W, Leber B, Andrews DW. Cytoplasmic O-glycosylation prevents cell surface transport of E-cadherin during apoptosis. EMBO J. 2001;20: 5999–6007.

553 48. Tsuboi S, Sutoh M, Hatakeyama S, Hiraoka N, Habuchi T, Horikawa Y, et al. A novel strategy for evasion of NK cell immunity by tumours expressing core2 O-glycans. EMBO J. 2011;30: 3173–3185.

556 49. Ntziachristos P, Lim JS, Sage J, Aifantis I. From fly wings to targeted cancer therapies: a centennial for notch signaling. Cancer Cell. 2014;25: 318–334.

558 50. McDaniell R, Warthen DM, Sanchez-Lara PA, Pai A, Krantz ID, Piccoli DA, et al. NOTCH2 mutations cause Alagille syndrome, a heterogeneous disorder of the notch signaling pathway. Am J Hum Genet. 2006;79: 169–173.

561 51. Oda T, Elkahloun AG, Pike BL, Okajima K, Krantz ID, Genin A, et al. Mutations in the human Jagged1 gene are responsible for Alagille syndrome. Nat Genet. 1997;16: 235–242.

563 52. Deng H, Jasper H. Sexual Dimorphism: How Female Cells Win the Race. Curr Biol. 2016;26: R212–5.

565 53. Sugathan A, Waxman DJ. Genome-wide analysis of chromatin states reveals distinct mechanisms of sex-dependent gene regulation in male and female mouse liver. Mol Cell Biol. 2013;33: 3594–3610.

568 54. Garrido F, Aptsiauri N, Doorduijn EM, Garcia Lora AM, van Hall T. The urgent need to recover MHC class I in cancers for effective immunotherapy. Curr Opin Immunol. 2016;39: 44–51.

571 55. Beatty GL, Gladney WL. Immune escape mechanisms as a guide for cancer immunotherapy. Clin Cancer Res. 2015;21: 687–692.

573 56. Akcay IM, Katrinli S, Ozdil K, Doganay GD, Doganay L. Host genetic factors affecting hepatitis B infection outcomes: Insights from genome-wide association studies. World J Gastroenterol. 2018;24: 3347–3360.

576 57. Schneider-Hohendorf T, Görlich D, Savola P, Kelkka T, Mustjoki S, Gross CC, et al. Sex bias in MHC I-associated shaping of the adaptive immune system. Proc Natl Acad Sci U S A. 2018;115: 2168–2173.

579 58. Hass HG, Jobst J, Scheurlen M, Vogel U, Nehls O. Gene expression analysis for evaluation of potential biomarkers in hepatocellular carcinoma. Anticancer Res. 2015;35: 2021–2028.

582 59. Meng C, Shen X, Jiang W. Potential biomarkers of HCC based on gene expression and DNA methylation profiles. Oncol Lett. 2018;16: 3183–3192.

584 60. Ferrín G, Aguilar-Melero P, Rodríguez-Perálvarez M, Montero-Álvarez JL, de la Mata M. Biomarkers for hepatocellular carcinoma: diagnostic and therapeutic utility. Hepat Med. 2015;7: 1–10.

587 61. Todd JR, Ryall KA, Vyse S, Wong JP, Natrajan RC, Yuan Y, et al. Systematic analysis of tumour cell-extracellular matrix adhesion identifies independent prognostic factors in breast cancer. Oncotarget. 2016;7: 62939–62953.


62. Zembutsu H, Nakamura S, Akashi-Tanaka S, Kuwayama T, Watanabe C, Takamaru T, et al. Significant Effect of Polymorphisms in CYP2D6 on Response to Tamoxifen Therapy for Breast Cancer: A Prospective Multicenter Study. Clin Cancer Res. 2017;23: 2019–2026.

63. Hertz DL, Kidwell KM, Hilsenbeck SG, Oesterreich S, Osborne CK, Philips S, et al. CYP2D6 genotype is not associated with survival in breast cancer patients treated with tamoxifen: results from a population-based study. Breast Cancer Res Treat. 2017;166: 277–287.

64. Lu J, Li H, Guo P, Shen R, Luo Y, Ge Q, et al. The effect of CYP2D6 *10 polymorphism on adjuvant tamoxifen in Asian breast cancer patients: a meta-analysis. Onco Targets Ther. 2017;10: 5429–5437.

65. Stevens JC, Marsh SA, Zaya MJ, Regina KJ, Divakaran K, Le M, et al. Developmental changes in human liver CYP2D6 expression. Drug Metab Dispos. 2008;36: 1587–1593.

66. Raghuwanshi SK, Smith N, Rivers EJ, Thomas AJ, Sutton N, Hu Y, et al. G protein-coupled receptor kinase 6 deficiency promotes angiogenesis, tumor progression, and metastasis. J Immunol. 2013;190: 5329–5336.

67. Kang BW, Lee SJ, Lee YJ, Kim JG, Chae YS, Sohn SK, et al. Genetic variations in miRNA binding site of TPST1 and ZG16B associated with prognosis for patients with colorectal cancer. J Clin Oncol. American Society of Clinical Oncology; 2013;31: 3553–3553.

68. Goode EL, Chenevix-Trench G, Hartmann LC, Fridley BL, Kalli KR, Vierkant RA, et al. Assessment of hepatocyte growth factor in ovarian cancer mortality. Cancer Epidemiol Biomarkers Prev. 2011;20: 1638–1648.

69. Esposti DD, Hernandez-Vargas H, Voegele C, Fernandez-Jimenez N, Forey N, Bancel B, et al. Identification of novel long non-coding RNAs deregulated in hepatocellular carcinoma using RNA-sequencing. Oncotarget. 2016;7: 31862–31877.

70. Ling S, Hu Z, Yang Z, Yang F, Li Y, Lin P, et al. Extremely high genetic diversity in a single tumor points to prevalence of non-Darwinian cell evolution. Proc Natl Acad Sci U S A. 2015;112: E6496–505.

71. Udali S, Guarini P, Ruzzenente A, Ferrarini A, Guglielmi A, Lotto V, et al. DNA methylation and gene expression profiles show novel regulatory pathways in hepatocellular carcinoma. Clin Epigenetics. 2015;7: 43.

72. Casey JP, McGettigan P, Lynam-Lennon N, McDermott M, Regan R, Conroy J, et al. Identification of a mutation in LARS as a novel cause of infantile hepatopathy. Mol Genet Metab. 2012;106: 351–358.

73. Grossman RL, Heath AP, Ferretti V, Varmus HE, Lowy DR, Kibbe WA, et al. Toward a Shared Vision for Cancer Genomic Data. N Engl J Med. 2016;375: 1109–1112.

74. Webster TH, Couse M, Grande BM, Karlins E, Phung TN, Richmond PA, et al. Identifying, understanding, and correcting technical biases on the sex chromosomes in next-generation sequencing data [Internet]. bioRxiv. 2018. p. 346940. doi:10.1101/346940


75. Andrews S. FastQC: A Quality Control tool for High Throughput Sequence Data [Internet]. 2010. Available: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

76. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30: 2114–2120.

77. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12: 357–360.

78. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30: 923–930.

79. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303.

80. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43: 491–498.

81. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;43: 11.10.1–33.

82. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34: D590–8.

83. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. 2017;18: 220.

84. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26: 139–140.

85. Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15: R29.

86. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, et al. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45: D183–D189.

87. Xia J, Gill EE, Hancock REW. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat Protoc. 2015;10: 823–844.

88. Leek JT, Storey JD. Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis. PLoS Genet. 2007;3: e161.

89. Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc. 2012;7: 500–507.


90. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8: 118–127.

91. Fusi N, Stegle O, Lawrence ND. Joint Modelling of Confounding Factors and Prominent Genetic Regulators Provides Increased Accuracy in Genetical Genomics Studies. PLoS Comput Biol. 2012;8: e1002330.

92. Goldinger A, Henders AK, McRae AF, Martin NG, Gibson G, Montgomery GW, et al. Genetic and Nongenetic Variation Revealed for the Principal Components of Human Gene Expression. Genetics. 2013;195: 1117.

93. Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28: 3326–3328.

94. Delaneau O, Ongen H, Brown AA, Fort A, Panousis N, Dermitzakis E. A complete tool set for molecular QTL discovery and analysis. bioRxiv. 2016; doi:10.1101/068635

95. Cavalcante RG, Sartor MA. annotatr: genomic regions in context. Bioinformatics. 2017;33: 2381–2383.

96. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9: 215–216.

97. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017;45: D896–D901.

98. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics. 2015;31: 3555–3557.

99. Lee M-H, Huang Y-H, Chen H-Y, Khor S-S, Chang Y-H, Lin Y-J, et al. Human leukocyte antigen variants and risk of hepatocellular carcinoma modified by hepatitis C virus genotypes: A genome-wide association study. Hepatology. 2017; doi:10.1002/hep.29531

Figure Captions

Fig 1. Sex differences in gene expression in healthy liver, HCC tumor-adjacent, and HCC tumor tissue. X-linked genes are indicated in red, Y-linked genes in blue, and autosomal genes in black. An FDR-adjusted p-value threshold of 0.01 and an absolute log2(FC) threshold of 2 were used to select significant genes. Multiple transcripts of the long non-coding RNA XIST are independently expressed.
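The gene selection rule stated in the Fig 1 caption can be illustrated with a minimal sketch. It assumes that the FDR adjustment is the Benjamini-Hochberg procedure and that the cutoffs are applied as q < 0.01 and |log2(FC)| >= 2; the caption specifies neither detail. The function names and the toy input values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_significant(
    genes: Dict[str, Tuple[float, float]],
    fdr_cutoff: float = 0.01,   # threshold from the caption
    lfc_cutoff: float = 2.0,    # threshold from the caption
) -> List[str]:
    """genes maps a gene name to (log2 fold change, raw p-value)."""
    names = list(genes)
    qvalues = bh_adjust([genes[n][1] for n in names])
    # Assumption: strict inequality for q, inclusive for |log2FC|.
    return [
        n for n, q in zip(names, qvalues)
        if q < fdr_cutoff and abs(genes[n][0]) >= lfc_cutoff
    ]


# Hypothetical toy values, for illustration only.
toy = {
    "XIST": (-9.0, 1e-12),
    "GENE_A": (2.5, 1e-6),
    "GENE_B": (0.5, 1e-8),   # significant but small effect
    "GENE_C": (3.0, 0.2),    # large effect but not significant
}
selected = select_significant(toy)
```

With these toy values only the genes that pass both cutoffs are retained, which is the behavior a volcano plot highlights.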


Fig 2. Expression and regulatory networks underlying male and female HCC. A: Volcano plots of DEGs in tumor vs. tumor-adjacent comparisons in the sex-specific analyses and in the joint analysis of both sexes. X-linked genes are indicated in red, Y-linked genes in blue, and autosomal genes in black. Significant genes were selected based on an FDR-adjusted p-value threshold of 0.01 and an absolute log2(FC) threshold of 2. B: Venn diagram of the overlap of DEGs in the sex-specific analyses and in the joint analysis of both sexes. C: Overlap of functional pathways and canonical networks enriched in males, females, and in the joint analysis of both sexes. D: Examples of sex-specific and sex-shared pathways.

Fig 3. Sex-specific genetic effects on gene expression. A: QQ-plot of eQTL associations in the joint analysis of both sexes (grey), male HCC (blue), and female HCC (red). B: Overlap of eGenes detected in joint and stratified analyses. C: Genomic annotations of eVariants detected in joint and stratified analyses. CDS: coding sequences. D: An example of a male-specific eQTL. POGLUT1 expression is modulated by a germline variant in cis in male HCC, but not in female HCC or in the joint analysis of both sexes.

Fig 4. An example of a gene under germline regulatory control with differential effects on survival between the sexes. A: Expression of F12 in female and male HCC. B-D: Effect of F12 expression on survival in the joint analysis of both sexes (B), in females (C), and in males (D). Hazard ratios are from Cox proportional hazards models, with an FDR-adjusted p-value threshold of 0.01. For visualization, samples with a normalized expression value >1 were selected as high-expression samples and those with a value <-1 as low-expression samples.

Supporting Information


S1 Fig. Expression of homologous X- and Y-linked genes in the different sexes and tissues. For females, expression of the X-linked gene is plotted. For males, the combined expression of the X- and Y-linked genes is shown.

S2 Fig. Absolute log2-fold changes of DEGs detected from tumor vs. tumor-adjacent comparisons in the unstratified analysis of both sexes and in sex-specific analyses. Absolute log2-fold changes are given across all samples, female samples, and male samples.

S3 Fig. Cell content of study samples based on cell-type enrichment analysis from bulk RNA-seq data.

S4 Fig. PEER factors associated with continuous and categorical sample and patient covariates. A-B: R² and FDR-adjusted p-values between PEER factors and continuous technical and biological covariates. C-D: ANOVA test statistic and FDR-adjusted p-values between PEER factors and categorical technical and biological covariates.

Table S1. Sex-biased gene expression in GTEx liver tissue samples. N males = 28, N females = 22. Empirical p-value based on sex-label shuffling with 1000 permutations.

Table S2. Sex-biased gene expression in TCGA LIHC tumor-adjacent samples. N males = 28, N females = 22. Empirical p-value based on sex-label shuffling with 1000 permutations.

Table S3. Sex-biased gene expression in TCGA LIHC tumor samples. N males = 28, N females = 22. Empirical p-value based on sex-label shuffling with 1000 permutations.
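The empirical p-values mentioned in the sex-biased expression tables above come from shuffling sex labels. The following is a minimal sketch of that general idea for a single gene, not the authors' pipeline. It assumes the absolute difference in group means as the test statistic and a +1 correction in numerator and denominator; the statistic actually used in the study may differ, and the function name and parameters are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence


def empirical_p(
    expr: Sequence[float],
    is_male: Sequence[bool],
    n_perm: int = 1000,   # number of permutations stated in the table legends
    seed: int = 0,        # hypothetical seed, for reproducibility only
) -> float:
    """Permutation p-value for a male vs. female expression difference."""

    def abs_mean_diff(labels: Sequence[bool]) -> float:
        males = [x for x, m in zip(expr, labels) if m]
        females = [x for x, m in zip(expr, labels) if not m]
        return abs(mean(males) - mean(females))

    observed = abs_mean_diff(is_male)
    rng = random.Random(seed)
    labels: List[bool] = list(is_male)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the group sizes fixed and breaks any
        # association between sex and expression.
        rng.shuffle(labels)
        if abs_mean_diff(labels) >= observed:
            hits += 1
    # Assumption: add-one correction so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: five "male" and five "female" samples.
sex = [True] * 5 + [False] * 5
p_biased = empirical_p([10, 11, 12, 13, 14, 0, 1, 2, 3, 4], sex)
p_flat = empirical_p([1.0] * 10, sex)
```

With 1000 permutations the smallest attainable value under this correction is 1/1001, so the resolution of the empirical p-value is bounded by the number of permutations.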

Table S4. Differentially expressed genes between matched tumor and tumor-adjacent samples in joint analysis of male and female samples. Each tissue N = 50, N female donors = 22, N male donors = 28.


Table S5. Differentially expressed genes between matched male tumor and tumor-adjacent samples. N of sample pairs = 28.

Table S6. Differentially expressed genes between matched female tumor and tumor-adjacent samples. N of sample pairs = 22.

Table S7. Enriched GO terms and canonical networks in tumor vs. tumor-adjacent DEGs in the joint analysis of both sexes.

Table S8. Enriched GO terms and canonical networks in male-specific tumor vs. tumor-adjacent DEGs.

Table S9. Enriched GO terms and canonical networks in female-specific tumor vs. tumor-adjacent DEGs.

Table S10. cis-eQTLs detected in joint analysis of male and female tumors.

Table S11. cis-eQTLs detected in male tumors.

Table S12. cis-eQTLs detected in female tumors.

Table S13. Genomic annotations of regulatory variants detected in joint analysis of both sexes.

Table S14. Genomic annotations of regulatory variants detected in the male-specific analysis.

Table S15. Genomic annotations of regulatory variants detected in the female-specific analysis.
