Identifying sex-specific genetic effects across 733 traits in UK Biobank

James Han (Yale University); Wei Jiang (Department of Biostatistics, Yale School of Public Health); Yixuan Ye (Yale University, https://orcid.org/0000-0002-2643-665X); Hongyu Zhao (Yale University, https://orcid.org/0000-0003-1195-9607)

Article

Keywords: sex-specificity, diseases, traits, polygenic risk

Posted Date: July 20th, 2021

DOI: https://doi.org/10.21203/rs.3.rs-701876/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Sex-specificity has been reported in a wide range of diseases and complex traits. While sex-specific genetic effects have been documented for certain traits, the genetic mechanisms underlying sex differences in most traits remain largely unexplored. With its large sample size and wide range of diseases and traits, the UK Biobank (a large, prospective cohort study containing health history, phenotypic measurements, and genetic data for over 500,000 individuals) provides an opportunity to explore sexually dimorphic genetic architectures in a large number of traits and diseases. Here, we present a sex-specific analysis of 733 sex-stratified complex trait GWAS for 361,194 white British individuals in the UK Biobank, and report 16 traits with significant sex-specific differences in heritability. These 16 candidate traits with sex-specific genetic effects belong to 5 distinct groups: body fat mass and distribution, blood pressure, creatinine levels, snoring, and birth weight. Using a systematic sex-specific discovery-replication analysis, we identify 47 (31 novel) loci showing sex-specific effects on the traits related to body fat mass/distribution, blood pressure, and birth weight, and discover 74 potential sex-specific biological pathways from enrichment analyses based on associated genes from QTL analysis. In addition, we present further evidence for significant sex-specific genetic effects in 13 traits spanning three trait groups (body fat mass/distribution, blood pressure, and birth weight) by comparing the prediction performance of sex-specific polygenic risk scores.

Introduction

There are significant sex differences for many traits and diseases1, from cardiovascular diseases, asthma, autoimmune diseases, and mental illnesses to anthropometric traits such as BMI, body fat composition, and blood pressure2-8. Sex differences can originate from a wide range of factors, from genetic, hormonal, and other biological factors to environmental and sociological factors9,10. Understanding the mechanisms contributing to sex differences in various diseases and traits can aid in our understanding of the biological origins of diseases. These sex differences also present opportunities for improved therapies, where sex-specific disease etiologies may require different treatment strategies, and are an important factor for equitable medical care1.

Here we focus on the genetic basis of sex differences for a range of common traits. Previous studies have already demonstrated the existence of sexually dimorphic genetic effects for certain traits. For example, Weiss et al. discovered sex-specific genetic architectures in some quantitative traits in Hutterites, such as blood lipid levels, blood pressure, and height11. Rawlik et al. identified sex-specific genetic architectures for 14 complex traits such as basal metabolic rate, waist-hip ratio, and blood pressure, among others9. Using sex-stratified Genome-Wide Association Studies (GWAS) data collected by the GIANT consortium, Randall et al. discovered 7 sexually dimorphic genetic loci for waist phenotypes12. More recently, Rask-Andersen et al. observed genetic sex-heterogeneity specifically in body fat distribution, identifying 37 variants showing stronger effects in females13. Sex-specific genetic effects have also been observed in complex diseases, with recent studies showing sex-specific risk alleles for asthma, coronary artery disease, diabetes, and Crohn’s disease14-16.

While sex-specific genetic differences have been discovered for some traits in these studies, a thorough scan of the specific genetic mechanisms conferring sexual dimorphism is still lacking for the majority of common traits and diseases1. Even among the traits that have been studied, most studies were designed to search for global evidence of sex-specific genetic architectures, without identifying loci showing sex-specific effects (SSE)13,17, while others were underpowered to detect sex-specific genetic loci9,15, many of which may only have small effect differences through varying disease etiology and mechanism.

The UK Biobank is a large, prospective cohort study of over 500,000 individuals, with rich data including questionnaire responses, phenotypic measurements, disease and health information, and genetic data18. This provides an opportunity to scan a large number of common traits and diseases for potentially different genetic architectures between males and females and to identify loci having SSE with comparatively higher statistical power. Although there were published studies analyzing selected traits within the UK Biobank, these studies either considered a subset of the samples9 or used methods that are not specifically designed to detect sexually dimorphic loci13,19. With recent releases of more participant genomic and phenotypic data, as well as comprehensive GWAS analyses of these participants, the UK Biobank presents a promising resource for a more comprehensive investigation of sex-specific genetic effects20,21.

In this manuscript, we present a sex-specific analysis of 733 complex traits in the UK Biobank. A total of 16 traits have significant differences in heritability between males and females. These 16 traits belong to 5 distinct groups: body fat mass/distribution, blood pressure, creatinine levels, snoring, and birth weight. Using sex-stratified GWAS analysis, we initially discovered 360 SSE loci across these traits. To validate these findings, we conducted a replication study using an independent set of individuals within the same genetic population from the UK Biobank, yielding 47 replicated SSE loci for traits related to body fat mass/distribution and blood pressure, of which 31 are novel. We then investigated possible biological interpretations of the SSE loci through pathway enrichment analysis, and report 74 possible sex-specific pathways.

We further present evidence for significant sex-specific genetic effects in 13 of the 16 traits through polygenic risk prediction. Using two different polygenic risk prediction methods, namely pruning and thresholding (P+T)22 and PRS-CS23, we report significant differences in risk prediction between models trained on male- and female-specific training sets for traits related to fat mass/distribution, blood pressure, and birth weight.

Results

An overview of our analysis procedure, as well as our results, can be found in Figure 1. Due to the large number of phenotypes present in the UK Biobank, as well as the presence of redundant phenotypes, we first chose 733 traits belonging to 7 categories, including lifestyles, clinical measurements, health and medical status, cognitive functions, biomarkers, and diagnosed diseases. We then conducted sex-stratified GWAS analyses in an initial discovery set for the 733 traits, and tested the GWAS summary statistics for sex-specific heritability differences. For the traits selected based on heritability difference tests, we identified candidate sex-specific loci. We then used a replication set to validate the findings, and conducted a literature review, QTL analysis, and pathway analysis based on the replicated loci. Finally, we used the discovery and replication sets to compare sex-specific and sex-agnostic PRS models to demonstrate further evidence for sex-specific genetic effects. Methodological details can be found in the Methods section.

Heritability Differences and Genetic Correlations

Of the 733 traits analyzed, 15 traits showed significant sex differences in heritability: whole body fat mass (WFM, Field NO.: 23100), right leg fat percentage (RLFP, Field NO.: 23111), left leg fat percentage (LLFP, Field NO.: 23115), right leg fat mass (RLFM, Field NO.: 23112), right arm fat percentage (RAFP, Field NO.: 23119), left arm fat percentage (LAFP, Field NO.: 23123), right arm fat mass (RAFM, Field NO.: 23120), left arm fat mass (LAFM, Field NO.: 23124), trunk fat mass (TFM, Field NO.: 23128), diastolic blood pressure (DBP, Field NO.: 4079), systolic blood pressure (SBP, Field NO.: 4080), high blood pressure (HBP, Field NO.: 6150_4), snoring (SNRG, Field NO.: 1210), creatinine in urine (CRT, Field NO.: 30510), and birth weight (BW, Field NO.: 20022). Both sides of a bilaterally symmetrical trait are presumed to have a similar genetic basis. We noticed that left leg fat mass (LLFM, Field NO.: 23116), which pairs with right leg fat mass (RLFM, Field NO.: 23112), also showed a large sex difference in heritability, with a p-value of 0.002. Although it did not pass the FDR threshold, we still included it in the remaining analyses, since it is the other side of a symmetrical trait. Figure 2 shows the scatter plot of the heritabilities for these traits in males and females. Like previous studies, which have reported comparatively higher heritabilities for many traits in females than in males6,9,17, our results also showed that the heritability estimates are higher in females than in males for most traits. Genetic correlations between males and females showed high similarity between the male and female summary statistics, with genetic correlation estimates for the 16 traits all above 0.8859. However, the genetic correlations for all traits except birth weight were significantly different from 1. The heritability differences and genetic correlation estimates for all 733 traits analyzed can be found in Supplementary Table 1.

To determine the relationships between these traits, genetic correlations between each of the traits’ combined summary statistics were calculated, and five groups emerged, corresponding to fat mass/distribution, blood pressure, snoring, creatinine, and birth weight (Figure 3). From the genetic correlation results, we divided the 16 sex-specific traits into 5 phenotype groups: fat mass related traits (FM, 10 traits), blood pressure related traits (BP, 3 traits), creatinine levels, snoring, and birth weight.

Sex-specific Loci

Sex-specific Manhattan plots and quantile-quantile (QQ) plots for each trait group are shown in Figures 4 and 5. All trait groups have regions of sex-specific significance, further suggesting sex-specificity of these traits. We note high genomic control factor λGC values for the sex-specific GWASs. However, all traits had LDSC intercepts near 1, indicating that confounding effects were well adjusted for in both males and females. Due to the lack of evidence of genomic inflation, we interpret the high λGC values as indicators of polygenicity for these traits24.

Initial clumping within trait groups yielded 649 significant loci for FM traits, 381 for BP traits, 14 for creatinine levels, 8 for snoring, and 64 for birth weight (Figure 5). In the primary analysis, we identified 360 loci showing sex-specific effects: 235 loci were identified in at least one trait among FM traits, 116 loci were identified in at least one trait among BP traits, 1 locus was identified in snoring, 4 loci were identified in creatinine levels, and 4 loci were identified in birth weight. These initial SSE loci can be visualized in Figure 5, where peaks in only males or only females correspond to initial SSE loci. After our replication analysis, 47 loci were classified as SSE: 37 loci in FM traits and 10 loci in BP traits. These loci are denoted in Supplementary Figure 2 with their corresponding names. Of the 1 locus initially identified as sex-specific in snoring, 4 loci initially identified in creatinine, and 4 loci initially identified in BW, none were replicated in the replication study. Separate power analyses were conducted for these traits to investigate the non-replication phenomenon25. At each locus, we first selected the SNP showing the strongest significance in the discovery study, then investigated its replication power by varying its effect size. Due to the “winner’s curse”25, the marginal effect size observed in the discovery study tends to be larger than its underlying true value. When the true effect sizes of the loci equal 80% of the effect sizes observed in the discovery study, the average power for replicating the initially identified loci is 0.21, 0.23, and 0.35 for snoring, creatinine, and BW, respectively. Supplementary Figure 3 presents the curves of replication power for different effect sizes in these three traits. The low power values indicate that we may not have adequate power to replicate the SSE loci in these traits.

The classifications for all loci for each trait in both the primary and replication study, as well as the number of loci replicated in both the primary and replication study, are provided in Supplementary Table 2. In addition, meta values for the effect sizes, standard errors, and p-values of each locus, calculated according to Jiang et al.26, are also provided in Supplementary Table 2.

Of the 37 SSE loci identified in FM traits, 16 were previously observed to have sex-specific effects for fat-related traits in other studies, while 21 were not previously shown to have sex-specific effects. Among the 21 novel SSE loci, 19 were not reported to be associated with any fat mass related traits, though there were previous reports of associations with anthropometric traits such as body mass index17,27,28, fat-free mass29, waist and hip circumference30-32, and waist-hip ratio17 for all novel loci. The remaining 10 SSE loci in blood pressure traits were all previously shown to be associated with blood pressure, but sex-specificity was not previously reported30,33-38. Double Manhattan plots for FM and BP traits, as well as annotated SSE loci, can be found in Figure 5. Due to the large number of SSE loci identified for fat traits, only novel loci are displayed in Figure 5, and all sex-specific loci can be found in Supplementary Figure 2. All SSE loci and their neighboring genes, as well as classifications for all other significant loci, can be found in Supplementary Table 2.

Pathway Analysis

As FM and BP were the only trait groups with both male and female SSE loci, sex-specific pathway analysis was only conducted for FM and BP. Five gene sets were constructed: all QTL genes associated with significant loci, the male-specific and female-specific gene sets, and the unions of the genes at non-sex-specific loci with the male-specific and female-specific sets. A total of 181 pathways were significantly enriched in at least one gene set, with 74 pathways displaying sex-specific enrichment. Of the 74 pathways, one pathway relating to GDP catabolism (GO:0046712) was enriched in the female-specific gene sets, with the other 73 enriched in the male-specific gene sets. The pathways enriched in the male-specific gene sets were related to adaptive immune response (GO:0002250, GO:0002819), immunity (GO:0050852, GO:0045591, GO:0002456), and various pathways related to intracellular transport (GO:0098552, GO:0010008, GO:0005789, among others).

Polygenic Risk Prediction

Results from polygenic risk prediction analysis showed further support for the existence of sex-specific genetic effects in the traits analyzed. For each trait, the performance of male- and female-specific models trained using P+T and PRS-CS was compared on both male and female replication data. Overall, PRSs trained using PRS-CS showed higher accuracy than P+T across all traits in all replication sets. For P+T PRSs, significant differences in predictive performance were found in 12 traits but were only detected for scores evaluated on the female replication sets. Comparison of PRS-CS PRSs revealed significant differences in SBP in addition to the 12 traits seen for P+T, as well as differences in scores evaluated on the male replication sets for 4 of the traits. All PRS performance and comparisons are shown in Table 3, and a comparison of risk prediction performance within the five traits analyzed is also shown in Figure 7.

Discussion

In this paper, we have presented a systematic investigation of sex-specific genetic effects in 733 traits in the UK Biobank. We reported 16 traits in five genetically related trait groups with significant heritability differences and identified 47 (31 novel) sex-specific effects (SSE) loci. Through pathway enrichment analysis, we identified 74 pathways with enrichment in only one sex. Finally, through polygenic risk score analysis, we showed further evidence for sex-specific effects in 13 of the 16 traits analyzed.

All novel SSE loci for fat mass/distribution were previously shown in the literature to be associated with at least one fat-related trait from a total of 16 studies (Supplementary Table 3). These studies include meta-analyses, multi-ethnic GWASs, and fine-mapping studies in a variety of populations, including those of African ancestry, European ancestry, a Japanese population, and a Hispanic population27,32,36,39-47. As previously mentioned, these studies did not explore possible sex-specific effects of these loci, suggesting that many loci identified through GWASs may be sex-specific loci that have not been recognized as such. Similarly, all novel sex-specific BP loci were also shown to be associated with blood pressure, hypertension, or related traits. Previous studies identifying these associations, as with fat-mass-related traits, were diverse in study designs and populations. These studies include GWASs, meta-analyses, and multinomial regression studies in European, Chinese, East Asian, trans-ethnic and trans-ancestral populations, identifying associations between the loci and blood pressure traits in adults, adolescents, and children19,33-37,48-55. In particular, the loci rs7701003 and rs12665166 were well-supported in the literature as associated with blood pressure, with 10 studies supporting each association19,33-37,48,50-54. As with the novel fat-mass-related trait loci, these loci were not shown to have sexually dimorphic effects in blood-pressure-related traits in any previous analyses.

Our methods build upon previous studies reporting sex-specific genetic effects in two main ways. First, by using almost all genetically European individuals available within the UK Biobank in either the discovery or replication study, we significantly increased the sample size of our analysis relative to many previous studies investigating sex-specific effects. Second, we presented a systematic and comprehensive study of sex-specific genetic loci, making use of both the discovery and replication sets to discover and further validate them. Many previous studies that reported sex-specific loci investigated sex-specificity only after identifying loci significantly associated with a trait overall13,31. This approach risks overlooking loci that are associated in one sex but so weakly associated in the other that the overall significance falls below the genome-wide threshold. For example, the locus rs13322435, an SSE locus significantly associated in females for FM, does not reach genome-wide significance in non-stratified GWAS for LAFP and RAFP, with p-values of 0.00109 and 0.00114, respectively. However, the locus surpasses genome-wide significance in female-specific GWAS, with p-values of 9.24e-10 and 4.77e-10. Loci such as these would not be identified by an approach that investigates sex-specificity after assessing overall significance. Other studies lacked rigorous replication procedures for sex-specific analyses, relying only on sex-specificity in the primary analysis17. Whether the identified sex-specificity can be replicated in different studies therefore remains unknown.

In their work, Rask-Andersen et al. only tested for sex-specificity in the replication cohort, and focused on overall significant loci in the multiple testing correction procedure13. In analyzing the role of sex bias in anorexia nervosa, Hubel et al. performed sex-specific GWASs and compared differences in significance but did not include replication methods29. Lu et al. performed sex-stratified GWASs for both all-ancestry and European ancestry groups on a comparatively smaller dataset, with a total of 100,716 participants56. In addition, they did not include a replication stage for identified sex-specific loci. Randall et al. also used a z-score of the same form as zβ to identify loci with single-sex effects for anthropometric traits using summary statistics released by the GIANT consortium, but they included the sample covariance between male and female marginal effects in calculating the standard error of the marginal effect difference. Since the sex-specific marginal effects are each computed from only male or only female individuals, and there is no overlap between these two groups, the actual covariance of the sex-specific marginal effects should be zero. The inclusion of the sample covariance term decreased the standard error of the effect difference and inflated the test statistic12. Shungin et al., in searching for adipose-associated genetic loci, tested the differences in male and female effect sizes for identified significant loci in a single meta-analysis cohort, again accounting only for the significant loci in the multiple testing procedure31. Pulit et al. used the same sex-specific effect test as ours, but did not include a replication procedure for the sexually dimorphic results17. Other studies reporting sex-specific effects in fat distributions also suffered from the previously described shortcomings: low sample size, lack of systematic sex-specificity analysis, or both28,57,58.
In these previous studies, lower sample sizes reduced the statistical power of the sex-specific analysis, and the lack of focus on sex-specific loci in the analysis methods increased the risk of reporting nonreplicated sex-specific findings. By leveraging the large number of genetically European individuals in the UK Biobank and focusing our analysis on sex-specific genetic loci, our method increases the power for sex-specific locus discovery, while also reducing the possibility of false positives by testing sex-specificity in both the discovery and replication stages. Thus, to our knowledge, our study represents the most systematic analysis of sex-specific genetic effects with the highest statistical power for the traits analyzed, while also carefully controlling for false positive results.

Our study also provides further evidence for sex-specific genetic effects in 13 of the traits studied through PRS analysis. Although the performance of PRSs trained on the combined dataset was higher than that of all sex-specific PRS models, the predictive performance of sex-specific PRSs was significantly higher for models tested on the sex on which they were trained. This disparity in predictive performance was more clearly seen in the female replication sets. Though this may be due in part to the larger sizes of the female training and replication sets, it may also suggest that sex-specific genetic effects play a larger role in females. The higher overall performance of the combined-sex PRS models compared to the sex-specific PRS models, which is due to the larger sample size of the training dataset, further suggests possible approaches for building better-performing PRSs that combine information from both sexes in the future.

A notable shortcoming of this study is the limited discovery power for sex-specificity in both the heritability analysis and the SNP-level association analysis, due to limitations of heritability estimation for the former and the absence of fine-mapping frameworks for identifying SNPs with sex-specific causal effects in the latter. As shown in Supplementary Figure 1, many traits displayed possible sex differences in heritability, but the standard errors of the heritability estimators drove the heritability difference score towards 0. As the precision of heritability estimation continues to improve with increasing effective sample size, the power for discovering sex differences in heritability will increase, possibly yielding more traits with sexually dimorphic genetic effects. Our use of heritability as the primary criterion for further sex-specific analysis also presents an overly stringent condition for analysis. Traits that have been suggested to have sexually dimorphic genetic effects in previous literature, such as diabetes and asthma, showed comparatively similar heritability scores and were excluded from our analysis14,16. For these traits, different regions of the genome in males and females may contribute genetic effects that end up at similar levels overall. Thus, a genome-wide heritability difference test does not have the resolution to detect sex-specificity in such traits. However, if we conducted sex-stratified analyses for all traits without pre-screening by the heritability difference test, a multiple comparison problem would arise. To overcome it, a stringent threshold would have to be adopted for the sex-stratified analyses, which may degrade the overall power. Hence, to detect sex-specificity in traits with similar overall heritability, a powerful pre-screening procedure with higher resolution is needed in the future.

Our SNP-level analysis also does not take advantage of all information available for analysis, such as linkage-disequilibrium information and annotation data. Our results are promising, with 31 novel loci across two trait groups. However, some loci with significant sex-specific differences in the primary analysis could not be replicated, and some unidentified loci in the double Manhattan plots shown in Figure 5 and Supplementary Figure 2 seem to be significant in one sex but not the other. Both observations suggest that our methods may still lack the sensitivity to identify some loci with sex-specific effects, which may require more powerful statistical frameworks. Future methods may be developed and implemented to uncover both novel sex-specific loci in traits with known sex-specific genetic effects and novel traits with sex-specific genetic effects.

Methods

Sex-stratified GWAS Analysis

In the discovery stage, we used the same procedure proposed by Neale et al. to perform GWAS analyses based on the UK Biobank data20. The GWAS were performed on a subset of UK Biobank data composed of white British individuals, who were selected through both a genetic criterion and their self-reported ancestry. For the genetic criterion, the means and standard deviations of the first 6 principal components from genotype data in a curated set of white British individuals were calculated, and individuals within 7 standard deviations of these principal components were initially selected for analysis. The individuals were further selected according to their responses on ancestry (UK Biobank Data-Field 21000), where only individuals who responded with “British”, “Irish”, or “White” ethnic background were selected. Finally, related individuals were removed from the analysis, yielding 361,194 unrelated individuals (194,174 females and 167,020 males). Following Neale et al., we applied a relatively loose quality control procedure to the variants, selecting those with an imputation information score above 0.8, a minor allele frequency exceeding 0.001, and a Hardy-Weinberg equilibrium p-value > 1e-10. After these quality control procedures, 13.7 million SNPs remained in the analysis.
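The variant-level quality control thresholds above can be illustrated with a minimal sketch. The `Variant` record, its field names, and the example variants are hypothetical and serve only to make the three cutoffs concrete; the actual filtering followed the Neale et al. pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant QC record (field names are assumptions)."""
    rsid: str
    info: float   # imputation information score
    maf: float    # minor allele frequency
    hwe_p: float  # Hardy-Weinberg equilibrium p-value


def passes_qc(v, min_info=0.8, min_maf=0.001, min_hwe_p=1e-10):
    """Keep a variant only if it exceeds all three thresholds from the text."""
    return v.info > min_info and v.maf > min_maf and v.hwe_p > min_hwe_p


# Made-up variants, one per failure mode.
variants = [
    Variant("rs_example_1", info=0.95, maf=0.12, hwe_p=0.40),
    Variant("rs_example_2", info=0.75, maf=0.20, hwe_p=0.50),    # fails info
    Variant("rs_example_3", info=0.99, maf=0.0005, hwe_p=0.30),  # fails MAF
    Variant("rs_example_4", info=0.90, maf=0.05, hwe_p=1e-12),   # fails HWE
]
kept = [v.rsid for v in variants if passes_qc(v)]
```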

To make the quantitative traits more normally distributed for more robust linear regression, a rank-based inverse normal transformation was conducted before the analysis59. GWAS analyses were then performed in Hail using a linear regression model. For each trait, male- and female-specific GWAS, as well as an all-individual-combined GWAS, were performed. The first 20 principal components, sex, age, age², as well as the interaction terms sex*age and sex*age², were included as covariates in the combined GWAS, and the first 20 principal components, age, and age² were included as covariates in the sex-specific GWAS.
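As an illustration of the rank-based inverse normal transformation mentioned above, the sketch below maps each value's rank to a standard normal quantile. The offset c = 3/8 (Blom) and the use of average ranks for ties are assumptions made for this example; the text does not state which variant was used.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform.

    Each value is replaced by Phi^{-1}((rank - c) / (n - 2c + 1)).
    The offset c and the tie handling (average ranks) are assumptions.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# A skewed toy phenotype: the outlier 100 no longer dominates after transformation.
transformed = rank_inverse_normal([10, 2, 7, 7, 100])
```

The transform depends only on ranks, so extreme phenotype values cannot exert outsized leverage in the subsequent linear regression.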

Sex-specific Differences in Heritability

Due to the large number of phenotypes present in the UK Biobank, as well as the presence of redundant phenotypes (for example, many diseases were included as both ICD-10 codes and self-reported traits), 733 traits belonging to 7 categories, including lifestyles, clinical measurements, health and medical status, cognitive functions, biomarkers, and diagnosed diseases, were ultimately chosen for sex-specific analysis. We began by using sex differences in heritability (the proportion of phenotypic variance attributable to genetic variance) as a filter for all traits. Sex-specific heritabilities were calculated for each trait using LD Score Regression (LDSC)24 with the reference panel from the European-ancestry individuals in phase III of the 1000 Genomes Project60. In order to select traits with significant differences in genetic effects, a heritability difference score was calculated for each trait:

$$z_{h^2} = \frac{h^2_{\mathrm{female}} - h^2_{\mathrm{male}}}{\sqrt{\mathrm{se}(h^2_{\mathrm{female}})^2 + \mathrm{se}(h^2_{\mathrm{male}})^2}}$$

where $h^2_{\mathrm{female}}$ and $h^2_{\mathrm{male}}$ were the heritability estimates for females and males from the LDSC, and $\mathrm{se}(h^2_{\mathrm{female}})$ and $\mathrm{se}(h^2_{\mathrm{male}})$ were the corresponding standard errors of the heritability estimates, respectively. The scores were then tested using a two-tailed z-test. Due to the large number of traits tested, the Benjamini-Hochberg multiple testing procedure was used to control the false discovery rate (FDR) at the 0.05 level.
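The screening step described here (a difference score, a two-tailed z-test, and Benjamini-Hochberg control) can be sketched as follows. The score is assumed to be the difference of the two estimates divided by the square root of the sum of their squared standard errors, treating the male and female estimates as independent; the function names and the three example traits are hypothetical.

```python
import math


def difference_z(est_f, se_f, est_m, se_m):
    """Assumed score: (female - male) / sqrt(se_f^2 + se_m^2)."""
    return (est_f - est_m) / math.sqrt(se_f ** 2 + se_m ** 2)


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank meeting the BH condition
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= cutoff_rank
    return rejected


# Made-up (h2_female, se_female, h2_male, se_male) values for three traits.
traits = [(0.30, 0.02, 0.20, 0.02), (0.25, 0.03, 0.24, 0.03), (0.15, 0.02, 0.22, 0.02)]
pvals = [two_tailed_p(difference_z(*t)) for t in traits]
selected = benjamini_hochberg(pvals, alpha=0.05)
```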

Genetic Correlations

To infer the similarity between male and female genetic architecture for each trait, we used LDSC to calculate the genetic correlation between males and females61. Genetic correlations between all traits that had significant differences in sex-specific heritability were also calculated, and trait clusters based on genetic correlations were identified for downstream analyses (Figure 3).

SSE Loci Identification

Each trait with significant sex differences in heritability was then analyzed at the SNP level for potential sex-specific associations. The traits were clustered into groups based on their genetic correlations. To get unified SNP clumping results for each trait group, the summary statistics for each trait within the group were first aggregated into a single set of summary statistics, taking the minimum p-value across all traits and both sexes within the group. Each set of aggregated summary statistics was then clumped using PLINK v2.0, with the clumping distance set at 500kb, the p-value threshold set at the genome-wide significance level of 5e-8, and the LD threshold set at 0.1. The LD structure was calculated from European-ancestry individuals in the 1000 Genomes Project60.
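The aggregation step that precedes clumping can be sketched in a few lines. The nested-dictionary layout, the function name, and the SNP identifiers are assumptions for illustration; clumping itself was performed in PLINK and is not reproduced here.

```python
def aggregate_min_p(group_sumstats):
    """Collapse a trait group's summary statistics to one p-value per SNP.

    group_sumstats maps (trait, sex) -> {snp_id: p_value}; the result keeps,
    for each SNP, the minimum p-value across all traits and both sexes.
    """
    combined = {}
    for pvals in group_sumstats.values():
        for snp, p in pvals.items():
            if snp not in combined or p < combined[snp]:
                combined[snp] = p
    return combined


# Made-up p-values for a hypothetical two-trait group.
group = {
    ("RLFP", "female"): {"rsA": 1e-9, "rsB": 0.2},
    ("RLFP", "male"): {"rsA": 0.01, "rsB": 0.03},
    ("LLFP", "female"): {"rsA": 3e-8, "rsC": 0.5},
}
combined = aggregate_min_p(group)
```

Taking the minimum means a SNP enters clumping if it is significant for any trait in the group in either sex, which is what allows single-sex signals to define loci.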

Loci where p-values in only males ($p_{\mathrm{male}}$) or only females ($p_{\mathrm{female}}$) passed genome-wide significance (<= 5e-8) were considered candidate SSE loci. To further filter out SNPs without significant effect size differences between the sexes, the following statistic

$$z_\beta = \frac{\beta_{\mathrm{female}} - \beta_{\mathrm{male}}}{\sqrt{\mathrm{se}(\beta_{\mathrm{female}})^2 + \mathrm{se}(\beta_{\mathrm{male}})^2}}$$

was then calculated for each SNP of the candidate SSE locus, in which $\beta_{\mathrm{female}}$ and $\beta_{\mathrm{male}}$ were the effect size estimates of the SNP for each sex, and $\mathrm{se}(\beta_{\mathrm{female}})$ and $\mathrm{se}(\beta_{\mathrm{male}})$ were the standard errors of the effect size estimates. Sex-specificity was tested using a two-tailed z-test with the FDR controlled at 10%. Candidate SSE loci that had significant sex-specificity were then designated SSE loci.
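A minimal sketch of the two criteria for a single SNP is shown below. The statistic is assumed to be the effect size difference divided by the square root of the sum of squared standard errors, with no covariance term because the male and female samples do not overlap; the function names and numbers are hypothetical.

```python
import math

GENOME_WIDE = 5e-8  # genome-wide significance threshold from the text


def is_candidate_sse(p_female, p_male, threshold=GENOME_WIDE):
    """True when exactly one sex passes genome-wide significance."""
    return (p_female <= threshold) != (p_male <= threshold)


def z_beta(beta_f, se_f, beta_m, se_m):
    """Assumed form: (beta_f - beta_m) / sqrt(se_f^2 + se_m^2)."""
    return (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Made-up SNP: genome-wide significant in females only.
candidate = is_candidate_sse(p_female=1e-9, p_male=1e-3)
z = z_beta(beta_f=0.05, se_f=0.008, beta_m=0.01, se_m=0.009)
p_diff = two_tailed_p(z)  # would then enter FDR control at 10% across SNPs
```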

Replication Analysis

Implicated traits and loci were then tested on a replication dataset, which was constructed to be independent of the discovery dataset but within the same genetic background. In accordance with the criterion for individuals chosen by Neale et al. in the discovery set, individuals within 7 standard deviations of the means of the first 6 principal components in self-identified British individuals were considered genetically suitable for the replication set. To construct an independent replication set, all individuals who self-reported “British”, “Irish”, or “White” ethnic background in Data-Field 21000 were removed. Individuals with missingness over 0.1 were also removed from the replication set, and variants with a minor allele frequency less than 0.001, a missing call rate greater than 0.05, or that failed the Hardy-Weinberg equilibrium test at a threshold of 1e-10 were removed from the replication dataset. Only SNPs present in the discovery stage were retained in the replication dataset. After these quality control measures, 76,132 individuals (42,537 females, 33,595 males) and 10,384,053 variants were included in the replication dataset. To confirm the suitability of the replication set, the discovery and replication sets were compared along the first two principal components (Figure 6).

Sex-stratified GWAS were then performed for each implicated sex-specific trait with the replication dataset, with the first 10 principal components, sex, age, age², and the sex*age interaction as covariates. Continuous traits were transformed with an inverse rank normal transformation before regression analysis59, and individuals who reported taking blood pressure medications in Data-Field 6153 were removed from the GWAS for blood pressure traits. GWAS were performed using PLINK 2.0 with a linear regression model for continuous traits and a logistic model for binary traits.
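For readers unfamiliar with the rank-based inverse normal transformation, a minimal sketch follows. The Blom offset c = 3/8 and the averaging of tied ranks are common choices that are assumed here; the text does not state which variant was used, and the function name is hypothetical.

```python
from statistics import NormalDist
from typing import List


def inverse_rank_normal(values: List[float], c: float = 3.0 / 8.0) -> List[float]:
    """Map values to standard normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Toy phenotype values; the heavily skewed 200.0 is pulled in by the transform.
transformed = inverse_rank_normal([10.0, 200.0, 30.0])
```

Because only ranks are used, the transformed phenotype is approximately normal regardless of the original scale, which is the property the regression model relies on.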

To confirm the sex-specificity of the discovered SSE loci, each locus was first tested for association within each trait group in the replication dataset, controlling the FDR within each trait-sex combination at 5%. The replicated loci were then classified as SSE in a manner similar to the primary study, with one exception concerning the direction of the sex difference. The two-tailed z test used to determine sex differences in the primary study was not suitable for the replication study, because the direction of the sex-specific effect was also being tested. Thus, a modified version of the one-tailed z test, as described by Jiang et al.25, was performed in place of the two-tailed test used in the discovery analysis, with a looser FDR threshold of 30%. Only the loci that were identified as SSE in both the primary and replication datasets were considered the final SSE loci.
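To illustrate why the direction matters, the sketch below computes a plain one-tailed p-value for the replication statistic in the direction observed in the discovery set. This is not the modified test of Jiang et al., whose details are not reproduced here; it is only a simplified stand-in, and the function name is hypothetical.

```python
import math


def one_tailed_replication_p(z_discovery: float, z_replication: float) -> float:
    """One-tailed p-value of the replication z in the direction of the discovery z."""
    direction = 1.0 if z_discovery >= 0 else -1.0
    # Upper-tail probability of a standard normal at direction * z_replication.
    return 0.5 * math.erfc(direction * z_replication / math.sqrt(2.0))


# Same direction in both datasets: small p-value.
p_consistent = one_tailed_replication_p(z_discovery=5.0, z_replication=2.0)

# Opposite direction in the replication set: p-value close to 1.
p_opposite = one_tailed_replication_p(z_discovery=-5.0, z_replication=2.0)
```

A two-tailed test would give both cases the same p-value, which is why it cannot confirm that the sex difference replicates in the same direction.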

Analysis of Sex-Specific Loci

All identified loci were first examined in the primary literature for published associations. Literature review was performed using the Data Integrator tool from the UCSC Genome Browser, which allows queries of the GWAS Catalog at specific genomic loci62. Previously reported associations in related traits for SNPs within 500 kb upstream or downstream of the identified SSE loci were queried using the Data Integrator tool. Related traits for each trait group included adiposity, anthropometric traits, body fat mass/percentage and distribution, leg fat mass, trunk fat mass, and obesity for fat traits, and hypertension and diastolic and systolic blood pressure for blood pressure traits. Of particular interest were sex-specific loci previously identified for body fat related traits by Rask-Andersen et al.13, Hübel et al.29, Lu et al.56, and Randall et al.12. None of the SSE loci identified in BP traits had previously been reported as sex-specific; thus, no previous sex-specific literature was considered for these traits.

Genes neighboring all replicated loci were found with SNP-Nexus, an online tool for SNP analysis, with genes within 500 kb of the SNP considered neighboring genes63. All SSE loci were also analyzed for known QTL associations, using the QTLbase resource for quantitative trait loci64. All QTL associations were adjusted for multiple hypothesis testing using the Bonferroni correction, with a p-value of 0.05/293623477=1.70e-10 as the threshold. Due to the higher biological relevance of the QTL genes over the neighboring genes, significant QTL genes were used to generate five gene sets for pathway analysis: a sex-agnostic set containing all neighboring genes, a male- and female-specific gene set containing the union of QTL genes associated with non-SSE loci and male or female SSE genes, and a male- and female-specific gene set only containing genes associated with the SSE loci in QTLbase. Pathway analysis was performed on these gene sets using g:Profiler, an online tool for functional enrichment analysis65. Gene sets in Gene Ontology (GO)66,67, Kyoto Encyclopedia of Genes and Genomes (KEGG)68, and Reactome69 were analyzed, and a g:SCS threshold of 0.05 was deemed significant65. g:SCS is a multiple-testing correction procedure introduced by Reimand et al. that is better suited to hierarchical statistical tests.
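The Bonferroni threshold quoted above is simple arithmetic and can be checked directly. In the sketch below, the number of tests and the significance level are taken from the text; the variable and function names are illustrative only.

```python
# Number of QTL associations tested and family-wise significance level (from the text).
n_tests = 293_623_477
alpha = 0.05

# Bonferroni-corrected per-test threshold.
threshold = alpha / n_tests
print(f"{threshold:.2e}")  # 1.70e-10


def passes_bonferroni(p_value: float) -> bool:
    """True if a single QTL association survives the corrected threshold."""
    return p_value <= threshold
```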

Polygenic Risk Prediction

In disease prediction, polygenic risk scores (PRS) can act as tools for intervention, disease screening, and life planning70. To further investigate the extent of sex-specific genetic effects, both sex-specific and sex-agnostic PRS were calculated using weights generated with two summary-statistics-based PRS methods: pruning + thresholding (P+T)22 and PRS-CS23. For sex-specific PRS, SNP weights for each trait were calculated from sex-stratified summary statistics. For sex-agnostic PRS, SNP weights for each trait were calculated from the combined summary statistics. All weights for each trait were scored on both the male and female replication datasets.
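Scoring a set of weights on a replication dataset can be sketched as below: a PRS is a weighted sum of allele dosages, and prediction performance is summarized by the squared correlation between the score and the phenotype. The function names, the treatment of missing SNPs as zero dosage, and the toy data are assumptions made for illustration and do not describe the actual scoring software used.

```python
from typing import Dict, List


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages; SNPs absent from `dosages` contribute 0 (assumption)."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between scores x and phenotypes y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Toy weights (e.g., from a female-only GWAS) and toy genotypes for three individuals.
weights = {"rs1": 0.5, "rs2": -0.2}
genotypes = [
    {"rs1": 2.0, "rs2": 1.0},
    {"rs1": 0.0, "rs2": 2.0},
    {"rs1": 1.0, "rs2": 0.0},
]
scores = [polygenic_score(g, weights) for g in genotypes]
phenotypes = [1.1, -0.3, 0.6]
fit = r_squared(scores, phenotypes)
```

Applying the male-trained and female-trained weights to the same replication individuals and comparing the resulting R² values is what allows the two models to be contrasted within each sex.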

P+T PRS weights were calculated by clumping the summary statistics at different p-value thresholds (p = 1, 0.5, 0.05, 0.005, 5e-4, 5e-5, 5e-6), using an LD threshold of 0.1. Clumping was performed with PLINK v2.0. PRS-CS PRS weights were generated using the Markov Chain Monte Carlo method, using the default parameters of 1000 iterations, 500 burn-in iterations, and a thinning factor of 5. Default parameters of 1 for A and 0.5 for B for the gamma-gamma prior were used, and the global shrinkage parameter phi was not specified, but was instead learnt from the data. To investigate the extent of sex-specific genetic effects, PRS performance was compared between male- and female-trained models on sex-specific replication datasets. A z-score comparing performance between male and female models was calculated as follows

and n is the sample size of the training set used to fit the PRS models. Significance was assessed with a two-tailed z test, with p < 0.05 considered significant.

Declarations

DATA AVAILABILITY

The data that support the findings of this study are available from the UK Biobank (ref: 29900), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the UK Biobank.

CODE AVAILABILITY

The genome-wide association studies were performed using Hail (https://hail.is/docs/0.2/index.html) and PLINK 2.0 (https://www.cog-genomics.org/plink/2.0/). The heritabilities were estimated by LD Score Regression (https://github.com/bulik/ldsc). All statistical analyses were performed based on R 4.0 (https://www.r-project.org/) and Python 3.0 (https://www.python.org/).

ACKNOWLEDGEMENTS

Our work was supported in part by a pilot grant from the Women’s Health Research at Yale and NIH grant R01 GM134005.

AUTHOR CONTRIBUTIONS

J.H. and W.J. designed the study and developed the statistical analysis pipeline. J.H. implemented the pipeline. Y.Y. and H.Z. advised on statistical and genetics issues. J.H. and W.J. wrote the manuscript. All authors contributed to manuscript editing and approved the manuscript.

ETHICS DECLARATION

Competing Interests

The authors declare no competing interests.

References

1 Khramtsova, E. A., Davis, L. K. & Stranger, B. E. The role of sex in the genomics of human complex traits. Nature Reviews Genetics 20, 173-190, doi:10.1038/s41576-018-0083-1 (2019).

2 Choi, B. G. & McLaughlin, M. A. Why Men's Hearts Break: Cardiovascular Effects of Sex Steroids. Endocrinology and Metabolism Clinics of North America 36, 365-377, doi:https://doi.org/10.1016/j.ecl.2007.03.011 (2007).

3 Postma, D. S. Gender Differences in Asthma Development and Progression. Gender Medicine 4, S133-S146, doi:https://doi.org/10.1016/S1550-8579(07)80054-4 (2007).

4 Lockshin, M. D. Sex differences in autoimmune disease. Lupus 15, 753-756, doi:10.1177/0961203306069353 (2006).

5 Aleman, A., Kahn, R. S. & Selten, J.-P. Sex Differences in the Risk of Schizophrenia: Evidence From Meta-analysis. Archives of General Psychiatry 60, 565-571, doi:10.1001/archpsyc.60.6.565 (2003).

6 Schousboe, K. et al. Sex differences in heritability of BMI: a comparative study of results from twin studies in eight countries. Twin Res 6, 409-421, doi:10.1375/136905203770326411 (2003).

7 Wells, J. C. K. Sexual dimorphism of body composition. Best Practice & Research Clinical Endocrinology & Metabolism 21, 415-430, doi:https://doi.org/10.1016/j.beem.2007.04.007 (2007).

8 Sandberg, K. & Ji, H. Sex differences in primary hypertension. Biology of Sex Differences 3, 7, doi:10.1186/2042-6410-3-7 (2012).

9 Rawlik, K., Canela-Xandri, O. & Tenesa, A. Evidence for sex-specific genetic architectures across a spectrum of human complex traits. Genome Biol 17, 166-166, doi:10.1186/s13059-016-1025-x (2016).

10 Rinn, J. L. & Snyder, M. Sexual dimorphism in mammalian gene expression. Trends in Genetics 21, 298-305, doi:https://doi.org/10.1016/j.tig.2005.03.005 (2005).

11 Weiss, L. A., Pan, L., Abney, M. & Ober, C. The sex-specific genetic architecture of quantitative traits in humans. Nature Genetics 38, 218-222, doi:10.1038/ng1726 (2006).

12 Randall, J. C. et al. Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet 9, e1003500-e1003500, doi:10.1371/journal.pgen.1003500 (2013).

13 Rask-Andersen, M., Karlsson, T., Ek, W. E. & Johansson, Å. Genome-wide association study of body fat distribution identifies adiposity loci and sex-specific genetic effects. Nature Communications 10, 339, doi:10.1038/s41467-018-08000-4 (2019).

14 Myers, R. A. et al. Genome-wide interaction studies reveal sex-specific asthma risk alleles. Hum Mol Genet 23, 5251-5259, doi:10.1093/hmg/ddu222 (2014).

15 Liu, L. Y., Schaub, M. A., Sirota, M. & Butte, A. J. Sex differences in disease risk from reported genome-wide association study findings. Human Genetics 131, 353-364, doi:10.1007/s00439-011-1081-y (2012).

16 Orozco, G., Ioannidis, J. P. A., Morris, A., Zeggini, E. & consortium, D. Sex-specific differences in effect size estimates at established complex trait loci. Int J Epidemiol 41, 1376-1382, doi:10.1093/ije/dys104 (2012).

17 Pulit, S. L. et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum Mol Genet 28, 166-174, doi:10.1093/hmg/ddy327 (2018).

18 Collins, R. What makes UK Biobank special? The Lancet 379, 1173-1174, doi:https://doi.org/10.1016/S0140-6736(12)60404-8 (2012).

19 Warren, H. R. et al. Genome-wide association analysis identifies novel blood pressure loci and offers biological insights into cardiovascular risk. Nature Genetics 49, 403-415, doi:10.1038/ng.3768 (2017).

20 UK Biobank GWAS, (2018).

21 Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203-209, doi:10.1038/s41586-018-0579-z (2018).

22 Privé, F., Vilhjálmsson, B. J., Aschard, H. & Blum, M. G. B. Making the Most of Clumping and Thresholding for Polygenic Scores. The American Journal of Human Genetics 105, 1213-1221, doi:https://doi.org/10.1016/j.ajhg.2019.11.001 (2019).

23 Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications 10, 1776, doi:10.1038/s41467-019-09718-5 (2019).

24 Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics 47, 291-295, doi:10.1038/ng.3211 (2015).

25 Jiang, W. & Yu, W. Power estimation and sample size determination for replication studies of genome-wide association studies. BMC Genomics 17 Suppl 1, 3, doi:10.1186/s12864-015-2296-4 (2016).

26 Jiang, W. & Yu, W. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies. Bioinformatics 33, 500-507, doi:10.1093/bioinformatics/btw690 (2016).

27 Zhu, Z. et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. Journal of Allergy and Clinical Immunology 145, 537-549, doi:10.1016/j.jaci.2019.09.035 (2020).

28 Winkler, T. W. et al. The Influence of Age and Sex on Genetic Associations with Adult Body Size and Shape: A Large-Scale Genome-Wide Interaction Study. PLoS Genet 11, e1005378, doi:10.1371/journal.pgen.1005378 (2015).

29 Hübel, C. et al. Genomics of body fat percentage may contribute to sex bias in anorexia nervosa. Am J Med Genet B Neuropsychiatr Genet 180, 428-438, doi:10.1002/ajmg.b.32709 (2019).

30 Lind, L. Genetic Determinants of Clustering of Cardiometabolic Risk Factors in U.K. Biobank. Metabolic Syndrome and Related Disorders 18, 121-127, doi:10.1089/met.2019.0096 (2020).

31 Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187-196, doi:10.1038/nature14132 (2015).

32 Tachmazidou, I. et al. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits. The American Journal of Human Genetics 100, 865-884, doi:https://doi.org/10.1016/j.ajhg.2017.04.014 (2017).

33 Takeuchi, F. et al. Interethnic analyses of blood pressure loci in populations of East Asian and European descent. Nature Communications 9, 5052, doi:10.1038/s41467-018-07345-0 (2018).

34 Giri, A. et al. Trans-ethnic association study of blood pressure determinants in over 750,000 individuals. Nature Genetics 51, 51-62, doi:10.1038/s41588-018-0303-9 (2019).

35 Hoffmann, T. J. et al. Genome-wide association analyses using electronic health records identify new loci influencing blood pressure variation. Nature Genetics 49, 54-64, doi:10.1038/ng.3715 (2017).

36 Kichaev, G. et al. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. The American Journal of Human Genetics 104, 65-75, doi:10.1016/j.ajhg.2018.11.008 (2019).

37 Lu, X. et al. Genome-wide association study in Chinese identifies novel loci for blood pressure and hypertension. Hum Mol Genet 24, 865-874, doi:10.1093/hmg/ddu478 (2014).

38 Evangelou, E. et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nature Genetics 50, 1412-1425, doi:10.1038/s41588-018-0205-x (2018).

39 Hoffmann, T. J. et al. A Large Multiethnic Genome-Wide Association Study of Adult Body Mass Index Identifies Novel Loci. Genetics 210, 499, doi:10.1534/genetics.118.301479 (2018).

40 Lotta, L. A. et al. Association of Genetic Variants Related to Gluteofemoral vs Abdominal Fat Distribution With Type 2 Diabetes, Coronary Disease, and Cardiovascular Risk Factors. JAMA 320, 2553-2563, doi:10.1001/jama.2018.19329 (2018).

41 Ng, M. C. Y. et al. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium. PLoS Genet 13, e1006719, doi:10.1371/journal.pgen.1006719 (2017).

42 Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197-206, doi:10.1038/nature14177 (2015).

43 Akiyama, M. et al. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nature Genetics 49, 1458-1467, doi:10.1038/ng.3951 (2017).

44 Graff, M. et al. Genome-wide physical activity interactions in adiposity ― A meta-analysis of 200,452 adults. PLoS Genet 13, e1006528, doi:10.1371/journal.pgen.1006528 (2017).

45 Galván-Femenía, I. et al. Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort. Journal of Medical Genetics 55, 765, doi:10.1136/jmedgenet-2018-105437 (2018).

46 Comuzzie, A. G. et al. Novel Genetic Loci Identified for the Pathophysiology of Childhood Obesity in the Hispanic Population. PLOS ONE 7, e51954, doi:10.1371/journal.pone.0051954 (2012).

47 Turcot, V. et al. Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure in obesity. Nature Genetics 50, 26-41, doi:10.1038/s41588-017-0011-x (2018).

48 Wain, L. V. et al. Novel Blood Pressure Locus and Gene Discovery Using Genome-Wide Association Study and Expression Data Sets From Blood and the Kidney. Hypertension 70, e4-e19, doi:10.1161/HYPERTENSIONAHA.117.09438 (2017).

49 Surendran, P. et al. Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension. Nature Genetics 48, 1151-1161, doi:10.1038/ng.3654 (2016).

50 Ehret, G. B. et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nature Genetics 48, 1171-1184, doi:10.1038/ng.3667 (2016).

51 Kato, N. et al. Trans-ancestry genome-wide association study identifies 12 genetic loci influencing blood pressure and implicates a role for DNA methylation. Nature Genetics 47, 1282-1293, doi:10.1038/ng.3405 (2015).

52 Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103-109, doi:10.1038/nature10405 (2011).

53 German, C. A., Sinsheimer, J. S., Klimentidis, Y. C., Zhou, H. & Zhou, J. J. Ordered multinomial regression for genetic association analysis of ordinal phenotypes at Biobank scale. Genetic Epidemiology 44, 248-260, doi:10.1002/gepi.22276 (2020).

54 Liu, C. et al. Meta-analysis identifies common and rare variants influencing blood pressure and overlapping with metabolic trait loci. Nature Genetics 48, 1162-1170, doi:10.1038/ng.3660 (2016).

55 Wain, L. V. et al. Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nature Genetics 43, 1005-1011, doi:10.1038/ng.922 (2011).

56 Lu, Y. et al. New loci for body fat percentage reveal link between adiposity and cardiometabolic disease risk. Nature Communications 7, 10495, doi:10.1038/ncomms10495 (2016).

57 Fox, C. S. et al. Genome-Wide Association for Abdominal Subcutaneous and Visceral Adipose Reveals a Novel Locus for Visceral Fat in Women. PLoS Genet 8, e1002695, doi:10.1371/journal.pgen.1002695 (2012).

58 Wen, W. et al. Genome-wide association studies in East Asians identify new loci for waist-hip ratio and waist circumference. Scientific Reports 6, 17958, doi:10.1038/srep17958 (2016).

59 McCaw, Z. R., Lane, J. M., Saxena, R., Redline, S. & Lin, X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics n/a, doi:10.1111/biom.13214 (2019).

60 Auton, A. et al. A global reference for human genetic variation. Nature 526, 68-74, doi:10.1038/nature15393 (2015).

61 Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nature Genetics 47, 1236-1241, doi:10.1038/ng.3406 (2015).

62 Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences 106, 9362, doi:10.1073/pnas.0903103106 (2009).

63 Dayem Ullah, A. Z. et al. SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine. Nucleic Acids Research 46, W109-W113, doi:10.1093/nar/gky399 (2018).

64 Zheng, Z. et al. QTLbase: an integrative resource for quantitative trait loci across multiple human molecular phenotypes. Nucleic Acids Research 48, D983-D991, doi:10.1093/nar/gkz888 (2019).

65 Raudvere, U. et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research 47, W191-W198, doi:10.1093/nar/gkz369 (2019).

66 Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 25, 25-29, doi:10.1038/75556 (2000).

67 The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Research 47, D330-D338, doi:10.1093/nar/gky1055 (2018).

68 Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27-30, doi:10.1093/nar/28.1.27 (2000).

69 Jassal, B. et al. The reactome pathway knowledgebase. Nucleic Acids Res 48, D498-D503, doi:10.1093/nar/gkz1031 (2020).

70 Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nature Reviews Genetics 19, 581-590, doi:10.1038/s41576-018-0018-x (2018).

Tables

Table 1. Heritabilities and Between-Sexes Genetic Correlations for 16 Traits

Sex-specific heritability estimates for 733 traits were obtained from sex-stratified GWAS summary statistics using LDSC. Sex differences in heritability were tested using a two-sided z-statistic constructed from the heritability estimates and their standard errors, and the false discovery rate was controlled at 5% using the Benjamini-Hochberg procedure. 16 traits were identified with sex-specific heritability differences, with the phenotype codes, descriptions, male and female heritability estimates and standard errors, computed heritability difference score, BH-adjusted p-value, and computed genetic correlation between males and females displayed in the table below.

| Phenotype | Phenotype Description | Male Heritability | Female Heritability | Male SE | Female SE | Heritability Difference Score | BH-Adjusted P-Value | Correlation |
|---|---|---|---|---|---|---|---|---|
| DBP | Diastolic blood pressure, automated reading | 0.0498 | 0.0897 | 0.0029 | 0.0041 | -7.9451 | 1.42E-12 | 0.9106 |
| SBP | Systolic blood pressure, automated reading | 0.0568 | 0.0903 | 0.0028 | 0.004 | -6.8611 | 2.51E-09 | 0.956 |
| SNRG | Snoring | 0.0899 | 0.1354 | 0.0065 | 0.0069 | -4.7999 | 0.0004 | 0.8867 |
| BW | Birth weight | 0.0412 | 0.0657 | 0.0032 | 0.0042 | -4.6400 | 0.0006 | 0.9766 |
| LAFP | Arm fat percentage (left) | 0.1049 | 0.1293 | 0.0039 | 0.0042 | -4.2572 | 0.0027 | 0.8859 |
| TFM | Trunk fat mass | 0.1066 | 0.1306 | 0.0039 | 0.0041 | -4.2413 | 0.0027 | 0.8979 |
| CRT | Creatinine (enzymatic) in urine | 0.028 | 0.0403 | 0.0018 | 0.0023 | -4.2114 | 0.0027 | 0.9187 |
| RAFP | Arm fat percentage (right) | 0.1059 | 0.1281 | 0.0039 | 0.0042 | -3.8733 | 0.0098 | 0.892 |
| RLFP | Leg fat percentage (right) | 0.1071 | 0.1277 | 0.0036 | 0.004 | -3.8280 | 0.0105 | 0.8953 |
| HBP | Vascular/heart problems diagnosed by doctor: High blood pressure | 0.2002 | 0.2615 | 0.0092 | 0.0135 | -3.7523 | 0.0119 | 0.9602 |
| WFM | Whole body fat mass | 0.1087 | 0.1299 | 0.0039 | 0.0041 | -3.7465 | 0.0119 | 0.9066 |
| LLFP | Leg fat percentage (left) | 0.1084 | 0.1272 | 0.0035 | 0.004 | -3.5371 | 0.0230 | 0.8908 |
| LAFM | Arm fat mass (left) | 0.1072 | 0.1282 | 0.0041 | 0.0043 | -3.5345 | 0.0230 | 0.9082 |
| RLFM | Leg fat mass (right) | 0.1078 | 0.1276 | 0.0037 | 0.0043 | -3.4904 | 0.0253 | 0.899 |
| RAFM | Arm fat mass (right) | 0.1083 | 0.1281 | 0.0041 | 0.0043 | -3.3326 | 0.0421 | 0.9012 |
| LLFM | Leg fat mass (left) | 0.1092 | 0.1267 | 0.0038 | 0.0043 | -3.0496 | 0.0786 | 0.8973 |
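The heritability difference scores in Table 1 appear consistent with a z-statistic of the form (male heritability − female heritability) divided by the square root of the sum of the squared standard errors. The sketch below applies this assumed form to the DBP and SBP rows and gives approximately -7.9451 and -6.8611, matching the tabulated scores; the function names are illustrative, and the exact formula used in the analysis is an assumption inferred from the table.

```python
import math


def heritability_difference_z(h2_male: float, se_male: float,
                              h2_female: float, se_female: float) -> float:
    """z score for a male-minus-female heritability difference (assumed form)."""
    return (h2_male - h2_female) / math.sqrt(se_male ** 2 + se_female ** 2)


def two_sided_p(z: float) -> float:
    """Unadjusted two-sided p-value; Table 1 reports BH-adjusted values instead."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values copied from the DBP and SBP rows of Table 1.
z_dbp = heritability_difference_z(0.0498, 0.0029, 0.0897, 0.0041)
z_sbp = heritability_difference_z(0.0568, 0.0028, 0.0903, 0.0040)
p_dbp = two_sided_p(z_dbp)
```

Negative scores indicate higher estimated heritability in females, which is the case for every trait in the table.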

Table 2. Significant SSE Loci

Sex-specific loci were identified through a two-stage procedure, in which candidate sex-specific loci in the discovery set were tested for sex-specificity in the replication set. After replication, 47 loci were identified as sex-specific, of which 31 were not previously reported to be sex-specific in the traits studied. All identified sex-specific loci are provided in the table below, with columns denoting the chromosome, the chromosomal band, the rsID, the sex that the locus was identified to be specific to, the male and female p-values, the male and female beta estimates and their standard errors, nearby genes identified using SNP-Nexus, significantly associated eQTL genes, the traits in which the locus was identified, and the study identifying sex-specific effects in the trait group, if applicable.

CHR | Band | Locus | Sex | P-value Male | P-value Female | Beta Male | SE Male | Beta Female | SE Female | Nearby Genes | QTL Genes | Traits | Previously Identified By

1 q22 rs4276913 F 0.1562 0.0044 -0.0041 0.0029 -0.0080 0.0028 HMGN2P18, MUC1, RAFP, Rask KRTCAP2 MTX1, LAFP, Ande MSX1, RAFM, et al. ADAM15, LAFM GBA, EFNA1, KRTCAP2, GBAP1, SLC50A1, MTX1P1, EFNA3, THBS3, DPM3

1 q25.2 rs6425446 F 0.0021 1.42E- -0.0092 0.0030 -0.0195 0.0029 RP11-63B19.1, RAFP Rask 11 Ande SEC16B et al. Lu et Hube al.

1 q25.1 rs3766694 F 0.3266 1.08E- -0.0033 0.0034 -0.0201 0.0031 KIAA0040 KIAA0040 DBP 10

1 q22 rs72704117 M 1.71E- 0.0681 -0.0719 0.0094 -0.0163 0.0090 RP11-263K19.4 PMF1, LAFP Rask 14 ADAM15, Ande GBAP1, et al. EFNA3, THBS3

1 q22 rs72706148 M 1.52E- 0.4083 -0.0789 0.0107 -0.0086 0.0103 GON4L ADAM15, LAFP, Rask 13 GLMP, RAFP Ande GBAP1, et al. SMG5, THBS3

2 q11.2 rs13002946 F 0.9175 4.51E- -0.0003 0.0033 -0.0128 0.0031 AFF3, LONRF2, LAFP, RAFP 05 LINC01104 AFF3

2 p16.1 rs3791679 F 0.0002 1.78E- -0.0142 0.0038 -0.0324 0.0035 EFEMP1 EFEMP1 TFM Rask 20 Ande et al.

2 q31.1 rs79869125 F 0.0189 2.02E- -0.0106 0.0045 -0.0282 0.0042 AC096649.3, EXTL2P1 LLFP 11

2 q36.3 rs2943650 M 1.85E- 0.0140 -0.0208 0.0033 -0.0076 0.0031 AC068138.1, IRS1 LAFP, Rand 10 MIR5702 RAFP, al., LLFP Lu et Hube al.

3 p12.1 rs11719438 M 3.58E- 0.3505 0.0273 0.0041 0.0036 0.0038 CADM2 TFM 11

3 q25.31 rs13322435 M 7.74E- 0.9172 -0.0194 0.0030 -0.0003 0.0029 LEKR1, TIPARP, LAFP, RAFP 11 LINC00880 AC092944.1

3 p25.2 rs2596902 M 0.0018 0.0332 -0.0100 0.0032 -0.0065 0.0031 IQSEC1 RPL32 LAFP Rand al.

3 q13.31 rs2918217 M 1.53E- 0.9195 0.0290 0.0045 -0.0004 0.0043 LSAMP LSAMP LAFM 10

5 q13.3 rs4704187 F 0.0019 2.53E- -0.0096 0.0031 -0.0251 0.0030 ANKRD31 COL4A3BP, RAFP, Rask 17 HMGCR, LAFP Ande POC5, et al. GCNT4, ANKRD31

5 q33.3 rs7701003 F 0.0027 4.37E- -0.0097 0.0032 -0.0250 0.0029 CTC-436K13.2, RP11-542A14.1 SBP 18

6 p21.33 rs12665166 F 0.7010 0.0008 -0.0016 0.0042 0.0130 0.0039 XXbac- DDR1, DBP BPG299F13.16, LY6G5B, HLA-C TCF19, PSORS1C3, NOTCH4, AL662844.4, TNF, MICB, VARS2, HLA- B, GZMK, MICA, HLA- C, CCHCR1, HLA-L, TMEM154, HLA-S

6 p22.1 rs9257424 M 0.0003 0.0015 0.0124 0.0034 0.0104 0.0033 C6orf100, HLA-F, RAFM KRT18P1 BTN3A2, HIST1H4L, ZSCAN9, TRIM27, HLA-G, ZNF311, IFITM4P

6 p21.32 rs9268644 M 8.81E- 0.0060 0.0227 0.0028 0.0072 0.0026 HLA-DRA FAM8A1, RLFP 16 HLA-DQA1, ARHGAP24, DEF8, HLA- H, HLA-G, HLA-DRB5, TRIM56, HLA-DQB1- AS1, SQSTM1, HLA-C, HCP5, TNXA, TINF2, RPL34, CYP21A1P, SLC44A4, TNXB, LIMS1, TOMM22, HLA-DQB1, HLA-DRA, DDAH2, HLA-DRB1, AOAH, HLA- DQB2, CSNK2A1, TOM1, HLA- DQA2, TSBP1, PRKRA, NOTCH4, ZNF20, SKIV2L, SSRP1, HLA- DRB6, DHX30

7 p22.2 rs1182199 M 7.34E- 5.63E- -0.0276 0.0034 -0.0130 0.0032 GNA12 GNA12, LAFM Rask 16 05 AC006028.1, Ande PFDN4, et al. AMZ1

7 q11.23 rs6955671 F 7.34E- 6.48E- 0.0157 0.0035 0.0211 0.0032 WBSCR16 NCF1C, RAFM 06 11 RCC1L, NCF1, GTF2IRD2B, HIP1, STAG3L2, POM121C, PMS2P3, GTF2IP1, GTF2I, STAG3L1, PMS2P5

7 p14.3 rs215607 M 5.19E- 0.1305 -0.0150 0.0037 -0.0050 0.0033 PDE1C LLFM, WFM 05

7 p13 rs799449 M 5.23E- 0.0088 0.0126 0.0031 0.0073 0.0028 OGDH, ZMIZ2 DDX56, LLFM 05 ZMIZ2, H2AFV, TMED4, CCM2, PPIA

8 p21.3 rs11786089 F 0.2068 1.13E- 0.0039 0.0031 0.0211 0.0030 HR FAM160B2, RAFM 12 NUDT18

8 p21.2 rs55968245 F 0.1162 7.88E- 0.0054 0.0035 0.0230 0.0032 CTC-756D1.3, NKX3-1, TFM 13 SLC25A37 ENTPD4, DNAJC5, LOXL2

11 q13.1 rs112437639 F 0.3885 3.42E- -0.0108 0.0125 -0.0694 0.0111 RP11-770G2.2 RNASEH2C, SBP 10 OVOL1, RIN1, CTSW, SIPA1, SPDYC

11 q23.3 rs8181524 F 0.5412 1.11E- -0.0035 0.0057 0.0333 0.0055 LINC00900 RAFM 09

11 p11.2 rs34942378 M 3.37E- 0.0229 -0.0414 0.0057 -0.0125 0.0055 C11orf49 MADD, SPI1 LAFP, RAFP 13

12 q13.13 rs7134677 F 0.0165 4.88E- -0.0083 0.0035 -0.0234 0.0031 HOXC4 HOXC4 SBP 14

12 p13.32 rs76895963 M 1.76E- 0.0108 0.1131 0.0122 0.0294 0.0115 CCND2 CCND2 RAFM, LAFM 20

13 q12.2 rs12872889 F 0.4926 7.64E- 0.0025 0.0037 0.0214 0.0033 FLT3 PAN3, FLT3, RLFM, LLFM, R 11 PAN3-AS1 LAFM

13 q21.1 rs35559811 F 0.0195 6.99E- -0.0080 0.0034 -0.0255 0.0033 RNA5SP30, LINC00374 LAFP 15

13 q34 rs656533 F 0.4205 3.93E- -0.0046 0.0057 -0.0470 0.0053 RN7SL783P, COL4A1 DBP 19

13 q34 rs7334306 F 0.6179 2.71E- -0.0023 0.0045 0.0316 0.0042 COL4A2 RAB20 DBP 14

13 q34 rs9555690 F 0.8029 7.49E- 0.0013 0.0051 -0.0364 0.0047 COL4A2 COL4A2 DBP 15

14 q23.1 rs1955695 F 0.2403 9.52E- -0.0041 0.0035 -0.0172 0.0032 RP11- C14orf39, TFM 08 1042B17.3, PCNX4 RP11- 1042B17.1

14 q32.33 rs3212076 F 0.8889 1.22E- 0.0004 0.0030 0.0228 0.0030 XRCC3 EIF5, RAFP 14 AL049840.1, ZFYVE21, TRMT61A, KIF26A, APOPT1, PPP1R13B, TDRD9, XRCC3, CKB, AL049840.4, AL139300.1, KLC1, BAG5

16 q22.1 rs11075747 F 0.0171 4.31E- 0.0074 0.0031 0.0256 0.0029 WWP2 NPIPB14P, LLFP, Rask 19 CLEC18C, RLFP Ande MIR140, et al. PDXDC2P, Hube CLEC18A, al. WWP2, SF3B3

16 p11.2 rs143504748 F 0.2866 8.14E- 0.0062 0.0058 0.0380 0.0051 TMEM219 MVP, YPEL3, RLFM, Rask 14 INO80E, LLFM, Ande MAPK3 LAFP, et al. WFM, LLFP, TFM

16 p11.2 rs34898535 F 3.55E- 3.08E- -0.0133 0.0032 -0.0316 0.0030 STX1B, STX4 ITGAM, LAFM, Rask 05 25 CCDC189, RAFP Ande ZNF668, et al. PRSS53, ITGAD, MAPK3, BCKDK, STX4, STX1B, KAT8, ARMC5, PHKG2, ITGAX, ZNF646, HSD3B7, VKORC1, SETD1A

16 q12.2 rs62033401 M 4.58E- 0.0332 -0.0311 0.0047 -0.0093 0.0044 FTO RBL2 TFM, Rask 11 RAFM, Ande WFM, et al. LAFM, LAFP, Lu et RAFP Hube al.

18 q11.2 rs11664194 F 0.3381 2.46E- -0.0032 0.0033 -0.0214 0.0031 RP11-863N1.4, RPS4XP18 DBP 12

18 q21.32 rs2168711 F 1.36E- 1.33E- 0.0157 0.0032 0.0346 0.0030 RNU4-17P, RP11-795H16.2 LLFP Rask 06 30 Ande et al.

Lu et Hube al.

20 q11.22 rs143384 F 0.0195 3.09E- 0.0076 0.0032 0.0272 0.0030 GDF5 UQCC1, TFM Rask 19 FAM83C, Ande CPNE1, et al. MMP24, MAP1LC3A, PROCR, SPAG4, GGT7, EDEM2, CEP250, GDF5, RBM39, SCAND1, EIF6

20 q11.22 rs6087571 F 0.0280 4.06E- 0.0100 0.0045 0.0318 0.0042 AHCY, CTD- TRPC4AP, TFM 14 3216D2.4 EIF2S2, MAP1LC3A, PIGU, GGT7, TP53INP2, ITCH, MYH7B

20 q11.22 rs6088638 F 0.0173 4.53E- 0.0099 0.0042 0.0280 0.0039 ACSS2 TRPC4AP, TFM 13 MAP1LC3A, PIGU, PROCR, GGT7, TP53INP2, EDEM2, MYH7B, EIF6

21 q22.11 rs28451064 F 0.0599 9.52E- 0.0089 0.0047 0.0348 0.0043 AP000320.7 SLC5A3, DBP 16 AP000317.1, MRPS6, LINC00310, LINC00649

22 q13.2 rs28489620 F 0.0120 6.24E- -0.0085 0.0034 -0.0208 0.0030 TEF, CTA- CENPM, LLFM 12 223H9.9 CYP2D7, DESI1, SMDT1, CCDC134, PMM1, CYP2D8P, CYP2D6, MCHR1, MEI1, SNU13, L3MBTL2, NDUFA6, WBP2NL, TEF

Table 3. Polygenic Risk Score Accuracy

Polygenic risk scores for all 16 traits with sex-specific heritability differences were calculated using both pruning + thresholding and PRS-CS, a Bayesian approach. Risk score weights were based on a sex-agnostic model, a male-only model, and a female-only model for the discovery set, and performance was measured with the R² prediction values for each trait-sex pair in the replication set. The performance percent improvement was calculated as the difference between the performance of the sex-specific model and the sex-agnostic model divided by the performance of the sex-agnostic model. The highest performance for each trait-sex pair is noted in bold.

Trait  Test Set Sex  P+T Male  P+T Female  PRS-CS Male  PRS-CS Female  P+T p-value  PRS-CS p-value
BW     Male          0.0108    0.0085      0.0227       0.0304         0.764534     0.311802
BW     Female        0.0074    0.0226      0.0252       0.0459         0.025517     0.002113
LAFP   Male          0.0389    0.0346      0.0845       0.0688         0.570159     0.034216
LAFP   Female        0.0299    0.0515      0.0731       0.099          0.001299     7.79E-05
TFM    Male          0.0399    0.0364      0.0884       0.0719         0.643714     0.025771
TFM    Female        0.0332    0.0535      0.0787       0.1048         0.002472     6.5E-05
RAFP   Male          0.0396    0.0342      0.085        0.0709         0.475767     0.057033
RAFP   Female        0.0299    0.0511      0.0736       0.0949         0.001598     0.00117
RLFP   Male          0.0398    0.0361      0.0783       0.0669         0.624919     0.124982
RLFP   Female        0.0292    0.046       0.0669       0.0856         0.01251      0.004548
WFM    Male          0.0441    0.0387      0.0907       0.0774         0.474733     0.071695
WFM    Female        0.0331    0.0508      0.0817       0.0997         0.00836      0.005909
LLFP   Male          0.0411    0.0363      0.0775       0.0673         0.525765     0.169891
LLFP   Female        0.0284    0.0463      0.0662       0.086          0.007801     0.002664
LAFM   Male          0.043     0.0368      0.093        0.0783         0.412185     0.04633
LAFM   Female        0.0325    0.0487      0.0783       0.0971         0.015866     0.004099
RLFM   Male          0.0447    0.038       0.0899       0.0752         0.375155     0.046701
RLFM   Female        0.0307    0.0464      0.0753       0.094          0.019541     0.004366
RAFM   Male          0.0433    0.0377      0.0952       0.0824         0.458738     0.082237
RAFM   Female        0.0324    0.0492      0.0777       0.0963         0.012364     0.004528
LLFM   Male          0.0446    0.0384      0.09         0.0762         0.411795     0.061792
LLFM   Female        0.0301    0.0472      0.0765       0.0935         0.010978     0.009548
DBP    Male          0.0169    0.0163      0.0417       0.0399         0.946324     0.837962
DBP    Female        0.0118    0.0266      0.0327       0.0618         0.048966     8.57E-05
SBP    Male          0.0141    0.0157      0.0405       0.0387         0.857645     0.838062
SBP    Female        0.0102    0.0194      0.0284       0.044          0.222022     0.036301
HBP    Male          0.0034    0.0032      0.009        0.0086         0.982216     0.964342
HBP    Female        0.0021    0.0045      0.0081       0.0097         0.751454     0.832307
SNRG   Male          0.0037    0.0038      0.0089       0.0091         0.98964      0.979227
SNRG   Female        0.0025    0.0078      0.0081       0.0168         0.438389     0.201699
CRT    Male          0.0068    0.0031      0.0109       0.0103         0.630718     0.937688
CRT    Female        0.0022    0.0069      0.0074       0.0138         0.492093     0.348079
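
The percent improvement defined in the table caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `percent_improvement` and the example R² values are hypothetical and are not taken from the table.

```python
def percent_improvement(r2_sex_specific: float, r2_sex_agnostic: float) -> float:
    """Relative gain of a sex-specific PRS over the sex-agnostic PRS, in percent."""
    if r2_sex_agnostic == 0:
        raise ValueError("sex-agnostic R^2 must be non-zero")
    # (sex-specific - sex-agnostic) / sex-agnostic, expressed as a percentage
    return 100.0 * (r2_sex_specific - r2_sex_agnostic) / r2_sex_agnostic


# Hypothetical R^2 values for illustration only (not from the table).
example = percent_improvement(r2_sex_specific=0.06, r2_sex_agnostic=0.05)
print(f"{example:.1f}% improvement")
```

A negative return value means the sex-specific model performed worse than the sex-agnostic model on that test set.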

Figures

Figure 1

Flowchart of Analysis Procedure. Sex-stratified GWAS summary statistics in an initial discovery set for 733 traits were tested for sex-specific heritability differences, and sex-specific loci were selected from 16 traits to identify 360 candidate loci. A replication set was used to replicate 47 sex-specific loci, and a literature review, QTL analysis, and pathway analysis were conducted using the replicated loci. Finally, the discovery and replication sets were used to compare sex-specific and sex-agnostic PRS models.

Figure 2

Plot of Male and Female Heritabilities. Male and female heritability estimates and their respective standard errors for a range of complex traits and diseases were calculated and compared using a two-sided z-score approach. A scatter plot of male and female heritability estimates for the 16 traits found to have sex-specific differences in heritability is shown below, with standard errors for each estimate indicated by the error bars.
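
One common form of a two-sided z-score comparison is sketched below. It assumes the male and female heritability estimates are independent, so that the variance of their difference is the sum of the squared standard errors; the caption does not state the exact statistic used, so this form, the function name `heritability_z_test`, and the example inputs are assumptions.

```python
import math
from statistics import NormalDist


def heritability_z_test(h2_male, se_male, h2_female, se_female):
    """Two-sided z-test for a difference between two independent estimates."""
    # Assumption: estimates are independent, so their variances add.
    z = (h2_male - h2_female) / math.sqrt(se_male ** 2 + se_female ** 2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical inputs for illustration only.
z, p = heritability_z_test(h2_male=0.30, se_male=0.03, h2_female=0.20, se_female=0.04)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```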

Figure 3

Plot of Genetic Correlations Between the 16 Traits. Genetic correlation estimates were calculated between traits using LDSC and used to identify trait groups within the 16 traits identified as having sex-specific heritability differences. Correlation estimates showed 5 clusters corresponding to snoring (SNRG), birth weight (BW), fat traits (WFM, RLFP, RLFM, LLFP, LLFM, RAFP, RAFM, LAFP, LAFM, TFM), creatinine levels (CRT), and blood pressure traits (DBP, SBP, HBP). These trait groups were used in subsequent analyses.

Figure 4

Quantile-Quantile Plots of the 16 Traits. P-values from the sex-stratified GWAS summary statistics were visualized on quantile-quantile plots. Plots were generated for each of the five trait groups, with FM traits shown in (a), BP traits shown in (b), creatinine levels shown in (c), snoring shown in (d), and birth weight shown in (e). The intercepts estimated from LDSC (λLDSC) and the genomic control inflation factors (λGC) are labelled in the legends. All traits had LDSC intercepts near 1, indicating that confounding effects were well adjusted for in both males and females. The high values of λGC indicate the polygenicity of the traits. Male p-values are plotted as circles and female p-values as triangles.
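
For reference, the genomic control inflation factor is conventionally the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). The sketch below follows that standard definition using only the Python standard library; the function name `genomic_inflation` is hypothetical, and the code assumes all p-values lie strictly between 0 and 1.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic control lambda from two-sided p-values (each strictly in (0, 1])."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via the squared
    # normal quantile at p/2 (the quantile is negative; squaring removes the sign).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Expected median of a 1-df chi-square under the null (about 0.455).
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced p-values mimic the null, so lambda should be close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```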

Figure 5

Double Manhattan Plots of Trait Groups with Significant Loci. P-values for the sex-stratified GWAS summary statistics were visualized using double Manhattan plots, with log-transformed female p-values on the positive y-axis and log-transformed male p-values on the negative y-axis. Double Manhattan plots were made for FM and BP traits, the two trait groups with replicated SSE loci, and are shown in subplots (a) and (b), respectively. Gene names corresponding to novel sex-specific loci that were identified after replication are labelled in both plots.
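
The y-axis transformation of a double Manhattan plot can be sketched as below. The caption says only "log-transformed"; the use of -log10(p), the function name `signed_log10_p`, and the sex labels are assumptions made for illustration.

```python
import math


def signed_log10_p(p_value: float, sex: str) -> float:
    """Map a p-value to a double-Manhattan y-coordinate.

    Assumption: female points are plotted at +(-log10 p) and male points
    at -(-log10 p), so the two sexes mirror each other about y = 0.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    if sex == "female":
        return magnitude
    if sex == "male":
        return -magnitude
    raise ValueError("sex must be 'female' or 'male'")


# Hypothetical p-value for illustration only.
print(signed_log10_p(1e-8, "female"), signed_log10_p(1e-8, "male"))
```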

Figure 6

Comparison of Discovery and Replication Sets Along the First Two Principal Components. The discovery set was selected from individuals who were within 7 standard deviations of the first 6 principal components and selected “British”, “Irish”, or “White” as their ancestry. The replication set was constructed as all individuals within 7 standard deviations of the first 6 principal components who had not selected the previous responses. The cleaned discovery and replication sets were plotted along their first two genetic principal components to determine the replication dataset’s suitability for the replication portion of the analysis.
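
The principal-component filter described above can be sketched as follows. The caption does not say which population the means and standard deviations are computed on; this sketch assumes they are computed on the cohort passed in, and the function name `within_pc_bounds` and its parameters are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def within_pc_bounds(pcs: Sequence[Sequence[float]], n_pcs: int = 6,
                     max_sd: float = 7.0) -> List[int]:
    """Return indices of samples within max_sd SDs of the mean on each of the first n_pcs PCs."""
    # Assumption: centre and spread are estimated from the supplied cohort itself.
    centers = [mean([row[k] for row in pcs]) for k in range(n_pcs)]
    spreads = [pstdev([row[k] for row in pcs]) for k in range(n_pcs)]
    keep = []
    for i, row in enumerate(pcs):
        # A sample is kept only if it passes the bound on every one of the PCs.
        if all(abs(row[k] - centers[k]) <= max_sd * spreads[k] for k in range(n_pcs)):
            keep.append(i)
    return keep


# Toy cohort: 200 tightly clustered samples plus one extreme outlier on PC1.
cohort = [[(-1.0) ** i] * 6 for i in range(200)] + [[1000.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print(len(within_pc_bounds(cohort)))
```

The ancestry self-report criterion would then be applied separately to split the retained samples into discovery and replication sets.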

Figure 7

Comparison of PRS-CS Polygenic Risk Predictive Performance of Sex-Specific and Sex-Agnostic Models. Polygenic risk scores for all 16 traits with sex-specific heritability differences were calculated using both pruning + thresholding and PRS-CS, a Bayesian approach. Risk score weights were based on a sex-agnostic model, a male-only model, and a female-only model for the discovery set. All 12 models were then tested on the replication set, and PRS performance for each trait group was aggregated. The performance of each model in the multi-trait groups (Fat, BP) is shown as box-and-whisker plots in subplots (a) and (b), respectively, and in the single-trait groups (Snoring, Creatinine, Birth Weight) as bar plots in subplots (c), (d), and (e). Colors indicate the sex of the testing set: blue indicates performance on the male replication set and red indicates performance on the female replication set.

Supplementary Files

This is a list of supplementary files associated with this preprint.

SupplementaryInformation.docx
SupplementaryTables.zip
