Supplementary Note
Identification of 370 loci for age at onset of sexual and reproductive behaviour, highlighting common aetiology with reproductive biology, externalizing behaviour and longevity

Melinda C. Mills1,2,†,*, Felix C. Tropf1,2,3,4,†, David M. Brazel1,2,†, Natalie van Zuydam5, Ahmad Vaez6,7, eQTLGen Consortium, BIOS Consortium, Tune H. Pers8,9, Harold Snieder6, John R.B. Perry10, Ken K. Ong10,†, Marcel den Hoed5,†, Nicola Barban11,†, and Felix R. Day10,†,* on behalf of the Human Reproductive Behaviour Consortium

1 Leverhulme Centre for Demographic Science, University of Oxford, Oxford, United Kingdom
2 Nuffield College, University of Oxford, Oxford, United Kingdom
3 École Nationale de la Statistique et de L’administration Économique (ENSAE), Paris, France
4 Center for Research in Economics and Statistics (CREST), Paris, France
5 The Beijer Laboratory and Department of Immunology, Genetics and Pathology, Uppsala University and SciLifeLab, Uppsala, Sweden
6 Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
7 Department of Bioinformatics, Isfahan University of Medical Sciences, Isfahan, Iran
8 The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
9 Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
10 MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
11 Institute of Social and Economic Research, University of Essex, Essex, United Kingdom

† Denotes equal contribution
* Correspondence to Melinda C. Mills, [email protected], and Felix R. Day, [email protected]

Contents
List of Supplementary Figures
List of Supplementary Tables
1. Background and Phenotype Definitions
1.1 Background
1.2 Phenotype Definitions
2. Phenotypic and genotypic changes in the onset of human reproductive behaviour over time
2.1 Phenotypic changes in the onset of human reproductive behaviour
2.2 A multifactorial life course approach to understanding human reproductive behaviour
2.3 Heterogeneity in heritability across birth cohorts
3. Overview of GWAS meta-analysis
3.1 Participating cohorts
3.2 Sample inclusion criteria
3.3 Genotyping and imputation
3.4 Models used to test for association
3.5 Analysis of X chromosome
3.6 Quality Control (QC): filters & diagnostic checks
3.6.1 Filters
3.6.2 Diagnostic graphs
3.6.3 SNPs and cohorts excluded
3.7 Meta Analyses
3.8 MTAG results
3.9 Summary of discovered loci and Manhattan plots
4. Polygenic score prediction
4.1 Calculation of polygenic scores
4.2 Out of sample prediction
4.3 Accounting for right-censoring and comparing top and bottom 5% PGS
5. Testing population stratification and environmentally mediated parental genetic effects of childhood socioeconomic status
5.1 Testing Population Stratification: LD Score intercept test
5.2 Polygenic score prediction by childhood socio-economic status
6. Genetic correlations with related traits
6.1 Genetic correlation with 28 related traits
6.2 Genetic correlations by sex
7. Uncovering shared genetic etiology with Genomic SEM
7.1 AFB and AFS regression educational attainment (EA) and trait X
7.2 Reproductive biology and externalizing behaviour explanation of variance
8. Bi-directional MR of reproductive behaviour, teenage behavioural disinhibition and onset of later life disease
8.1 Background, methods and innovation
8.2 Results MR
9. Later age at first birth linked to longevity
9.1 Background and innovation
9.2 Data and measurement
9.3 Methods of analysis
9.4 Results: Later reproductive timing predicts longevity
10. Gene prioritization
10.1 Methods
10.2 Results
10.2.1 Candidate genes in brain
10.2.2 Candidate genes in glands
10.2.3 Candidate genes in female reproductive organs
10.2.4 Candidate genes in male reproductive organs
11. Sex-specific genetic effects
11.1 Genetic overlap among sexes: LD score bivariate regression
11.2 Sex specific loci
11.2.1 Methods and identification of 10 additional associations for AFS and 1 for AFB
11.2.2 Gene Prioritization Results AFS
11.2.3 Gene Prioritization Results AFB
12 Contributions and Acknowledgments
12.1 Author and Cohort Contributions
12.2 Acknowledgements
References

List of Supplementary Figures
Figure S1. Age at first birth (AFB) panel A and Age at first sex (AFS) panel B by cohort, UK Biobank
Figure S2. Correlation plot between Age at first birth and Age at first sex by birth cohort, UK Biobank
Figure S3. A multifactorial life course approach to understanding the timing of reproductive behaviour
Figure S4A. SNP heritability for AFB, women in UK Biobank
Figure S4B. SNP heritability for AFS, women and men in UK Biobank
Figure S5. Manhattan plots, Age at first sex (AFS), Pooled (A), Women (B) and Men (C)
Figure S6. Manhattan plots, Age at first birth (AFB), Pooled (A), Women (B) and Men (C)
Figure S7. Variance explained from Polygenic scores for Age at First Birth and Age at First Sex using PRSice, LDPred and MTAG+LDPred in out-of-sample cohorts
Figure S8. Nelson-Aalen hazard estimates of first sex by age. Comparison between top 5% and bottom 5% PGS of age at first sex
Figure S9. Nelson-Aalen hazard estimates of first birth by age. Comparison between top 5% and bottom 5% PGS of age at first birth
Figure S10A. AFB PGS score by percentile groups and parent’s educational level
Figure S10B. AFS PGS score by percentile groups and parent’s educational level
Figure S11. Genetic correlations and SNP heritabilities between and among reproductive, behavioural, psychiatric, substance use, personality and anthropometric traits
Figure S12A. A path diagram showing the structure of the genetic multiple regression model fit to EA and AFB
Figure S12B. A path diagram showing the structure of the genetic multiple regression model fit to AFS and EA
Figure S13. A heat map showing the genetic correlations between and among the fertility GWAS phenotypes, the sex hormone phenotypes, and other phenotypes related to reproductive biology, as calculated by LD score regression.
Figure S14. A path diagram for a GenomicSEM model of the relative associations of an externalizing latent factor, age at menopause, and age at menarche with age at first birth in women
Figure S15. Coefficients (and CIs) of bi-directional MR of human reproductive behaviour (AFB, AFS), age initiated smoking and educational attainment on Type 2 diabetes and Coronary Artery Disease later in life
Figure S16. Protein-protein interactions identified using STRING for genes that are highly expressed at the protein level in: A) brain and result in a nervous system or neurological behavior phenotype in mutant mice; B) glands and result in an endocrine/exocrine phenotype in mutant mice; C-D) female (C) or male (D) reproductive organs and result in a reproductive phenotype in mutant mice. Pink lines highlight experimentally determined interactions.
Figure S17. Gene prioritization of AFS and AFB by sex
Figure S18. Genetic overlap amongst the sexes for AFS and AFB, LD score bivariate regression

List of Supplementary Tables
Table S1. Description of participating cohorts
Table S2. Cohort phenotype description
Table S3a. Sample size individuals with autosome chromosome information in participating cohorts
Table S3b. Sample size individuals with sex chromosome information in participating cohorts
Table S4. Genotyping and imputation
Table S5. Description of SNP filtering and cohort exclusion for age at first birth (AFB) analyses for women
Table S6. Description of SNP filtering and cohort exclusion for age at first sex (AFS) analyses for women
Table S7. Description of SNP filtering and cohort exclusion for age at first birth (AFB) analyses for women
Table S8. Description of SNP filtering and cohort exclusion for age at first sex (AFS) analyses for women
Table S9. Association Results for 88 independent SNPs that reached genome-wide significance (P < 5×10-8) in the pooled-sex GWAS of Age at First birth (AFB), AFB Males and Females
Table S10. Association Results for 261 independent SNPs that reached genome-wide significance (P < 5×10-8) in the pooled-sex GWAS of Age at First Sex (AFS), AFS Males and Females
Table S11. Genetic correlations (rg) AFB and AFS with selected phenotypes
Table S12A. Unstandardized results from genetic multivariate regression models examining the relationship between AFS in males and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12B. Unstandardized results from genetic multivariate regression models examining the relationship between EA and AFB, accounting for the genetic correlation of EA with a third phenotype
Table S12C. Standardized results from genetic multivariate regression models examining the relationship between AFS and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12D. Unstandardized results from genetic multivariate regression models examining the relationship between AFS and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12E. Standardized results from genetic multivariate regression models examining the relationship between EA and AFB in males, accounting for the genetic correlation of EA with a third phenotype
Table S12F. Unstandardized results from genetic multivariate regression models examining the relationship between EA and AFB in males, accounting for the genetic correlation of EA with a third phenotype
Table S12G. Standardized results from genetic multivariate regression models examining the relationship between EA and AFB in females, accounting for the genetic correlation of EA with a third phenotype
Table S12H. Unstandardized results from genetic multivariate regression models examining the relationship between EA and AFB in females, accounting for the genetic correlation of EA with a third phenotype
Table S12I. Standardized results from genetic multivariate regression models examining the relationship between AFS in males and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12J. Unstandardized results from genetic multivariate regression models examining the relationship between AFS in males and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12K. Standardized results from genetic multivariate regression models examining the relationship between AFS in females and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12L. Unstandardized results from genetic multivariate regression models examining the relationship between AFS in females and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S13A. Bi-Directional MR, Years of education and AFB and AFB/AFS with risk taking and age at smoking initiation
Table S13B. Mendelian Randomization (MR) of age at first birth (AFB) to Coronary artery disease (CAD) and Type 2 diabetes (T2D) and age at first sex (AFS) to CAD and T2D, and Educational Attainment to CAD and T2D
Table S14. Polygenic score (PGS) prediction of age at first birth (AFB), educational attainment (EA) and risk on parental longevity
Table S15A. Results from DEPICT tissue enrichment analysis for age at first sex (AFS)
Table S15B. Results from DEPICT tissue enrichment analysis for age at first birth (AFB)
Table S15C. Results from DEPICT gene prioritization for age at first sex (AFS)
Table S15D. Results from DEPICT gene prioritization for age at first birth (AFB)
Table S15E. Results from DEPICT cell type enrichment analysis using mouse brain RNA sequencing data for age at first sex (AFS)
Table S15F. Results from DEPICT cell type enrichment analysis using mouse brain RNA sequencing data for age at first birth (AFB)
Table S15G. Results from DEPICT cell type enrichment analysis using tabula muris RNA sequencing data for age at first sex (AFS)
Table S16A. Search terms used for the Phenolyzer analysis for the three areas of interest
Table S16B. Results of Phenolyzer analysis age at first birth (AFB) and age at first sex (AFS)
Table S17A. The results of in silico sequencing and in silico lookup of GWAS associations of AFB. AF_EUR indicates the allele frequency of the alternative allele (A2) in the European population.
Table S17B. The results of in silico sequencing and in silico lookup of GWAS associations of AFS. AF_EUR indicates the allele frequency of the alternative allele (A2) in the European population.
Table S18A. Summary data-based Mendelian Randomization (SMR) for age at first sex (AFS)
Table S18B. Summary data-based Mendelian Randomization (SMR) for age at first birth (AFB)
Table S19A. Summary of gene prioritization results across all approaches for age at first sex (AFS)
Table S19B. Summary of gene prioritization results across all approaches for age at first birth (AFB)
Table S19C. Summary of gene prioritization results across all approaches for age at first sex (AFS) and age at first birth (AFB)

1. Background and Phenotype Definitions
1.1 Background
Previous studies have shown that the onset of human reproductive behaviour, namely age at first sexual intercourse (AFS) and age at first birth (AFB), has a genetic basis. AFB has a SNP-heritability of 15%1 and AFS 15-17% (see Section 3), with two genome-wide association studies (GWAS) in 2016 identifying 10 genetic loci linked to AFB2 and 38 associated with AFS.3 A detailed discussion of the motivation behind the study of these traits, including the evolutionary causes of genetic variance in reproductive behaviour, additive and dominant genetic variation and environmental variation in fertility behaviour, can be found in the Supplementary Note of Barban et al. (2016).2 A description of the data and methods used in this study can be found in the online Methods section appended to the article.
The current analysis extends previous work in several appreciable ways. First, this study has a sizeable increase in sample size, making this the largest GWAS to date on these phenotypes. Previous work on AFS3 examined 125,667 individuals from the UK Biobank, and a study on AFB2 examined 251,151 individuals. The current study is considerably larger for both AFS (N=397,338 pooled; N=214,547 women; N=182,791 men) and AFB (N=542,901 pooled; N=418,758 women; N=124,008 men). A second extension is that we use 1000G imputed genotype data, which, in addition to the larger sample, allows us to detect considerably more signals. Third, we include an X-Chromosome analysis, allowing us to uncover additional novel loci. A fourth advance is the ability to find markedly more biological signals. Fifth, our extensive analyses of the correlation and underlying etiology of these traits reveal an underlying genetic basis of AFS and AFB with other traits. This includes externalizing behaviour and substance use for early AFS and AFB, and links to internalizing traits and infertility disease for later AFS and AFB. Sixth, we show that AFB is a stronger predictor for late-age onset of disease and longevity, even beyond known standard predictors such as educational attainment. Finally, we demonstrate that our polygenic scores are sensitive to gene-environment correlation (rGE) and childhood socioeconomic status.
1.2 Phenotype Definitions
An overview of participating cohorts is found in Table S1, with a description provided in Section 3. The detailed phenotype definitions and questions drawn from each of the cohorts are included in Supplementary Table S2.
Age at first sexual intercourse (AFS) is treated as a continuous measure and assessed using questions such as "What was your age when you first had sexual intercourse?" The question is often accompanied by a more detailed definition, such as "sexual intercourse includes vaginal, oral or anal intercourse". Ages less than 12 are normally excluded. The UK Biobank requires confirmation of ages in the range 4-12, and excludes all answers that were less than 4. For out-of-sample replication, other studies use 12 as the minimum age if they do not have a study-specific lower limit. Age at first sexual intercourse tends to have a markedly non-normal distribution, so a within-sex inverse rank normal transformation is required.

Age at first birth (AFB) is treated as a continuous measure, either asked directly or created from several survey questions (e.g., birthdate of participant and date of birth of first child). The most common questions were "How old were you when you had your first child?" or "What is the date of birth of your first child?" Individuals were eligible for inclusion if they were assessed for AFB and had given birth to a child.
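
The within-sex inverse rank normal transformation mentioned above for AFS can be illustrated with a minimal sketch. It is not the pipeline used by the cohorts: the function names, the toy data and the rank offset of 0.5 are assumptions made for illustration only (other offsets, such as Blom's 3/8, are also common), and tied ages are given the mean of their ranks.

```python
from collections import defaultdict
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def inverse_rank_normal(values, offset=0.5):
    """Map values to standard normal quantiles of (rank - offset) / (n - 2*offset + 1).

    The offset is an illustrative assumption, not a value taken from the study.
    """
    n = len(values)
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
            for r in average_ranks(values)]


def within_sex_transform(ages, sexes, offset=0.5):
    """Apply the transformation separately within each sex group."""
    groups = defaultdict(list)
    for idx, sex in enumerate(sexes):
        groups[sex].append(idx)
    out = [0.0] * len(ages)
    for indices in groups.values():
        transformed = inverse_rank_normal([ages[i] for i in indices], offset)
        for i, z in zip(indices, transformed):
            out[i] = z
    return out


# Hypothetical toy data: reported AFS and sex for seven participants.
ages = [16, 18, 18, 25, 15, 17, 21]
sexes = ["F", "F", "F", "F", "M", "M", "M"]
z_scores = within_sex_transform(ages, sexes)
```

Because ranks are computed separately for women and men, each sex ends up with an approximately standard normal phenotype regardless of differences in the raw age distributions between the sexes.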
2. Phenotypic and genotypic changes in the onset of human reproductive behaviour over time
2.1 Phenotypic changes in the onset of human reproductive behaviour
Over the past forty years, there has been a rapid postponement of age at first birth (AFB) by 4-5 years, to a mean AFB for women of around 29 in many advanced societies.4 The biological ability to conceive a child already starts to decline for many women as early as 25, with around 50% of women sterile by the age of 40.5 This postponement has been related to multiple social, economic, and cultural factors, which have been documented in several detailed reviews.4,6 A central factor is the introduction of effective contraception and the ability to control fertility and engage in individual choice since the late 1960s. Another key factor is the well-documented association between women’s gains in educational attainment and later fertility, particularly for more recent birth cohorts. This is related to women’s stronger labour market attachment and their realization that fertility postponement avoids large motherhood wage penalties. In fact, for each year a woman delays motherhood, she increases her career earnings by 9%.6 Other factors are the strong cultural and ideational changes and norms surrounding sexual behaviour, entry into parenthood and the role of children, who are often no longer strongly required for economic and labour support to parents. Finally, multiple structural factors such as the availability of childcare, gender equity, housing and resources all play a vital role.
Figure S1 (Panel A) examines phenotypic data from the UK Biobank and shows not only a shift in the distribution of AFB to later ages, but also a wider spread in the distribution itself. Figure S1 (Panel B) of AFS shows that in earlier cohorts there was a bimodal distribution. One group had earlier sexual intercourse, often tied to socio-economic circumstances and problem or risky behaviour.7 The other group, engaging in later sexual initiation, has been found to be tied to higher educational goals and achievement, with early sexual intercourse tied to a higher calculated risk of pregnancy, which would disturb longer-term life planning and career goals.8 The panel also shows a narrowing of the distribution over time to earlier ages.
Figure S1. Age at first birth (AFB) panel A and Age at first sex (AFS) panel B by cohort, UK Biobank
Figure S2 documents the shift from a close link between sexual debut and first childbirth in earlier birth cohorts (<1941, r=0.60) to a relative uncoupling in more recent birth cohorts (>1960, r=0.31). Related to this is a large body of demographic work that has examined the decoupling of sex from marriage.6
Figure S2 illustrates the gradual decoupling of sexual initiation from reproduction. Here we see that the timing of AFS and AFB was concentrated and closer together in earlier birth cohorts, whereas in more recent birth cohorts it is increasingly widely distributed. In other words, the classic association of sexual behaviour with marriage and childbearing held in earlier cohorts has waned over time, largely due to the introduction of effective contraception and changes in social norms about sexual behaviour outside of a marital union.6
Figure S2. Correlation plot between Age at first birth and Age at first sex by birth cohort, UK Biobank
2.2 A multifactorial life course approach to understanding human reproductive behaviour
As Figure S3 illustrates, the theoretical model driving the analytical approach in this paper adopts a multifactorial life course model. Our central aim is to isolate genetic loci related to the timing of the onset of reproductive behaviour. It is increasingly acknowledged, however, that it is essential to understand and interpret genetic findings in relation to environmental context and, for these phenotypes, also in relation to reproductive biological factors, externalizing behaviour or behavioural disinhibition, and internalizing behaviour. Adopting a life course approach means that we consider different life span phases, from pre-adolescence (onset of menarche, voice-breaking, childhood socioeconomic status), to adolescence, where AFS often occurs, and then early to mid-adulthood and AFB. We therefore consider key phenotypes measured around AFS, such as age at smoking initiation and contraception. We extend this approach to understand how these earlier life course conditions (e.g., childhood socioeconomic status or timing of AFS/AFB) impact later-life disease development and longevity.
Figure S3. A multifactorial life course approach to understanding the timing of reproductive behaviour
2.3 Heterogeneity in heritability across birth cohorts
A recent study demonstrated that estimates from GWAS discoveries are substantially smaller across populations compared to within populations.9 Simulations showed that the results reflected heterogeneity in gene-environment interaction rather than genetic heterogeneity. In other words, particularly for complex traits and diseases such as reproductive behaviour, educational attainment or BMI, it is more difficult to determine the influence of genetic versus socio-environmental factors. That study demonstrated that although GWA studies combine data from individuals across different time periods, it is implausible to assume that genetic effects are uniform across time.
To test whether this was a concern with our current analysis, Figures S4A and B show how the SNP heritability estimates change over time for AFB and AFS using the UK Biobank. Our SNP heritability estimate refers to the proportion of the additive genetic variance explained by common SNPs across the genome over the overall phenotypic variance ($\sigma^2_P$) of the trait:10

$h^2_{SNP} = \sigma^2_G / \sigma^2_P$

The phenotypic variance is the sum of additive genetic and environmental variance, i.e., $\sigma^2_P = \sigma^2_G + \sigma^2_e$, where $\sigma^2_G$ is the additive genetic variance explained by all common SNPs across the genome and $\sigma^2_e$ is the residual variance. The methods we applied have been detailed elsewhere.10–14 Briefly, we applied a linear mixed model