Supplementary Note

Identification of 370 loci for age at onset of sexual and reproductive behaviour, highlighting common aetiology with reproductive biology, externalizing behaviour and longevity


Melinda C. Mills1,2,†,*, Felix C. Tropf1,2,3,4,†, David M. Brazel1,2,†, Natalie van Zuydam5, Ahmad Vaez6,7, eQTLGen Consortium, BIOS Consortium, Tune H. Pers8,9, Harold Snieder6, John R.B. Perry10, Ken K. Ong10,†, Marcel den Hoed5,†, Nicola Barban11,†, and Felix R. Day10,†,* on behalf of the Human Reproductive Behaviour Consortium

1 Leverhulme Centre for Demographic Science, University of Oxford, Oxford, United Kingdom
2 Nuffield College, University of Oxford, Oxford, United Kingdom
3 École Nationale de la Statistique et de L’administration Économique (ENSAE), Paris, France
4 Center for Research in Economics and Statistics (CREST), Paris, France
5 The Beijer Laboratory and Department of Immunology, Genetics and Pathology, Uppsala University and SciLifeLab, Uppsala, Sweden
6 Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
7 Department of , Isfahan University of Medical Sciences, Isfahan, Iran
8 The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
9 Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
10 MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
11 Institute of Social and Economic Research, University of Essex, Essex, United Kingdom

† Denotes equal contribution
* Correspondence to Melinda C. Mills, [email protected], and Felix R. Day, [email protected]

Contents

List of Supplementary Figures
List of Supplementary Tables
1. Background and Phenotype Definitions
  1.1 Background
  1.2 Phenotype Definitions
2. Phenotypic and genotypic changes in the onset of human reproductive behaviour over time
  2.1 Phenotypic changes in the onset of human reproductive behaviour
  2.2 A multifactorial life course approach to understanding human reproductive behaviour
  2.3 Heterogeneity in heritability across birth cohorts
3. Overview of GWAS meta-analysis
  3.1 Participating cohorts
  3.2 Sample inclusion criteria
  3.3 Genotyping and imputation
  3.4 Models used to test for association
  3.5 Analysis of X chromosome
  3.6 Quality Control (QC): filters & diagnostic checks
    3.6.1 Filters
    3.6.2 Diagnostic graphs
    3.6.3 SNPs and cohorts excluded
  3.7 Meta Analyses
  3.8 MTAG results
  3.9 Summary of discovered loci and Manhattan plots
4. Polygenic score prediction
  4.1 Calculation of polygenic scores
  4.2 Out of sample prediction
  4.3 Accounting for right-censoring and comparing top and bottom 5% PGS
5. Testing population stratification and environmentally mediated parental genetic effects of childhood socioeconomic status
  5.1 Testing Population Stratification: LD Score intercept test
  5.2 Polygenic score prediction by childhood socio-economic status
6. Genetic correlations with related traits
  6.1 Genetic correlation with 28 related traits
  6.2 Genetic correlations by sex
7. Uncovering shared genetic etiology with Genomic SEM
  7.1 AFB and AFS regression educational attainment (EA) and trait X
  7.2 Reproductive biology and externalizing behaviour explanation of variance
8. Bi-directional MR of reproductive behaviour, teenage behavioural disinhibition and onset of later life disease
  8.1 Background, methods and innovation
  8.2 Results MR
9. Later age at first birth linked to longevity
  9.1 Background and innovation
  9.2 Data and measurement
  9.3 Methods of analysis
  9.4 Results: Later reproductive timing predicts longevity
10. Gene prioritization
  10.1 Methods
  10.2 Results
    10.2.1 Candidate genes in brain
    10.2.2 Candidate genes in glands
    10.2.3 Candidate genes in female reproductive organs
    10.2.4 Candidate genes in male reproductive organs
11. Sex-specific genetic effects
  11.1 Genetic overlap among sexes: LD score bivariate regression
  11.2 Sex specific loci
    11.2.1 Methods and identification of 10 additional associations for AFS and 1 for AFB
    11.2.2 Gene Prioritization Results AFS
    11.2.3 Gene Prioritization Results AFB
12. Contributions and Acknowledgments
  12.1 Author and Cohort Contributions
  12.2 Acknowledgements
References

List of Supplementary Figures

Figure S1. Age at first birth (AFB) panel A and Age at first sex (AFS) panel B by cohort, UK Biobank
Figure S2. Correlation plot between Age at first birth and Age at first sex by birth cohort, UK Biobank
Figure S3. A multifactorial life course approach to understanding the timing of reproductive behaviour
Figure S4A. SNP heritability for AFB, women in UK Biobank
Figure S4B. SNP heritability for AFS, women and men in UK Biobank
Figure S5. Manhattan plots, Age at first sex (AFS), Pooled (A), Women (B) and Men (C)
Figure S6. Manhattan plots, Age at first birth (AFB), Pooled (A), Women (B) and Men (C)
Figure S7. Variance explained from Polygenic scores for Age at First Birth and Age at First Sex using PRSice, LDPred and MTAG+LDPred in out-of-sample cohorts
Figure S8. Nelson-Aalen hazard estimates of first sex by age. Comparison between top 5% and bottom 5% PGS of age at first sex
Figure S9. Nelson-Aalen hazard estimates of first birth by age. Comparison between top 5% and bottom 5% PGS of age at first birth
Figure S10A. AFB PGS score by percentile groups and parent’s educational level
Figure S10B. AFS PGS score by percentile groups and parent’s educational level
Figure S11. Genetic correlations and SNP heritabilities between and among reproductive, behavioural, psychiatric, substance use, personality and anthropometric traits
Figure S12A. A path diagram showing the structure of the genetic multiple regression model fit to EA and AFB
Figure S12B. A path diagram showing the structure of the genetic multiple regression model fit to AFS and EA
Figure S13. A heat map showing the genetic correlations between and among the fertility GWAS phenotypes, the sex hormone phenotypes, and other phenotypes related to reproductive biology, as calculated by LD score regression.
Figure S14. A path diagram for a GenomicSEM model of the relative associations of an externalizing latent factor, age at menopause, and age at menarche with age at first birth in women
Figure S15. Coefficients (and CIs) of bi-directional MR of human reproductive behaviour (AFB, AFS), age initiated smoking and educational attainment on Type 2 diabetes and Coronary Artery Disease later in life
Figure S16. Protein-protein interactions identified using STRING for genes that are highly expressed at the protein level in: A) brain and result in a nervous system or neurological behavior phenotype in mutant mice; B) glands and result in an endocrine/exocrine phenotype in mutant mice; C-D) female (C) or male (D) reproductive organs and result in a reproductive phenotype in mutant mice. Pink lines highlight experimentally determined interactions.
Figure S17. Gene prioritization of AFS and AFB by sex
Figure S18. Genetic overlap amongst the sexes for AFS and AFB, LD score bivariate regression

List of Supplementary Tables

Table S1. Description of participating cohorts
Table S2. Cohort phenotype description
Table S3a. Sample size individuals with autosomal chromosome information in participating cohorts
Table S3b. Sample size individuals with sex chromosome information in participating cohorts
Table S4. Genotyping and imputation
Table S5. Description of SNP filtering and cohort exclusion for age at first birth (AFB) analyses for women
Table S6. Description of SNP filtering and cohort exclusion for age at first sex (AFS) analyses for women
Table S7. Description of SNP filtering and cohort exclusion for age at first birth (AFB) analyses for women
Table S8. Description of SNP filtering and cohort exclusion for age at first sex (AFS) analyses for women
Table S9. Association Results for 88 independent SNPs that reached genome-wide significance (P < 5×10⁻⁸) in the pooled-sex GWAS of Age at First birth (AFB), AFB Males and Females
Table S10. Association Results for 261 independent SNPs that reached genome-wide significance (P < 5×10⁻⁸) in the pooled-sex GWAS of Age at First Sex (AFS), AFS Males and Females
Table S11. Genetic correlations (rg) AFB and AFS with selected phenotypes
Table S12A. Unstandardized results from genetic multivariate regression models examining the relationship between AFS in males and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12B. Unstandardized results from genetic multivariate regression models examining the relationship between EA and AFB, accounting for the genetic correlation of EA with a third phenotype
Table S12C. Standardized results from genetic multivariate regression models examining the relationship between AFS and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12D. Unstandardized results from genetic multivariate regression models examining the relationship between AFS and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12E. Standardized results from genetic multivariate regression models examining the relationship between EA and AFB in males, accounting for the genetic correlation of EA with a third phenotype
Table S12F. Unstandardized results from genetic multivariate regression models examining the relationship between EA and AFB in males, accounting for the genetic correlation of EA with a third phenotype
Table S12G. Standardized results from genetic multivariate regression models examining the relationship between EA and AFB in females, accounting for the genetic correlation of EA with a third phenotype
Table S12H. Unstandardized results from genetic multivariate regression models examining the relationship between EA and AFB in females, accounting for the genetic correlation of EA with a third phenotype
Table S12I. Standardized results from genetic multivariate regression models examining the relationship between AFS in males and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12J. Unstandardized results from genetic multivariate regression models examining the relationship between AFS in males and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12K. Standardized results from genetic multivariate regression models examining the relationship between AFS in females and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S12L. Unstandardized results from genetic multivariate regression models examining the relationship between AFS in females and EA, accounting for the genetic correlation of AFS with a third phenotype
Table S13A. Bi-Directional MR, Years of education and AFB and AFB/AFS with risk taking and age at smoking initiation
Table S13B. Mendelian Randomization (MR) of age at first birth (AFB) to Coronary artery disease (CAD) and Type 2 diabetes (T2D) and age at first sex (AFS) to CAD and T2D, and Educational Attainment to CAD and T2D
Table S14. Polygenic score (PGS) prediction of age at first birth (AFB), educational attainment (EA) and risk on parental longevity
Table S15A. Results from DEPICT tissue enrichment analysis for age at first sex (AFS)
Table S15B. Results from DEPICT tissue enrichment analysis for age at first birth (AFB)
Table S15C. Results from DEPICT gene prioritization for age at first sex (AFS)
Table S15D. Results from DEPICT gene prioritization for age at first birth (AFB)
Table S15E. Results from DEPICT cell type enrichment analysis using mouse brain RNA sequencing data for age at first sex (AFS)
Table S15F. Results from DEPICT cell type enrichment analysis using mouse brain RNA sequencing data for age at first birth (AFB)
Table S15G. Results from DEPICT cell type enrichment analysis using tabula muris RNA sequencing data for age at first sex (AFS)
Table S16A. Search terms used for the Phenolyzer analysis for the three areas of interest
Table S16B. Results of Phenolyzer analysis age at first birth (AFB) and age at first sex (AFS)
Table S17A. The results of in silico sequencing and in silico lookup of GWAS associations of AFB. AF_EUR indicates the allele frequency of the alternative allele (A2) in the European population.
Table S17B. The results of in silico sequencing and in silico lookup of GWAS associations of AFS. AF_EUR indicates the allele frequency of the alternative allele (A2) in the European population.
Table S18A. Summary data-based Mendelian Randomization (SMR) for age at first sex (AFS)
Table S18B. Summary data-based Mendelian Randomization (SMR) for age at first birth (AFB)
Table S19A. Summary of gene prioritization results across all approaches for age at first sex (AFS)
Table S19B. Summary of gene prioritization results across all approaches for age at first birth (AFB)
Table S19C. Summary of gene prioritization results across all approaches for age at first sex (AFS) and age at first birth (AFB)

1. Background and Phenotype Definitions

1.1 Background

Previous studies have shown that the onset of human reproductive behaviour, measured as age at first sexual intercourse (AFS) and age at first birth (AFB), has a genetic basis. AFB has a SNP-heritability of 15%1 and AFS 15-17% (see Section 3), with two genome-wide association studies (GWAS) in 2016 identifying 10 genetic loci linked to AFB2 and 38 associated with AFS.3 A detailed discussion of the motivation behind the study of these traits, including the evolutionary causes of genetic variance in reproductive behaviour, additive and dominant genetic variation and environmental variation in fertility behaviour, can be found in the Supplementary Note of Barban et al. (2016).2 A description of the data and methods used in this study can be found in the online Methods section appended to the article.

The current analysis extends previous work in several appreciable ways. First, this study has a sizeable increase in sample size, making this the largest GWAS to date on these phenotypes. Previous work on AFS3 examined 125,667 individuals from the UK Biobank, and a study on AFB2 examined 251,151 individuals. The current study is considerably larger for both AFS (N=397,338 pooled; N=214,547 women; N=182,791 men) and AFB (N=542,901 pooled; N=418,758 women; N=124,008 men). A second extension is that we use 1000G imputed genotype data, which, in addition to the larger sample, allows us to detect considerably more signals. Third, we include an X-Chromosome analysis, allowing us to uncover additional novel loci. A fourth advance is the ability to find markedly more biological signals. Fifth, our extensive analyses of the correlation and underlying etiology of these traits reveal a genetic basis that AFS and AFB share with other traits. This includes externalizing behaviour and substance use for early AFS and AFB, and links to internalizing traits and infertility disease for later AFS and AFB. Sixth, we show that AFB is a stronger predictor for late age onset of disease and longevity, even beyond known standard predictors such as educational attainment. Finally, we demonstrate that our polygenic scores are sensitive to gene-environment correlation (rGE) and childhood socioeconomic status.

1.2 Phenotype Definitions

An overview of participating cohorts is found in Table S1, with a description provided shortly in Section 3. The detailed phenotype definitions and questions drawn from each of the cohorts are included in Supplementary Table S2.

Age at first sexual intercourse (AFS) is treated as a continuous measure and assessed using questions such as "What was your age when you first had sexual intercourse?" The question is often accompanied by a more detailed definition, such as "sexual intercourse includes vaginal, oral or anal intercourse". Ages less than 12 are normally excluded. The UK Biobank requires confirmation of ages in the range 4-12, and excludes all answers that were less than 4. For out of sample replication, other studies use 12 as the minimum age if they do not have a study-specific lower limit. Age at first sexual intercourse tends to have a markedly non-normal distribution, so a within-sex inverse rank normal transformation is required.

Age at first birth (AFB) is treated as a continuous measure, either asked directly or created from several survey questions (e.g., birthdate of participant and date of birth of first child). The most common questions were "How old were you when you had your first child?" or "What is the date of birth of your first child?" Individuals were eligible for inclusion if they were assessed for AFB and had given birth to a child.
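The within-sex inverse rank normal transformation mentioned for AFS can be sketched as follows. This is a minimal illustration only: the rank offset of 0.5, the use of average ranks for ties, and the function and variable names are assumptions made for the example, not choices reported by the cohorts.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to normal quantiles via their ranks (ties get the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    # Assumed rank-to-probability mapping: (rank - offset) / (n - 2*offset + 1).
    return [standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


def within_sex_transform(ages, sexes):
    """Apply the transformation separately within each sex group."""
    transformed = [None] * len(ages)
    for sex in set(sexes):
        index = [i for i, s in enumerate(sexes) if s == sex]
        z_scores = rank_inverse_normal([ages[i] for i in index])
        for i, z in zip(index, z_scores):
            transformed[i] = z
    return transformed


# Hypothetical toy data: reported AFS and sex for seven respondents.
toy_ages = [16, 18, 18, 25, 15, 17, 30]
toy_sexes = ["F", "F", "F", "F", "M", "M", "M"]
toy_z = within_sex_transform(toy_ages, toy_sexes)
```

Because ranks are computed within each sex, the transformed values are centred and symmetric within women and within men, regardless of how skewed the raw ages are.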

2. Phenotypic and genotypic changes in the onset of human reproductive behaviour over time

2.1 Phenotypic changes in the onset of human reproductive behaviour

Over the past forty years, there has been a rapid postponement of age at first birth (AFB) by 4-5 years, to a mean AFB for women of around 29 in many advanced societies.4 The biological ability to conceive a child already starts to decline for many women as early as 25, with around 50% of women sterile by the age of 40.5 This postponement has been related to multiple social, economic, and cultural factors, which have been documented in several detailed reviews.4,6 A central factor is the introduction of effective contraception since the late 1960s, and with it the ability to control fertility and engage in individual choice. Another key factor is the well-documented association between women’s gains in educational attainment and later fertility, particularly for more recent birth cohorts. This is related to women’s stronger labour market attachment and their realization that fertility postponement avoids large motherhood wage penalties. In fact, for each year a woman delays motherhood, she increases her career earnings by 9%.6 Other factors are the strong cultural and ideational changes and norms surrounding sexual behaviour, entry into parenthood and the role of children, who are often no longer strongly required for economic and labour support to parents. Finally, multiple structural factors such as the availability of childcare, gender equity, housing and resources all play a vital role.

Figure S1 (Panel A) examines phenotypic data from the UK Biobank and shows not only a shift in the distribution of AFB to later ages, but also a wider spread in the distribution itself. Figure S1 (Panel B) shows that in earlier cohorts AFS had a bi-modal distribution. One group had earlier sexual intercourse, often tied to socio-economic circumstances and problem or risky behaviour.7 The other group, engaging in later sexual initiation, has been found to be tied to higher educational goals and achievement, with early sexual intercourse tied to a higher calculated risk of pregnancy, which would disturb longer-term life planning and career goals.8 The panel also shows a narrowing of the distribution over time to earlier ages.

Figure S1. Age at first birth (AFB) panel A and Age at first sex (AFS) panel B by cohort, UK Biobank

Figure S2 documents the shift from a close link between sexual debut and first childbirth in earlier birth cohorts (<1941, r=0.60) to a relative uncoupling in more recent birth cohorts (>1960, r=0.31). Related to this is a large body of demographic work that has examined the decoupling of sex from marriage.6

Figure S2 illustrates the gradual decoupling of sexual initiation from reproduction. Here we see that the correlation or timing between AFS and AFB was concentrated and closer together in earlier birth cohorts, whereas in more recent birth cohorts it is increasingly widely distributed. In other words, the classic association of sexual behaviour with marriage and childbearing held in earlier cohorts has waned over time, largely due to the introduction of effective contraception and changes in social norms about sexual behaviour outside of a marital union.6

Figure S2. Correlation plot between Age at first birth and Age at first sex by birth cohort, UK Biobank

2.2 A multifactorial life course approach to understanding human reproductive behaviour

As Figure S3 illustrates, the theoretical model driving the analytical approach in this paper adopts a multifactorial life course model. Our central aim is to isolate genetic loci related to the timing of the onset of reproductive behaviour. It is increasingly acknowledged, however, that it is essential to understand and interpret genetic findings in relation not only to environmental context but also, for these phenotypes, to reproductive biological factors, externalizing behaviour or behavioural disinhibition, and internalizing behaviour. Adopting a life course approach means that we consider different life span phases from pre-adolescence (onset of menarche, voice-breaking, childhood socioeconomic status), to adolescence where AFS often occurs, and then early to mid-adulthood and AFB. We therefore consider key phenotypes measured around AFS, such as age at smoking initiation and contraception. We extend this approach to understand how these earlier life course conditions (e.g., childhood socioeconomic status or timing of AFS/AFB) impact later-life disease development and longevity.

Figure S3. A multifactorial life course approach to understanding the timing of reproductive behaviour


2.3 Heterogeneity in heritability across birth cohorts

A recent study demonstrated that estimates from GWAS discoveries are substantially smaller across populations compared to within populations.9 Simulations showed that the results reflected heterogeneity in gene-environment interaction rather than genetic heterogeneity. In other words, particularly for complex traits and diseases such as reproductive behaviour or others such as educational attainment or BMI, it is more difficult to determine the influence of genetic versus socio-environmental factors. That study demonstrated that although GWA studies combine data from individuals across different time periods, it is implausible to assume that genetic effects are uniform across time.

To test whether this was a concern with our current analysis, Figures S4A and B show how the SNP heritability estimates change over time for AFB and AFS using the UK Biobank. Our SNP heritability estimate refers to the proportion of the additive genetic variance explained by common SNPs across the genome over the overall phenotypic variance ($\sigma^2_P$) of the trait:10

$$h^2_{SNP} = \frac{\sigma^2_g}{\sigma^2_P}$$

The phenotypic variance is the sum of additive genetic and environmental variance, i.e., $\sigma^2_P = \sigma^2_g + \sigma^2_e$, where $\sigma^2_g$ is the additive genetic variance explained by all common SNPs across the genome and $\sigma^2_e$ is the residual variance. The methods we applied have been detailed elsewhere.10–14 Briefly, we applied a linear mixed model

$$y = X\beta + g + e$$

where y is an N×1 vector of dependent variables, N is the sample size, β is a vector for fixed effects of the M covariates in N×M matrix X (including the intercept and potential confounders such as birth year), g is the N×1 vector with each of its elements being the total genetic effect of all common SNPs for an individual, and e is an N×1 vector of residuals. We have $g \sim N(0, A\sigma^2_g)$ and $e \sim N(0, I\sigma^2_e)$, where A is the genetic relationship matrix between individuals and I is the identity matrix. Hence, the variance matrix of the observed phenotypes is:

$$V = A\sigma^2_g + I\sigma^2_e$$

We used BOLT software15 as an efficient solution for mixed-linear models and controlled for the first 20 principal components as well as assessment centre, chip and birth year of the participants. We only included individuals who self-reported as British individuals to reduce heterogeneity due to cultural background.
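As a small numerical illustration of this variance decomposition, the sketch below computes the SNP heritability from the two variance components and assembles the phenotypic covariance matrix for a toy pair of individuals. It assumes the standard formulation in which genetic effects covary according to a genetic relationship matrix (called `grm` here); all names and numbers are hypothetical and are not estimates from this study, which relied on BOLT for the actual fitting.

```python
def snp_heritability(var_g, var_e):
    """Share of phenotypic variance attributable to common SNPs."""
    return var_g / (var_g + var_e)


def phenotypic_covariance(grm, var_g, var_e):
    """Covariance of phenotypes: GRM scaled by var_g, plus var_e on the diagonal."""
    n = len(grm)
    return [
        [grm[i][j] * var_g + (var_e if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy inputs: two nearly unrelated individuals.
toy_grm = [[1.0, 0.02], [0.02, 1.0]]
toy_var_g, toy_var_e = 0.15, 0.85

toy_h2 = snp_heritability(toy_var_g, toy_var_e)
toy_v = phenotypic_covariance(toy_grm, toy_var_g, toy_var_e)
```

With these toy values the heritability is 0.15 and the off-diagonal covariance is small, reflecting that unrelated individuals share little of the genetic component.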

Figure S4A illustrates a steady increase in SNP-heritability by birth cohort for AFB of women (N = 164,486) from just under 10% for those born in 1940, climbing to around 23% for the latest cohorts born in 1965. We note, however, that these results should also be considered in relation to healthy volunteer sample selection and thus ascertainment bias in the UK Biobank, with replication across other large samples vital for future research. Importantly, the genetic correlation across birth cohorts is not significantly different from 1. For individuals born before and after 1950, for example, rG = 0.93 (SE 0.04), so the same genes appear to have gained in importance for AFB in women in the UK.

Figure S4A. SNP heritability for AFB, women in UK Biobank

Figure S4B. SNP heritability for AFS, women and men in UK Biobank

Figure S4B shows the SNP heritability of AFS for men (N = 178,988) and women (N = 210,363), respectively, across birth cohorts. For men we observe a slight u-shaped pattern, with heritability ranging between 23% and just under 14%. For women the trend is similar to AFB, with a relatively steady increase in heritability by birth cohort from around 13% in the oldest cohorts born in 1940 to around 16% for those born in 1965. Again, genetic correlations between those born before and after 1950 could not be statistically distinguished from 1, both for women (rG = 0.96, SE 0.04) and men (rG = 0.98, SE 0.15). This is in line with work that has shown changes in genetic associations with smoking over time and across cohorts.16,17
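One simple way to see why these genetic correlations cannot be distinguished from 1 is a Wald-type z test of each estimate against 1 using its standard error. The sketch below applies that test to the three reported estimates; the test itself is an assumption made for illustration, since the text does not state which test was used, and the function name is hypothetical.

```python
import math


def z_test_against_one(rg, se):
    """Wald z statistic and two-sided normal p-value for H0: rG = 1."""
    z = (1.0 - rg) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Reported estimates for those born before versus after 1950: (rG, SE).
estimates = {
    "AFB women": (0.93, 0.04),
    "AFS women": (0.96, 0.04),
    "AFS men": (0.98, 0.15),
}
results = {label: z_test_against_one(rg, se) for label, (rg, se) in estimates.items()}
```

Under this assumed test the largest departure is for AFB in women (z = 1.75, two-sided p of about 0.08), and none of the three estimates differs from 1 at the conventional 5% level.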

3. Overview of GWAS meta-analysis

The discovery of genetic variants associated with AFB and AFS is based on cohort-level genome-wide association studies that were quality-controlled and meta-analysed by two independent centres at the University of Oxford and University of Cambridge. We followed standard QC protocol and employed the software packages QCGWAS18 and EasyQC,19 which allowed us to harmonize the files and identify possible sources of errors in association results. This procedure entailed generating diagnostic graphs and statistics for each set of GWAS results (i.e., for each file). Where apparent errors could not be amended by stringent QC, cohorts were excluded from the meta-analysis.

3.1 Participating cohorts

A total of 36 cohorts participated in our study, with phenotype inclusion varying according to the availability of the phenotype. Table S1 provides a description of the cohorts, including the type of sampling, country, coverage of birth years of respondents, mean age and standard deviation and scientific reference for more information on each study. Table S2 provides details on the specific phenotype descriptions for each cohort. Tables S3a-b provide the sample sizes of the adjusted pooled analysis and by women and men in the case of family data. Cohorts that agreed to participate followed an Analysis Plan posted on the Open Science Framework preregistration site https://osf.io/b4r4b/ on February 08, 2017. Although AFB and number of children ever born (NEB) were examined together in our previous research and considered in the additional analysis plan2, due to the strong relationship between AFS and AFB, number of children ever born (NEB) and childlessness (CL) were examined separately in another paper.

As Table S3a shows, for autosomal chromosomes the total number of individuals in the pooled meta-analysis for AFB was 542,901, with a larger sample for women (418,758) than men (124,008). Table S3b provides the detailed descriptive information on the cohorts with X Chromosome data, with aggregated numbers taken from Table S3a shown here as a summary. For AFS, only data from the UK Biobank was used in the initial GWAS (leaving additional cohorts for out of sample replication), with 397,338 individuals in total, 214,547 women and 182,791 men, both for autosomal chromosomes and the X chromosome. Note that the samples for women and men do not directly equate to the pooled sample size since some cohorts are family-based and only participated in the pooled analysis.


Table S3a (excerpt). Summary Sample Sizes and Descriptives, age at first birth (AFB) and age at first sexual intercourse (AFS)

Sample    AFB                          AFS
          Autosomal Chr    X Chr       Autosomal Chr    X Chr
Women     418,758          320,987     214,547          214,547
Men       124,008          110,463     182,791          182,791
Pooled    542,901          431,450     397,338          397,338

3.2 Sample inclusion criteria

Individuals were eligible for inclusion in analyses if they met the following conditions:

Age at first birth (AFB)

a. They were assessed for AFB and have given birth to a child (parous); both for females and for males.
c. All relevant covariates (year of birth) are available for the individual;
d. They were successfully genotyped genome-wide (recommended individual genotyping rate > 95%);
e. They passed the cohort-specific standard quality controls, e.g., excluding individuals who are genetic outliers in the cohort.
f. They were of European ancestry.

Age at first sexual intercourse (AFS)

a. They were assessed for AFS and have had sexual intercourse; both for females and for males.
c. All relevant covariates (year of birth, age) are available for the individual;
d. They were successfully genotyped genome-wide (recommended individual genotyping rate > 95%);
e. They passed the cohort-specific standard quality controls, e.g., excluding individuals who are genetic outliers in the cohort.
f. They were of European ancestry.

European ancestry samples were chosen in this discovery study due to the availability of large samples20 and for no biological or substantive reason. We acknowledge that social science research has shown large differences in the initiation of AFS and AFB by socioeconomic differences and the socially constructed category of race and ethnicity.21,22 Socioeconomic differences are examined in this article, but the results in the current GWAS are only applicable to European Ancestry groups and need further cross-ancestry discovery.

3.3 Genotyping and imputation

Table S4 provides an overview of the cohort-specific details on the genotyping platform, pre-imputation quality control filters applied to the genotype data, imputation software used, the reference used for imputation and the presence of X chromosome data. We asked cohorts to include all autosomal SNPs imputed from the 1000G panel (at a minimum) to allow analyses across different genotyping platforms. Cohorts with denser reference panels were asked to communicate this to our team. Cohorts were asked to provide unfiltered results since filters on imputed markers and so forth would be applied at the meta-analysis stage.

3.4 Models used to test for association

Analysts were asked to run linear regression on AFB and a transformed AFS variable. Since age at first sexual intercourse tends to have a markedly non-normal distribution (see Fig S1), we asked the analysts to apply a within-sex inverse rank normal transformation before running statistical models. Analysts were asked to include birth year of the respondent (represented by (birth year − 1900)/10), its square and its cube to control for non-linear birth cohort effects. For those with family-based data, we suggested controlling for non-independence of family members or including only one family member in the analyses. We furthermore asked studies with family data to run a pooled GWAS on both sexes. Combined analyses that included both men and women also needed to include interactions of birth year and its polynomials with sex. In general, we asked analysts to include top principal components to control for population stratification, and cohort-specific covariates if appropriate. Some cohorts only used birth year and not its polynomials because of multi-collinearity issues or problems with convergence of the GWA analysis. Omission of these nonlinear birth year effects is unlikely to lead to biased inferences, since genotypes are not usually considered to be truly associated with birth year. However, inferences might be less accurate (i.e., have larger standard errors), since omission of nonlinear birth year effects can lead to larger residual variation.
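The birth-year covariates requested above can be sketched as follows. This is an illustration under stated assumptions: the scaling is read as (birth year − 1900)/10, sex is coded 0/1, and the function and key names are hypothetical rather than taken from the analysis plan.

```python
def birth_year_covariates(birth_year, sex=None):
    """Build scaled birth-year polynomial terms, plus sex interactions if sex is given.

    `sex` is assumed to be coded 0/1 and is only needed for pooled analyses.
    """
    scaled = (birth_year - 1900) / 10
    row = {
        "by": scaled,
        "by2": scaled ** 2,
        "by3": scaled ** 3,
    }
    if sex is not None:
        # Pooled analyses also interact birth year and its polynomials with sex.
        row["sex"] = sex
        row["by_x_sex"] = scaled * sex
        row["by2_x_sex"] = scaled ** 2 * sex
        row["by3_x_sex"] = scaled ** 3 * sex
    return row


sex_specific_row = birth_year_covariates(1950)
pooled_row = birth_year_covariates(1950, sex=1)
```

Dividing by 10 keeps the squared and cubed terms on a manageable scale, which can reduce the numerical problems that led some cohorts to drop the polynomial terms.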

3.5 Analysis of X chromosome

The analysis of the X chromosome was completed using one of three approaches. First, XWAS software (http://keinanlab.cb.bscb.cornell.edu/content/xwas), where we suggested using the --var-het-weight command to generate results that were possible to meta-analyse. Second, SNPtest, using -method newml; while this assumes complete X-inactivation (i.e., that a male with an allele is the same as a homozygous female), the effect estimates and SEs approximate ½ of the corresponding betas that are produced by the XWAS software. Third, the analysis could have been performed using BOLT-LMM, run as a separate analysis from the autosomal variants, but including typed autosomal variants when fitting model parameters. This latter method was used for the data in UK Biobank, so the AFS data were wholly derived using this method.

3.6 Quality Control (QC): filters & diagnostic checks

We followed the QC protocol of the GIANT consortium23 and employed an adapted version of the software package QCGWAS18, which allows the inclusion of structural variants, in order to standardize files across cohorts, and EasyQC19 to conduct the QC filtering of variants and to produce diagnostic graphs and statistics as described below. Where errors could not be amended by combining stringent QC with file inspections, queries to cohorts and corrections, cohorts were excluded from the meta-analysis. See Tables S5 (AFB) and S6 (AFS) for QC results on autosomal chromosomes and Tables S7 (AFB) and S8 (AFS) for QC results on the X chromosome.

3.6.1 Filters

a) Missing data: We filtered variants where information on both the reference and the other allele was missing, or where the estimated effect, p-value, standard error, expected allele frequency or number of observations was missing.

b) Implausible values: We filtered variants where p-values > 1 or < 0, standard errors = 0 or = infinite, expected allele frequency > 1 or < 0, N < 0, call rate > 1 or < 0, an SE of the effect estimate which is approximately 40% greater than the expected SE based on MAF and standard deviation and for those with an >10% (see Winkler et al.24 for the approximation for quantitative traits and Rietveld et al.25 for quantitative and binary traits).

c) Quality thresholds: We filtered variants where expected allele frequency = 1 or = 0 (monomorphic variants); N < 100, to guard against spurious associations due to overfitting of the model; minor allele count < 6, to guard against spurious associations with low-frequency SNPs; genotyped SNPs which were not in Hardy-Weinberg Equilibrium (HWE), with significance thresholds of 10 in case N < 1,000, 10 in case 1,000 ≤ N < 2,000, 10 in case 2,000 ≤ N < 10,000 and no filter in case N > 10,000; imputed markers with imputation quality < 40%; SNPs with a call rate < 95%; and variants for which discrepancies between reported and expected p-values based on effect estimates and standard errors were detected.

d) Data harmonization: We matched the cohort-based summary statistics with the 1000 Genomes phase 1 version 3 reference panel provided by Winkler et al.19 EasyQC drops variants with mismatches that cannot be resolved directly, such as duplicates, allele mismatches, or missing or invalid alleles. Based on graphical inspection, we applied cohort-specific filters to drop variants with obvious deviations between the expected allele frequency based on the reference panel and the observed allele frequency.
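The variant-level filters listed above can be sketched as a single predicate. This is an illustrative simplification rather than the EasyQC or QCGWAS implementation: the dictionary keys are hypothetical field names, the HWE filter and the expected-SE check are omitted because they require information not reproduced here, and negative or non-finite standard errors are rejected in addition to the zero and infinite values named in the text.

```python
import math


def passes_qc(v, imputed):
    """Return True if a variant record survives the basic filters.

    `v` is a dict with hypothetical keys: beta, se, p, eaf, n and, optionally,
    info (imputation quality) and callrate.
    """
    required = ("beta", "se", "p", "eaf", "n")
    if any(v.get(k) is None for k in required):
        return False  # a) missing data
    if not 0.0 <= v["p"] <= 1.0:
        return False  # b) implausible p-value
    if v["se"] <= 0.0 or not math.isfinite(v["se"]):
        return False  # b) implausible standard error
    if not 0.0 < v["eaf"] < 1.0:
        return False  # b) implausible frequency or c) monomorphic variant
    if v["n"] < 100:
        return False  # c) sample size below 100
    minor_allele_count = 2.0 * v["n"] * min(v["eaf"], 1.0 - v["eaf"])
    if minor_allele_count < 6:
        return False  # c) minor allele count below 6
    if imputed and v.get("info") is not None and v["info"] < 0.40:
        return False  # c) imputation quality below 40%
    callrate = v.get("callrate")
    if callrate is not None and not 0.95 <= callrate <= 1.0:
        return False  # b) implausible or c) low call rate
    return True
```

Applying the predicate to every record of a cohort file yields the set of variants passed on to data harmonization.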

3.6.2 Diagnostic graphs

We produced three key diagnostic graphs for visual inspection by the two independent QC centres in Oxford and Cambridge. If problems were detected that could not be resolved by more stringent QC, we removed the cohort from the analysis. The key diagnostic graphs depicted:

a) An allele frequency (AF) plot to identify errors in allele frequencies and strand orientations using the 1000 Genomes phase 1 version 3 reference panel provided by Winkler et al.19
b) A PZ plot to assess the consistency of the reported p-values with the Z-scores calculated from effect sizes and standard errors.
c) A PRS plot of predicted versus reported standard errors as developed by Winkler et al.19 and implemented by Okbay et al.26
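The logic behind the PZ plot can be sketched as follows: the p-value implied by the reported effect size and standard error is recomputed and compared with the reported p-value on the -log10 scale. The sketch assumes a two-sided test against a standard normal distribution; cohorts that reported p-values from a t-distribution or a mixed model would deviate slightly from this expectation. The function names are illustrative only.

```python
import math


def p_from_beta_se(beta, se):
    """Two-sided p-value implied by Z = beta / se under a standard normal."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def pz_discrepancy(beta, se, reported_p):
    """Absolute difference between expected and reported -log10(p).

    Assumes both p-values are strictly positive (no numerical underflow).
    """
    expected_p = p_from_beta_se(beta, se)
    return abs(math.log10(reported_p) - math.log10(expected_p))
```

In a well-behaved file the discrepancy is close to zero for all variants, so the points of the PZ plot fall on the diagonal.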


3.6.3 SNPs and cohorts excluded

a) Autosomal chromosomes

Overall, the quality of studies was good (for full results of the QC filters described above see Tables S5 and S6 for autosomal SNPs). Two files needed to be excluded due to the filter on sample size (the INGI-Carlantino files for men and for women, whilst the pooled file is in the meta-analysis). For autosomal chromosomes and AFB, the remaining cohorts provided 61 files: 36 for women only, 18 for men only and 7 pooled (for family data). Two studies did not provide imputation quality (KORA F3, N = 1,066; and KORA F4, N = 1,111) and enter the meta-analysis with only 584,866 and 496,556 SNPs respectively. For the NHS cohorts, results from our previous discovery2 based on HapMap reference panels were reused, with the number of SNPs between 2,395,852 and 2,41,810. For all other cohorts, the number of variants in the analysis ranges between 6,583,592 for women from LBC 1921 and 16,458,651 for women in Pelotas, with an average of 9,770,818. For AFS, between 16,413,259 and 16,552,240 variants from the UK Biobank entered the analysis after QC.

b) X chromosome

For AFB, 13 cohorts provided information on the X chromosome. Overall, we received 23 files: 13 for women, 8 for men and 2 for the pooled analysis in case there were relatives in the data. On average 275,023 variants survived QC, ranging from a minimum of 99,794 in women from WLS to 998,304 for the women in the UK Biobank sample (see Table S3b for full descriptives). For AFS, the UK Biobank provided results for between 977,536 and 990,735 variants on the X chromosome after QC (see Table S3b; S7-8 for SNP filtering).

3.7 Meta Analyses

Cohort association results (after applying the QC filters) were combined using sample-size-weighted meta-analysis, implemented in METAL.27 Sample-size weighting is based on Z-scores and can account for different phenotypic measurements among cohorts. The two QC centres agreed on using sample-size weighting to allow cohorts to introduce study-specific covariates in their cohort-level analysis. Only SNPs that were observed in at least 50% of the participants for a given phenotype-sex combination were passed to the meta-analysis. SNPs were considered genome-wide significant at P-values smaller than 5×10⁻⁸ (α of 5%, Bonferroni-corrected for a million tests). The meta-analyses were carried out by two independent analysts in different centres in Oxford and Cambridge. Comparisons were made to ensure concordance of the identified signals between the two independent analysts. The PLINK clumping function was used to identify the most significant SNPs in associated regions (termed “lead SNPs”).
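The sample-size-weighted scheme combines cohort-level Z-scores with weights proportional to the square root of each cohort's sample size. The following is a minimal sketch of that calculation and of the Bonferroni threshold stated above; it is not the METAL implementation, it assumes that all Z-scores have already been aligned to the same effect allele, and the function names are illustrative.

```python
import math


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combined Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)) with w_i = sqrt(N_i)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value of a Z-score under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# alpha of 5%, Bonferroni-corrected for one million tests
GENOME_WIDE_ALPHA = 0.05 / 1_000_000

# Example with two hypothetical cohorts
z_meta = sample_size_weighted_z([3.0, 0.0], [400, 100])
is_significant = two_sided_p(z_meta) < GENOME_WIDE_ALPHA
```

Because only Z-scores and sample sizes enter the calculation, cohorts measuring the phenotype on different scales or adjusting for different covariates can still be combined.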

We performed meta-analyses on the pooled samples for AFS and AFB and then, as a separate analysis, meta-analyses by sex. The sex-specific results are discussed in more detail in section 5. To understand the magnitude of the estimated effects, we used an approximation method to compute unstandardized regression coefficients based on the Z-scores from the METAL output obtained by sample-size-weighted meta-analysis, the allele frequency and the phenotype standard deviation. Further details of the approximation procedure are available in the Supplementary Information of Rietveld et al.25 We performed conditional and joint multiple SNP analysis (COJO) to identify further independent SNPs.
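A commonly used form of this approximation is sketched below. It assumes Hardy-Weinberg equilibrium and a small per-SNP effect, so that the standard error is approximately the phenotype standard deviation divided by the square root of 2N·p(1 − p), where p is the allele frequency. We assume, without having verified it against the cited procedure, that this is the form intended; the function name is illustrative.

```python
import math


def approx_beta_se(z, n, allele_freq, pheno_sd):
    """Approximate unstandardized effect and SE from a meta-analysis Z-score.

    se   ~= pheno_sd / sqrt(2 * n * p * (1 - p))
    beta ~= z * se
    """
    denom = math.sqrt(2.0 * n * allele_freq * (1.0 - allele_freq))
    se = pheno_sd / denom
    return z * se, se


# Hypothetical example: Z = 5 at a common SNP in 100,000 individuals,
# phenotype measured in standard deviation units
beta, se = approx_beta_se(5.0, 100_000, 0.5, 1.0)
```

Multiplying by the phenotype standard deviation in years gives an effect per allele in years rather than in standard deviation units.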


3.8 MTAG results

MTAG results28 were calculated from GWA meta-analysis results of the following related phenotypes, since they are all highly correlated: age at first birth, age at first sex, number of children ever born and childlessness. Using summary statistics from the pooled GWAS of each of the traits, MTAG uses bivariate LD score regression to account for unobserved sample overlap.


3.9 Summary of discovered loci and Manhattan plots

The Manhattan plots of the pooled hits, followed by separate plots for women and men, can be found in Supplementary Figures S5 for AFS and S6 for AFB, by pooled (A), women (B) and men (C). A summary of the number of loci discovered is shown below, divided by autosomal and X chromosome and by pooled and sex-specific hits. The full list of association results is included in the Supplementary Tables for autosomal chromosomes (S9, AFB; S10, AFS).

Summary of loci discovered

Phenotype   Autosomal chromosomes   X chromosome   Total   Pooled   Women   Men
Age at first sex (AFS)     271   2   8   0   281
Age at first birth (AFB)    84   1   0   4    89
Total                      355   3   8   4   370

Figure S5. Manhattan plots, Age at first sex (AFS), Pooled (A), Women (B) and Men (C)

Figure S6. Manhattan plots, Age at first birth (AFB), Pooled (A), Women (B) and Men (C)

4. Polygenic score prediction

4.1 Calculation of polygenic scores

We calculated three sets of polygenic scores:

1) Pruning and thresholding polygenic scores using PRSice.29 Polygenic scores were calculated using all SNPs in the sample and with the software default values for clumping (250 kb window; r2 = 0.1). Weights were based on meta-analysis results excluding the specific cohort from the calculation.

2) LDpred polygenic scores.30 The LD reference was calculated from the same genotype files (AddHealth and UKHLS). We set the prior distribution for the causal fraction of SNPs equal to one. LDpred weights were then calculated under the infinitesimal model. Initial weights were based on meta-analysis results excluding the specific cohort from the calculation.

3) MTAG+LDpred polygenic scores. This set of scores was calculated using the same methodology as in 2), but based on MTAG results28 calculated from GWA meta-analysis results of the following related phenotypes: age at first birth, age at first sex, number of children ever born and childlessness.

For both traits, we ran ordinary least-squares (OLS) regression models and report the incremental R2 as a measure of goodness-of-fit of the model. Confidence intervals are based on 1,000 bootstrap samples.
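A simplified sketch of this procedure is given below. It assumes that both the phenotype and the polygenic score have already been residualized on the covariates, in which case the incremental R2 reduces to the squared correlation between the two residuals. The percentile bootstrap, the seed and the function names are illustrative choices and not necessarily those used in our analysis.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r_squared, resampling individuals."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(r_squared([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

With covariates that are not partialled out beforehand, the incremental R2 would instead be obtained as the difference in R2 between the model with and without the polygenic score.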

4.2 Out of sample prediction

To validate the performance of the polygenic scores, we performed out-of-sample prediction for AFB and AFS in two cohorts: the National Longitudinal Study of Adolescent to Adult Health (Add Health),31 based in the US, and the UK Household Longitudinal Study - Understanding Society (UKHLS).32 For each out-of-sample calculation, we excluded the respective cohort from the GWA meta-analysis in order to obtain independent summary statistics, which were used as weights in the calculation of the polygenic scores.33

The results of the polygenic score analyses are depicted in Figure S7. The proportion of variance explained by polygenic scores constructed with the MTAG+LDpred method is:

Age at First Birth (AFB): 4.80% in UKHLS and 2.54% in Add Health

Age at First Sex (AFS): 5.79% in Add Health

Figure S7. Variance explained from polygenic scores for Age at First Birth and Age at First Sex using PRSice, LDpred and MTAG+LDpred in out-of-sample cohorts (panels: AFB, AFS; y-axis: Delta R2; x-axis: cohort, AddHealth and UKHLS)

Notes: The out-of-sample cohorts are the National Longitudinal Study of Adolescent to Adult Health (Add Health) and the UK Household Longitudinal Study (UKHLS).

Running a linear regression predicting AFB (N = 4,989) and AFS (N = 9,058) in Add Health, using the AFB and AFS PGS respectively, residualized on the first 10 genomic PCs, we can also interpret the effect in terms of time.

A 1 standard deviation (SD) change in the AFB PGS is associated with a change of 0.48 years (SE = 0.05) in AFB, or around 25 weeks (about 5.8 months).

A 1 SD change in the AFS PGS is associated with a change of 0.56 years (SE = 0.03) in AFS, or around 29 weeks (about 6.7 months).

4.3 Accounting for right-censoring and comparing top and bottom 5% PGS

A common limitation in the study of the genetic determinants of AFB and AFS is right censoring, which occurs when an individual has not experienced the event of first sex or first birth by the time of the interview.34 If right censoring is not accounted for, measurements are assessed only on those who experienced the event (i.e., first sex or birth of first child) before the interview date, and the proportion of respondents who remain childless is ignored. The outcome is thus not observed for all respondents, even though some respondents are still “at risk” of experiencing childbirth or sexual debut. We performed additional analyses on the Add Health sample to account for this statistical issue, which is particularly pertinent given the younger age of this sample. As noted in Table S1, this cohort includes individuals born between 1974 and 1983 in the United States, with a mean age of 28.9 (SD 1.74).

Here we report the median age for AFB and AFS, an estimator less sensitive to censoring than the mean.34 The median AFB in the pooled sample is 28.42 for men and 26.92 for women. The median AFS for both men and women is 17 years.1

To control for right-censored data, we first estimated nonparametric hazard functions based on Nelson-Aalen estimates (Figures S8-S9). To compare individuals with different polygenic scores, we plotted the estimated hazard of experiencing AFS and AFB for individuals in the top 5% of the polygenic score (for AFS and AFB respectively) against individuals in the bottom 5% of the score.
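For reference, the Nelson-Aalen estimator of the cumulative hazard sums, over the observed event times, the number of events divided by the number of individuals still at risk. The sketch below is a minimal standard-library illustration with hypothetical data; a hazard curve such as those plotted in the figures would additionally require smoothing of the increments, which is not shown, and the function name is illustrative.

```python
def nelson_aalen(times, events):
    """Cumulative hazard H(t) = sum over event times t_i <= t of d_i / n_i.

    times  : age at the event or at censoring (interview)
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (time, cumulative hazard) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    cumulative = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            cumulative += n_events / at_risk
            curve.append((t, cumulative))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve


# Hypothetical ages at first sex; 0 marks respondents censored at interview
curve = nelson_aalen([16, 17, 17, 18, 19], [1, 1, 0, 1, 0])
```

Estimating the curve separately for the top and bottom 5% of the polygenic score gives the comparison described above, with censored respondents contributing to the risk set until their interview age.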

Our results on age at first birth (Figure S9) show that respondents with a higher score for AFB have a lower hazard of childbearing at any age, with a slight increase after age 27. The results for age at first sex show that individuals with a high score for AFS (i.e., having sex at a later age) are less likely to have first sexual intercourse before age 19. Here sex differences appear to be very relevant: polygenic scores for age at first sex appear to be more relevant in explaining age at sexual debut for women than for men.

Figure S8. Nelson-Aalen hazard estimates of first sex by age. Comparison between top 5% and bottom 5% PGS of age at first sex

1 AFS is measured in years in the Add Health Sample, while AFB is calculated in months

Figure S9. Nelson-Aalen hazard estimates of first birth by age. Comparison between top 5% and bottom 5% PGS of age at first birth

In a second part of the analysis, we estimated semi-parametric Cox regression models35 in which we calculate the effect of the genetic score on the hazard of having sex or a child for the first time, conditional on age. This class of models takes censoring into account and is widely used to study fertility timing.34

The hazard ratio of the polygenic score for AFB is 0.85 for women and 0.91 for men. The models show that an increase of one standard deviation in the PGS is associated with a decrease of 15% in the hazard of first birth at any given age for women and of about 9% for men. In other words, a one-SD increase in the AFB PGS relates to a postponement of first birth for both women and men.

The hazard ratio of the polygenic score for AFS is 0.74 for both women and men. This is equivalent to a decrease of 26% in the hazard of first sexual intercourse at any given age associated with an increase of one standard deviation in the PGS. In other words, a one-SD increase in the AFS PGS equates to a postponement of first sexual intercourse.
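The percentages quoted for the Cox models follow directly from the hazard ratios: a hazard ratio below 1 corresponds to a lower hazard of the event at any given age, and hence to later timing. The helper below is a small illustrative sketch of this conversion, including the implied log-hazard coefficient per SD of the PGS; the function names are our own.

```python
import math


def hazard_change_percent(hazard_ratio):
    """Percentage change in the hazard per one-SD increase in the PGS."""
    return (hazard_ratio - 1.0) * 100.0


def log_hazard_coefficient(hazard_ratio):
    """Cox regression coefficient corresponding to a hazard ratio."""
    return math.log(hazard_ratio)


afb_women = hazard_change_percent(0.85)  # about -15
afb_men = hazard_change_percent(0.91)    # about -9
afs_both = hazard_change_percent(0.74)   # about -26
```

Note that these are proportional changes in the hazard, not proportional changes in the age at which the event occurs.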

5. Testing population stratification and environmentally mediated parental genetic effects of childhood socioeconomic status

To test whether population stratification biased our results or led to false positives, we used the LD Score intercept method described in Bulik-Sullivan et al.36 We then examined the potential impact of environmentally mediated parental genetic effects on our PGSs by engaging in PGS prediction across low, medium and high PGS percentiles by parents’ education, which is a proxy for childhood socioeconomic status.

5.1 Testing Population Stratification: LD Score intercept test

We used the LDSC software37 to estimate LD Score regression for each of the phenotypes using the summary statistics from the meta-analyses based on all available data. For each phenotype, we used the “eur_w_ld_chr” files of LD Scores computed by Finucane et al.38 and made available at https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2. These LD Scores were computed with genotypes from the European-ancestry samples in the 1000 Genomes Project using only HapMap3 SNPs. Only HapMap3 SNPs with MAF > 0.01 were included in the LD Score regression.

We did not apply genomic control (GC) to the summary statistics we used to estimate the LD Score regression, since genomic control tends to produce a downward bias of the intercept of the LD Score regression.

Below, a summary shows that the LD Score intercepts are significantly but not substantially different from 1. We also see that the mean χ² statistics for all the SNPs in the LD Score regressions range from 1.6203 to 2.2252 for the phenotypes. Under the null hypothesis that there is no confounding bias and that the SNPs have no causal effects on the phenotypes, the mean χ² statistic would be 1; thus mean χ² statistics greater than 1 indicate that some SNPs are associated with the phenotypes. These estimates imply that approximately 5.5% and 5.6% of the observed inflation in the mean χ² statistics for AFB and AFS, respectively, is accounted for by confounding bias (due to population stratification or other confounds), rather than a polygenic signal. This is calculated from the Standard Errors using the 95% Confidence Intervals. These estimates may also be inflated by model misspecification or LD score mismatching.

Summary of LD score intercept, SE and mean χ² statistics for AFB and AFS

                     Age at first birth (AFB)   Age at first sex (AFS)
LD score intercept   1.0341 (0.0089)            1.0687 (0.0132)
Lambda GC            1.471                      1.8419
Mean χ²              1.6203                     2.2252
Ratio                0.0549 (0.0143)            0.0561 (0.0107)
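The ratio reported in the last row is the share of the inflation in the mean χ² statistic that is attributable to the intercept, that is (intercept − 1) / (mean χ² − 1). As a check on the arithmetic, a minimal sketch using the rounded values from the summary above (the function name is illustrative):

```python
def ldsc_ratio(intercept, mean_chi2):
    """Share of the mean chi-square inflation attributable to confounding."""
    return (intercept - 1.0) / (mean_chi2 - 1.0)


ratio_afb = ldsc_ratio(1.0341, 1.6203)  # close to the reported 0.0549
ratio_afs = ldsc_ratio(1.0687, 2.2252)  # close to the reported 0.0561
```

Small differences in the final digit relative to the reported values arise because the inputs used here are already rounded to four decimals.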

5.2 Polygenic score prediction by childhood socio-economic status

To explore the impact of environmentally mediated parental genetic effects on our PGSs, we examined PGS prediction across low (0-10%), medium (50-60%) and high (90-100%) PGS percentiles by parents’ education, which is a proxy for childhood socioeconomic status, using AddHealth. This follows recent research demonstrating that PGSs differ in their predictive accuracy not only by ancestry, but also by factors such as the socioeconomic environment within ancestry groups.39 Figures S10A-B show the PGS prediction for AFB (A) and AFS (B) divided into three percentile groups by parental level of education (college versus no college). Here we see variations in prediction by PGS percentile and parental education. Figure S10A shows that those who are in the 90-100th PGS percentile for later AFB indeed postpone first childbirth, particularly past the age of 27. This pattern is accentuated for those in the highest (90-100%) and moderate (50-60%) percentile groups, particularly for those who have highly educated parents.

Figure S10A. AFB PGS score by percentile groups and parent’s educational level

Figure S10B shows that those in the highest AFS PGS percentile group (90-100%), which predicts a later age at sexual debut, and who have higher educated parents have both lower rates of sexual initiation and, from age 18 onwards, a systematically later age at first sexual intercourse. This suggests that even within these groups there is considerable heterogeneity that might be related to higher or lower behavioural disinhibition and externalizing behaviour, regardless of the parental environment.

Figure S10B. AFS PGS score by percentile groups and parent’s educational level

6. Genetic correlations with related traits

In order to understand the genetic relationships among and between our fertility phenotypes and potentially related phenotypes, we calculated their SNP heritabilities and their genetic correlations (Figure S11). The estimates represent the genetic correlation between two traits using all polygenic effects captured by the SNPs, based on the LD score regression method developed by Bulik-Sullivan et al.37 We used summary statistics and the 1000 Genomes reference set and restricted the analysis to European populations. The method estimates each SNP’s LD score, which measures the amount of genetic variation tagged by that SNP. We also follow the common convention of restricting our analyses to SNPs with MAF > 0.01, thus ensuring that all analyses are performed using a set of SNPs that are imputed with reasonable accuracy across all cohorts. The standard errors (SEs) are produced by the LDSC Python software package, which uses a block jackknife over the SNPs.

6.1 Genetic correlation with 28 related traits

We estimated the genetic overlap or correlation with 28 different traits, first pooled across both sexes and then divided by sex. Traits are divided into six categories:

• reproductive (age at menarche,40 age at menopause,41 age at voice breaking,42 number of children ever born (by sex),43 number of sexual partners,44 miscarriage or stillbirth,45 age started oral contraceptives,45 breast cancer46),
• behavioural (years of education,47 cognitive performance,47 subjective well-being,48 risk tolerance in adulthood44),
• psychiatric disorders (attention deficit hyperactivity disorder (ADHD),49,50 schizophrenia (SCZ),51 bipolar disorder,52 major depressive disorder (MDD)53 and anorexia54),
• substance use disorders (alcoholic drinks per week (DPW),55 age at smoking initiation (SI),55 cannabis use,55 cigarettes per day (CPD),55 smoking cessation55),
• personality (neuroticism,28 openness to experience,56 loneliness57),
• anthropometric (height,58 waist to hip ratio for BMI (by sex),59 and BMI (by sex)60)

Note that we include the most recent GWAS results when they are openly shared and available. For example, a 2018 study of intelligence61 was later updated by another study of cognitive performance,47 and we use the most recent. From previous research we know that the onset of reproductive behaviour phenotypes is correlated with reproductive, behavioural (e.g., educational attainment), personality and anthropometric traits.2 Age at menarche, age at menopause and age at voice breaking were included as fundamental aspects of reproductive biology, and breast cancer was included because of evidence linking it to age at menarche.62 A phenotypic and genetic relationship between later AFB and lower NEB has been well established in previous studies.6,63 Number of sexual partners and age at starting oral contraception can be a reflection of biological development, but also of risk-taking behaviour. Previous research has also linked the onset of reproductive behaviour to achieving higher levels of education and cognitive performance, linking these to later sexual debut64 and fertility postponement.63,65

Twin research in behaviour genetics has consistently shown a link of behavioural disinhibition and externalizing behaviour with psychiatric disorders and substance use disorders,66 yet few studies have linked it to early sexual and reproductive behaviour.67 Given our focus on the timing of events, namely early sexual debut and teenage pregnancy, we also wanted to test whether there was an underlying genetic propensity linking early reproductive behaviour onset with behavioural disinhibition, externalizing behaviour and risk taking. We therefore extended the analysis to examine psychiatric and substance use disorders and personality. Previous phenotypic studies have shown that AFB is negatively correlated with neuroticism.68 Others have shown a U-shaped genetic relationship between schizophrenia and AFB.69 Finally, previous work has also linked reproductive success to anthropometric traits such as height, BMI and waist-hip ratio.70,71

6.2 Genetic correlations by sex

We examined whether patterns of genetic correlation varied across the sexes (Figure 3, Main Paper). We also examined whether patterns of genetic correlation varied by birth cohort but found little to no evidence for variation across cohorts (results available upon request). These results are summarized in Table S11.

Reproductive traits. Age at menarche for girls and age at voice breaking for boys mark the start of the reproductive career and adolescent development.62 Studies have shown that earlier menarche is related to greater sexual risk taking as an evolutionary reproductive strategy, particularly in harsh family environments.72 Whereas variation in age at menarche is often more related to living conditions and nutritional status, age at menopause appears to be mainly influenced by biological factors and primarily the reproductive history of individuals.73 Numerous demographic studies have shown that a later AFB is linked with a lower number of children due to voluntary desires for fewer children and unintended childlessness due to fecundity and infertility problems.6 To capture infertility and fertility problems, we also included miscarriage or stillbirth. A higher number of sexual partners has been phenotypically correlated with a cluster of adolescent risk behaviours such as unintended pregnancy and substance use.74 Finally, we included the age at the start of oral contraception, which is a unique marker that proxies biological development and risk aversion to pregnancy.75

We find strong correlations across virtually all of the reproductive traits. There is a positive correlation of later AFS and AFB with later age at menarche, menopause and voice breaking (around 0.12 to 0.27). A later age at menarche has previously been associated with subfecundity, diminished ovarian function and infertility.76 We also find a negative genetic correlation with number of children ever born (NEB), considerably stronger for males (AFB males, –0.87 ±0.09; AFB females, –0.67 ±0.02), and also with miscarriage or stillbirth (AFB females, –0.51 ±0.06; AFS females, –0.67 ±0.06). In other words, later AFB and AFS are genetically correlated with a lower NEB, and the genetic propensity for later AFS and AFB is negatively correlated with ever experiencing a miscarriage or stillbirth. There was also a striking positive correlation with the age at starting oral contraceptives (AFB females, 0.76 ±0.03; AFS females, 0.88 ±0.06), suggesting that later sexual and reproductive onset was also linked with a later age at first using contraceptives, which may serve as a marker for development. Number of sexual partners was negatively correlated with later AFS/AFB, more strongly for the related sexual behaviour trait of AFS (AFS males, –0.57 ±0.02; AFS females, –0.59 ±0.02) than for AFB (AFB males, –0.25 ±0.06; AFB females, –0.25 ±0.02).


Behavioural traits. Some of the strongest correlations are with behavioural traits, particularly educational attainment for women, with a stronger overlap with AFB (0.74 ±0.01) than with AFS (0.53 ±0.01), also noted in previous studies.2,9 The strong relationship between AFB and education is not surprising since, phenotypically, higher educational attainment is associated with later AFB in most advanced societies.6 Others previously found a relationship of initiation of sexual activity and fertility with higher cognitive ability,65 although additional research is required to separate out whether these cognitive scores are confounded, as seems very likely, by socio-economic status and environment. Others have found a relationship between higher cognitive ability and later age at first sexual intercourse.64 Here we also see a negative genetic correlation between adult risk tolerance and later AFS/AFB (AFB females, –0.25 ±0.03; AFB males, –0.29 ±0.07; AFS females, –0.40 ±0.03; AFS males, –0.40 ±0.02) or, in other words, those less genetically prone to risk are also less prone to early sex and teenage pregnancies. Conversely, if the variables were reverse coded, it would mean that those more genetically prone to risky behaviour in adulthood are also prone to earlier sexual debut and earlier births.

Psychiatric disorders. One of the strongest genetic associations in our analysis is with ADHD (AFB females, –0.63 ±0.03; AFB males, –0.68 ±0.09; AFS females, –0.58 ±0.03; AFS males, –0.61 ±0.03), and to some extent also Major Depressive Disorder (MDD) (AFB females, –0.42 ±0.03; AFB males, –0.33 ±0.08; AFS females, –0.37 ±0.03; AFS males, –0.32 ±0.03). ADHD has been phenotypically related to elevated risky sexual behaviour, often comorbid with problematic substance use such as alcohol use, smoking and cannabis.77 The mechanism is linked to higher levels of behavioural disinhibition, externalizing and hyperactive and impulsive behaviours. Interestingly, the internalizing psychiatric disorder of anorexia is positively genetically correlated with postponement of sex and births. This disorder is often related to exaggerated cognitive control, rigid behaviour and impaired ability to be flexible or impulsive.78 In other words, postponement of sex and childbirth appears to lie at the opposite end of the spectrum from the behavioural disinhibition and externalizing of ADHD, which is related to early sex and teenage pregnancy.

Substance use. Substance use traits also showed striking correlations, particularly age at onset of smoking (AFB females, 0.73 ±0.03; AFB males, 0.74 ±0.07; AFS females, 0.67 ±0.03; AFS males, 0.68 ±0.03), which provides a rare genetic marker offering a window into adolescence and the timing of early sexual behaviour and pregnancies. Related to this is a negative correlation with ever engaging in cannabis use (AFB females, –0.25 ±0.05; AFB males, –0.22 ±0.13; AFS females, –0.43 ±0.06; AFS males, –0.43 ±0.06). Earlier smoking may capture an underlying propensity for a variety of behavioural disinhibition and externalizing behaviours in adolescence. There is also an established link between smoking and a longer time to conception and decreased fertility.79 Smoking has been linked to problems with preimplantation, reduced size and quality of oocytes and decreased sperm motility in men.80,81 Another plausible mechanism is that an earlier age at smoking is linked to a lower socioeconomic status, which is in turn linked to multiple environmental risk factors and a higher co-morbidity of related diseases.82 Smoking often serves as a strong marker for structural and resource disadvantage.

Personality traits. For these traits, the most striking finding is the relationship between openness to experience and later AFS/AFB, particularly for men. We also see a positive correlation of late AFS and AFB with loneliness (AFB females, 0.40 ±0.03; AFB males, 0.24 ±0.07; AFS females, 0.37 ±0.03; AFS males, 0.31 ±0.03) and to some extent a negative correlation with neuroticism in females (AFB females, –0.21 ±0.05), which may be related to the ability to find a partner.83

Anthropometric traits. With the exception of BMI, we find no striking results in relation to anthropometric variables. BMI is related to pubertal development, but both very low and very high BMI have been found to delay the timing of childbearing and affect the number of children.84

We recognize that these are only correlations and that there are likely also pleiotropic variants with multiple biological effects, as well as other factors such as traits that are mediated by environmental influences. For this reason we explore additional analyses in an attempt to understand the underlying etiology and causal relationships between these traits.

Figure S11. Genetic correlations and SNP heritabilities between and among reproductive, behavioural, psychiatric, substance use, personality and anthropometric traits

Note: Calculated by LD score regression, with SNP heritabilities along the diagonal.

7. Uncovering shared genetic etiology with Genomic SEM

In an attempt to understand the etiology of the correlations described in the previous section, we used the R package GenomicSEM85 to fit multivariate genetic regression models. GenomicSEM uses structural equation modelling to decompose the genetic covariance matrix, calculated using multivariate LD score regression, of a set of traits. The user specifies a model, the parameters of which are estimated by minimizing the difference between the observed genetic covariance matrix and the covariance matrix derived from the model. Formally, structural equation models subsume many statistical methods and are quite flexible. One model that can be fit using GenomicSEM is the multivariate genetic regression model. In this model, some trait C is regressed on traits A and B, producing estimates of the genetic correlation of A with C, independent of B, and of B with C, independent of A. We note that this model is equivalent to a simple mediation model, with C as the dependent variable and either A or B as the mediator.

7.1 AFB and AFS regression educational attainment (EA) and trait X

We fit a series of such models in which AFB was regressed on EA and a trait X (Table S12A-L and Figure S12A). AFB was chosen as the dependent outcome because an individual’s first birth most often occurs after they have completed their education. We also fit an analogous series of models in which AFS was regressed on EA (Table S12A-L and Figure S12B), which showed similar patterns of conditional association.

In each case, the conditional association of EA and AFB remained substantial, suggesting that the genetic association of EA with AFB is largely independent of the genetic components of personality (as measured by openness, neuroticism, and risk tolerance), BMI, loneliness, MDD, cognitive performance, substance use, and sexual behaviour (as measured by number of sexual partners). Additionally, the conditional association of cognitive performance with AFB was close to zero, suggesting that cognitive performance does not influence AFB above and beyond its effect on EA. These observations support the conclusion that the genetic correlation between EA and AFB is mediated by environmental mechanisms: those who have high educational attainment have been exposed to an environment that encourages later childbirth.

We note, however, that the conditional association of EA and AFB was smallest in the model of age of initiation of smoking (AI). AI is partially genetically distinct from other aspects of cigarette smoking and captures, in part, risk tolerance in adolescence, since most regular smokers initiate in adolescence.86 AI, then, might capture an aspect of adolescent risky behaviour that our measure of risk tolerance (taken in middle age) does not, explaining its apparent mediation of the relationship between AFB, AFS, and EA.

In order to explore the genetic relationship between reproductive biology and our phenotypes of interest, we obtained results from a GWAS of sex hormone levels (Figure S13) and fit sex-specific genetic multivariable regression models for AFB (Tables S12E and F) and AFS (Tables E and F). We observe no significant moderation of the association of these variables with EA. In summary, a wide range of variables are unable to explain the association between AFB and AFS and EA.


We noted substantial genetic correlations between AFB and AFS (rg = 0.82, SE = 0.03) and between AFS and EA (rg = 0.61, SE = 0.02), paralleling the correlation between EA and AFB. We fit a genetic multivariable regression model in which EA was regressed on AFB and AFS and found a substantial conditional standardized association of EA and AFB (beta = 0.70, SE = 0.05) but a small conditional standardized association of EA and AFS (beta = 0.04, SE = 0.05), as expected since AFS occurs before AFB.
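To make the arithmetic of the conditional associations concrete, the following minimal sketch computes standardized partial regression coefficients for a two-predictor model directly from a correlation matrix. It is not the GenomicSEM estimator (which fits the model to the LD score regression covariance matrix and also provides standard errors); the function name is hypothetical, and the value used for rg(EA, AFB) is an assumed placeholder because that correlation is not stated in this section. With the two reported correlations and this assumed value, the sketch gives coefficients close to the reported betas of 0.70 and 0.04.

```python
def standardized_betas(r_ab, r_ay, r_by):
    """Standardized coefficients for regressing y on a and b.

    Solves the 2x2 normal equations R * beta = r, where R is the
    correlation matrix of the predictors (a, b) and r holds their
    correlations with the outcome y.
    """
    det = 1.0 - r_ab ** 2
    if det <= 0.0:
        raise ValueError("predictors are perfectly collinear")
    beta_a = (r_ay - r_ab * r_by) / det
    beta_b = (r_by - r_ab * r_ay) / det
    return beta_a, beta_b


RG_AFB_AFS = 0.82  # reported in the text
RG_EA_AFS = 0.61   # reported in the text
RG_EA_AFB = 0.73   # ASSUMPTION: illustrative value, not given in this section

# EA regressed on AFB (predictor a) and AFS (predictor b).
beta_afb, beta_afs = standardized_betas(RG_AFB_AFS, RG_EA_AFB, RG_EA_AFS)
print(f"conditional beta, AFB: {beta_afb:.2f}")
print(f"conditional beta, AFS: {beta_afs:.2f}")
```

Because AFB and AFS are so strongly correlated (0.82), almost all of the association of AFS with EA is absorbed by AFB once both are in the model, which is the pattern described above.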

Figure S12A. A path diagram showing the structure of the genetic multiple regression model fit to EA and AFB

Figure S12B. A path diagram showing the structure of the genetic multiple regression model fit to AFS and EA

Figure S13. A heat map showing the genetic correlations between and among the fertility GWAS phenotypes, the sex hormone phenotypes, and other phenotypes related to reproductive biology, as calculated by LD score regression.

Notes: BMI = Body mass index; VoiceBroke = Age voice broke; Menopause = Age at menopause; Menarche = Age at menarche; FreeT = Free testosterone; SHBG = Sex hormone-binding globulin; NEB = Number ever born

7.2 Reproductive biology and externalizing behaviour explanation of variance

From our analyses, it emerged that the timing of the onset of reproductive behaviour appears to be driven by both reproductive biology and externalizing behaviour.63 To estimate the amount of variance that each explained, we performed exploratory factor analysis (EFA) and an additional Genomic SEM analysis.

First, we used exploratory factor analysis (EFA), which is a means of studying the relationships between a set of variables. This method was used to examine whether the genetic signal of the onset of reproductive behaviour originated from two genetically distinguishable subclusters: a reproductive biology component and an externalizing behaviour component. As a simple test of this theory, we used proxies of these categories, namely age at menarche and risk tolerance, and fit a two-factor EFA model to the genetic covariance matrix of AFB, AFS, NEB, risk tolerance, and age at menarche. The model accounted for 47% of the overall variance but only 22% of the variance of risk tolerance and 4% of the variance of age at menarche, indicating that the genetic relationships between these variables are partially but not fully captured by a two-factor model.

To test this further, we then focussed on more robust and additional measures of reproductive biology and externalizing behaviour and performed a sex-specific analysis of AFB for women. In order to parse out the genetic influences on age at first birth, we fit a genomic structural equation model (Genomic SEM) in which AFB in women is regressed on age at menopause, age at menarche, and a latent factor representing the common genetic tendency to externalizing behaviour (Figure S14). The factor is measured by AFS in women, age at initiation of smoking, age first used oral contraception, and ADHD, with the model scaled to unit variance for the latent factor. The fitted model had a CFI equal to 0.95 and an SRMR equal to 0.09, suggesting a reasonable fit.

The standardized residual variance for AFB in the model is 0.12 (SE = 0.04), indicating that most of AFB’s SNP heritability can be accounted for by age at menopause, age at menarche, and our externalizing factor, with all three variables having a statistically significant independent effect, but the externalizing factor showing the strongest association by far. We can conclude that the genetic traits we include predict 88% (1 – 0.12) of the genetic variance for women’s age at first birth and that externalizing behaviour explains most of the common genetic variance of AFB in women, in the contexts measured in our study. This conclusion is subject to the standard caveats for LDSC genetic variance and covariance estimates, to the fact that we include common variants only, and to our use of information from selective European ancestry populations.87,88 We note also that selection bias, induced by the fact that AFB can only be measured among individuals with at least one live birth, may have inflated this estimate.

Figure S14. A path diagram for a Genomic SEM model of the relative associations of an externalizing latent factor, age at menopause, and age at menarche with age at first birth in women

Standardized parameter estimates are shown with standard errors in parentheses. AFS (F) = Age at first sexual intercourse in women; AFB (F) = Age at first birth in women; AgeSmokeInit = Age of smoking initiation; AgeOralContra = Age first used oral contraception; ADHD = Attention-deficit hyperactivity disorder; AgeMenarch = Age at menarche; AgeMeno = Age at menopause

8. Bi-directional MR of reproductive behaviour, teenage behavioural disinhibition and onset of later life disease

8.1 Background, methods and innovation

Previous studies have suggested a link between reproductive behaviour and health outcomes.89–92 The previous analyses showed a considerable overlap between the genetic loci identified for reproductive behaviours and educational attainment. Using GenomicSEM we aimed to capture the unmeasured factor and etiology underlying teenage behavioural disinhibition and externalizing behaviour, which we investigated using traits such as age at smoking initiation, personality traits and adult risk taking. It is plausible, however, that the causal pathways connecting these phenotypes are bidirectional and that each of our measured phenotypes offers a distinct contribution. We therefore tested both possibilities.

We identified 1000 Genomes proxies for our SNPs and used these in multivariate Mendelian Randomisation (MR) models. First, we modelled the interplay between AFB, AFS and EA (educational attainment)20 as well as risk taking (measured in adulthood)21 and age at smoking initiation (AI).22 In each case IVW23 and MR-Egger24 methods were performed, with an additional round of IVW performed once a Steiger filter25 had been applied to remove SNPs that appear to show a primary association with the outcome rather than the exposure. Multivariate MR was used to try to dissect causal pathways.26
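As a rough illustration of the two univariable estimators named above, the sketch below implements a fixed-effect inverse-variance weighted (IVW) estimate and an MR-Egger regression from per-SNP summary statistics. It is a simplified sketch, not the pipeline used in our analysis: the function names are hypothetical, the SNP effects are made-up toy values constructed with a true causal slope of 0.5 and a constant pleiotropic offset of 0.02, and standard errors, the Steiger filter and the multivariate models are omitted.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW: weighted regression through the origin."""
    num = sum(x * y / s ** 2
              for x, y, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(x * x / s ** 2 for x, s in zip(beta_exposure, se_outcome))
    return num / den


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """MR-Egger: weighted regression with an intercept.

    SNPs are first oriented so that the exposure effect is positive.
    Returns (intercept, slope); a non-zero intercept is the usual
    signal of directional pleiotropy.
    """
    xs, ys = [], []
    for x, y in zip(beta_exposure, beta_outcome):
        sign = -1.0 if x < 0 else 1.0
        xs.append(sign * x)
        ys.append(sign * y)
    w = [1.0 / s ** 2 for s in se_outcome]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y)
              for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy summary statistics (hypothetical values, for illustration only).
bx = [0.10, 0.20, 0.15, 0.30]
by = [0.02 + 0.5 * x for x in bx]   # slope 0.5 plus pleiotropic offset 0.02
se_y = [0.010, 0.020, 0.015, 0.010]

ivw = ivw_estimate(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
print(f"IVW estimate: {ivw:.3f}")
print(f"MR-Egger intercept: {egger_intercept:.3f}, slope: {egger_slope:.3f}")
```

In this toy example the IVW estimate is pulled above 0.5 by the constant offset, whereas MR-Egger absorbs the offset into its intercept and recovers the slope, which is why the two methods are commonly reported together.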

A second set of MR analyses focused on links to late life diseases, namely type 2 diabetes (T2D)27 and coronary artery disease (CAD)28, using the same methods. T2D and CAD were chosen since they are common diseases with a strong behavioural component. In particular, we use multivariate methods to test whether AFS or AFB had independent effects once the well-established links to length of educational attainment were controlled for. The model shows the effects of variants discovered for AFB and AFS and education on two key later life diseases, controlling for the alternative pathways represented by each phenotype. These analyses were also performed in a sex-specific manner using the available outcome data for men and women separately for type 2 diabetes.

8.2 Results MR

The bidirectional analysis suggested a complex causal web in which each of the assessed phenotypes appears to have an important impact on the others (Table S13A). The one exception was behavioural disinhibition during teenage years proxied by age at initiation of smoking (AI). This suggests that the specific timing of the onset of behavioural disinhibition may be crucial. Both AFS and age at smoking initiation are phenotypes that represent behavioural disinhibition and externalizing behaviour precisely in the window of adolescence and early adulthood. The links of both AFS and AFB with educational attainment strongly suggested a bidirectional association between earlier reproductive behaviour and fewer years in education. This may help to answer a persistent question in the demographic literature about the causality between reproductive behaviour and educational attainment.22,96

The associations with diseases in later life confirmed that there was a strong association of years in education with onset of disease later in life (Figure S15, Table S13B). However, in both cases the association with education was substantially attenuated by the inclusion of the betas for the SNP effects on AFB. These results hold despite the relatively weaker estimates that use the SNPs discovered specifically for AFB (potentially due to comparatively small numbers of SNPs in the analysis). The attenuation was also seen in models adjusting for BMI, suggesting that there is a specific effect of reproductive timing that is a risk factor for type 2 diabetes in women, but not in men. This finding holds considerable importance since the majority of research has assumed that it is years of education (or similar socioeconomic proxy measures) that are the causal factors driving many diseases in later life. Our analyses show that once we control for both BMI and reproductive timing, the effect of education is considerably weaker, at least on type 2 diabetes.

Figure S15. Coefficients (and CIs) of bi-directional MR of human reproductive behaviour (AFB, AFS), age initiated smoking and educational attainment on Type 2 diabetes and Coronary Artery Disease later in life

9. Later age at first birth linked to longevity

9.1 Background and innovation

Since AFB appears to be predictive of later life onset of disease, we extended the analysis to discover whether reproductive timing was related to longevity. Here we specifically test trade-offs between reproductive behaviour and senescence, which have been argued for in the pace of aging literature.97 The disposable soma theory of evolution hypothesizes that longevity demands investments in somatic maintenance that in turn reduce the resources that are available for reproduction.98 Using historical data from the British aristocracy, previous research has shown that AFB was the highest for women who died at the oldest ages and the lowest in women who died early.99 Using a genealogical database from Utah (1860-1899), researchers applied Cox proportional hazard models to demonstrate that women who had children later and had fewer children enjoyed longer lives.100 Using contemporary data, another study in the U.S. likewise showed that the odds of longevity and survival to 90 years were significantly higher in women who had a later age at first childbirth.101

Using the PGS from our 2016 study of AFS, Mostafavi et al. also previously examined this question.102 We improve on that previous analysis in several distinct ways. First, we use the entire UK Biobank sample, which is considerably larger. Second, the PGSs are calculated using a k-fold cross-validation procedure. Third, we account for right censoring (i.e., those alive at the time of observation) by estimating survival models.34 Fourth, we control for other related PGSs such as educational attainment and risky behaviour. Fifth, we stratify by the local authority district at birth, which has been shown to be important for life expectancy in the UK. Finally, we also control for parental fertility by adding controls for the number of siblings.

9.2 Data and measurement

In the UK Biobank, each individual was asked to provide the age of their mother and father at each visit and also the reported age at death of each parent if applicable. To conduct the survival analysis, we used the most recent assessment visit or average ages reported at recruitment and any repeated assessment visits. For parents who were still alive at the date of the last assessment (i.e., right censored),34 we included that date as the last observation. We also removed adopted individuals, respondents with non-European ancestry or those who self-identify as non-White, and those who had missing values for some of the covariates (N=48,980), resulting in 474,946 European ancestry individuals with age at death information for their mother (446,419) and their father (438,125).

9.3 Methods of analysis

We calculated PGSs for AFB, educational attainment (EA)47 and risky behaviour44 from UK Biobank adopting the following procedure. We first split the sample into 10 random groups. We then iteratively estimated genome-wide association results for 9/10th of the sample and used these association results as weights for the calculation of polygenic scores in the remaining 1/10th of the sample. Polygenic scores are calculated using PRSice on a set of independent genotyped SNPs. We then estimated three sets of Cox proportional hazards models to estimate the effect of the PGS of AFB on maternal and paternal age at death. All models control for the first 10 genetic principal components, sex and year of birth of the respondents and are stratified by Local Authority District at birth, calculated using the geo-coordinates provided in the UK Biobank. This is due to the fact that there is considerable geographical variation in life expectancy at birth in England and Wales, which is largely attributed to differences in material deprivation.103 Models (1) and (4) in Table S14 are the baseline models. Models (2) and (5) include as covariates the polygenic scores of educational attainment and risky behaviour (high values of the PGS correspond to higher risk aversion). Models (3) and (6) include number of siblings (as proxy for parental fertility) as covariates. These models restrict the analysis to mortality after age 60 to limit the possibility that early mortality affects parental fertility (collider bias).104
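The out-of-sample scoring scheme described above can be sketched as follows. This is a schematic illustration only: the function names, the SNP identifiers and the `toy_gwas` stand-in are hypothetical, the real association analysis and the PRSice scoring step are not reproduced, and the score is assumed to be a simple weighted sum of allele dosages.

```python
import random


def assign_folds(ids, k=10, seed=0):
    """Randomly assign each individual to one of k roughly equal folds."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return {pid: i % k for i, pid in enumerate(shuffled)}


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages over SNPs that have a weight."""
    return sum(weights[snp] * d for snp, d in dosages.items() if snp in weights)


def cross_validated_pgs(genotypes, estimate_weights, k=10, seed=0):
    """Score each fold with weights estimated on the other k-1 folds."""
    folds = assign_folds(genotypes, k, seed)
    scores = {}
    for held_out in range(k):
        train_ids = [pid for pid, f in folds.items() if f != held_out]
        weights = estimate_weights(train_ids)  # association results on 9/10
        for pid, f in folds.items():
            if f == held_out:
                scores[pid] = polygenic_score(genotypes[pid], weights)
    return scores


# Toy data: 20 hypothetical individuals with dosages at two made-up SNPs.
genotypes = {f"id{i}": {"rs_a": i % 3, "rs_b": (i + 1) % 3} for i in range(20)}
training_sets = []


def toy_gwas(train_ids):
    """Stand-in for a GWAS: records the training set, returns fixed weights."""
    training_sets.append(set(train_ids))
    return {"rs_a": 0.5, "rs_b": -0.2}


scores = cross_validated_pgs(genotypes, toy_gwas, k=10)
print(len(scores), "individuals scored out of sample")
```

The key property is that no individual contributes to the weights used for their own score, which avoids the overfitting that would arise from scoring the discovery sample directly.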


9.4 Results: Later reproductive timing predicts longevity

Results indicate that a 1 standard deviation increase in the PGS for AFB is associated with a 2-4% reduction in mortality at any age, which holds across all model specifications. As respondents’ PGSs are only a proxy of parental genetic predisposition, these estimates are likely affected by random measurement error, leading to attenuation bias. Results are consistent with the work of Mostafavi et al.,102 who used a PGS derived from our previous 2016 study, but with limitations which our study improves upon, as discussed previously. Overall, this analysis shows a common genetic basis of late fertility and longer lifespan and confirms that human reproductive timing and life histories involve a trade-off between longevity and reproduction.

Table S14. Polygenic score (PGS) prediction of age at first birth (AFB), educational attainment (EA) and risk on parental longevity

                       Maternal Age at Death                 Paternal Age at Death
                       Model (1)   Model (2)   Model (3)   Model (4)   Model (5)   Model (6)
PGS AFB                0.976***    0.966***    0.961***    0.979***    0.971***    0.969***
                       (0.00203)   (0.00230)   (0.00409)   (0.00182)   (0.00207)   (0.00380)
PGS EA                 -           0.969***    0.968***    -           0.973***    0.974***
                                   (0.00209)   (0.00371)               (0.00188)   (0.00347)
PGS Risk               -           0.995**     0.985***    -           0.997       0.992**
                                   (0.00230)   (0.00406)               (0.00206)   (0.00378)
Number of Siblings     -           -           1.053***    -           -           1.051***
                                               (0.00188)                           (0.00178)
Observations           398,448     398,448     132,553     391,108     391,108     120,725
Survival over age 60   NO          NO          YES         NO          NO          YES

Note: PGS = Polygenic Score; AFB = age at first birth; EA=Educational attainment PGS.47 Relative Risk Ratios, exponentiated SEs in parentheses. All models control for first 10 Principal Components, Respondents’ Year of Birth, and Sex. All models are stratified by Local Authority District at Birth. *** p<0.01, ** p<0.05, * p<0.1
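As a reading aid for Table S14, the short sketch below converts the PGS AFB coefficients into percentage changes in the hazard of death per 1 standard deviation of the PGS. It assumes that the exponentiated coefficients can be read as hazard ratios from the Cox models; the function name is hypothetical.

```python
# Exponentiated PGS AFB coefficients as reported in Table S14.
HAZARD_RATIOS_PGS_AFB = {
    "Model (1)": 0.976,
    "Model (2)": 0.966,
    "Model (3)": 0.961,
    "Model (4)": 0.979,
    "Model (5)": 0.971,
    "Model (6)": 0.969,
}


def percent_change_in_hazard(hazard_ratio):
    """Percentage change in hazard implied by a ratio (negative = reduction)."""
    return (hazard_ratio - 1.0) * 100.0


percent_changes = {
    model: round(percent_change_in_hazard(hr), 1)
    for model, hr in HAZARD_RATIOS_PGS_AFB.items()
}
for model, change in percent_changes.items():
    print(f"{model}: {change:+.1f}% per 1 SD of PGS AFB")
```

All six values fall between a 2% and a 4% reduction, matching the range quoted in Section 9.4.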

10. Gene prioritization

10.1 Methods

We used multiple approaches to prioritize the most likely causal gene(s) at loci identified as being associated with AFS and/or AFB. First, DEPICT was used to perform pathway analyses, identify enrichment for cell types and tissues, and prioritize candidate genes.105 DEPICT is agnostic to the outcomes analysed in the GWAS and employs predicted gene functions. For both AFS and AFB, all SNPs with P<1x10^-5 in the pooled analysis were used as input. For both outcomes, DEPICT’s tissue enrichment analysis showed significant enrichment for tissues in the nervous system (Supp Tables S15A (AFS), S15B (AFB)). DEPICT’s integrated gene prioritization approach yielded 94 genes for AFS and 14 genes for AFB at FDR<0.05 (Supp Tables S15C (AFS), S15D (AFB)). Based on the results of the tissue enrichment analysis, we next used DEPICT to identify nervous system cell types that are enriched for expression of genes in loci reaching P<1x10^-5 in the GWAS, using RNAseq data from mouse brain.106 This yielded 41 enriched cell types for AFS, and 14 for AFB (Supp Tables S15E (AFS), S15F (AFB)). A similar approach using Tabula Muris RNAseq data107 helped prioritize another nine central nervous system and pancreatic cell types for AFS (Supp Table S15F). For enriched cell types from mouse brain and Tabula Muris, the top-10 contributing genes were selected as candidate genes. This resulted in the prioritization of 296 genes for AFS and 95 for AFB based on mouse brain; and 97 genes for AFS based on Tabula Muris data.

Secondly, we used Phenolyzer (v1.1) to prioritize candidate genes by integrating prior knowledge and phenotype information.108 Here we used the regions defined by DEPICT v1.1 (see above), reflecting loci reaching P<1x10^-5 in the first instance. Phenolyzer takes free text input and interprets it as disease names by using a word cloud to identify synonyms. It then queries precompiled databases for the disease names to find and score relevant seed genes. The seed genes are subsequently expanded to include related (predicted) genes based on several types of relationships, e.g., protein-protein interactions, transcriptional regulation and biological pathways. Phenolyzer uses machine learning techniques on seed genes and predicted gene rankings to produce an integrated score for each gene. We used search terms capturing three broad areas, i.e. (in)fertility, congenital neurological disorders and psychological traits, based on results from pathway, tissue and cell type enrichment analyses (Supp Tables S16A-B). Phenolyzer identified 107 and 47 candidate genes with a score >0.3 for AFS and AFB, respectively. Results for the top candidate genes identified by Phenolyzer can be found in Box 1 below.

Box 1 – Literature/text mining using Phenolyzer v1.1

We used Phenolyzer (v1.1)108 to identify genes that may be involved in age at first sex and age at first birth using search terms related to psychological traits, infertility and neurological disorders (Supp Table S16A). A total of 107 and 47 candidate genes were identified for AFS and AFB, respectively. Reassuringly, some well-known genes related to fertility were prioritized.

The top six genes prioritized for AFS were: FGFR1; ESR1; GATA4; LEPR; CYP17A1; and CGA.

FGFR1 (Fibroblast growth factor receptor 1) has been linked with depression. Mutations in FGFR1 have been associated with Kallmann syndrome, a heterogeneous disorder that combines variable gonadotropin-releasing hormone (GnRH) deficiency with anosmia and, sometimes, other non-reproductive clinical features109 and that is characterised by decreased testosterone, azoospermia and infertility (ORPHANET:478). The gene has also been associated with lower fertility (HP:0000144); non-obstructive azoospermia (HP:0011961); and lower serum testosterone (HP:0040171).

ESR1 (Estrogen receptor 1) is related to multiple fertility traits, including endometriosis,110 age at menarche,111 male infertility112 and age at first birth in our previous study.63 ESR1 has been associated with alcoholism,113 psychosis, neuroticism114 and substance-induced schizophrenia (umls:C0033941).

GATA4 (GATA Binding Protein 4) was identified through human phenotype ontology (HPO) term and gene associations. The gene has been associated with testicular abnormalities (OMIM:615542). Through HPO to gene associations it has been associated with abnormal spermatogenesis, decreased serum testosterone and abnormal circulating follicle-stimulating hormone (FSH) levels.

LEPR (Leptin receptor) has been associated with infertility, delayed puberty, decreased serum testosterone levels and abnormal serum estradiol through HPO phenotype to genotype associations and DisGeNET (disgenet.org). The gene has also been associated with impairment in personality functioning (HPO phenotype gene association).

CYP17A1 (Cytochrome P450 Family 17 Subfamily A Member 1) is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens and estrogens.115,116 Mutations in this gene have been associated with endometriosis,117 recurrent pregnancy loss,118 age at menarche,119 and serum estrogen and progesterone levels.116

CGA (Glycoprotein Hormones, Alpha Polypeptide) has been associated with ectopic pregnancy and is known to interact with FSHB (Follicle Stimulating Hormone Subunit Beta),120 LHB (Luteinizing Hormone Beta Polypeptide),121 and LHCGR (Luteinizing Hormone/Choriogonadotropin Receptor).122

The top 6 genes prioritized for AFB were: FSHB; ESR1; GNAI2; RHOA; HDAC3; and CDC42.

FSHB (Follicle Stimulating Hormone Subunit Beta) has been associated with FSH deficiency (OMIM:229070), polycystic ovarian syndrome123 and through HPO phenotype to gene and DisGeNET with female and male infertility and oligospermia.

GNAI2 (Guanine Nucleotide Binding Protein (G Protein), Alpha Inhibiting Activity Polypeptide 2) was prioritized for psychological disorders and interacts with other genes associated with cannabis dependence (Cannabinoid Receptor 1),124 but also with genes related to fertility, like LHB.125 The gene has also been associated with schizophrenia through DisGeNET.

The RHOA (Ras Homolog Family Member A) locus has been associated with general cognitive ability.126

HDAC3 (Histone Deacetylase 3) has been shown to interact with genes related to psychological disorders like JUN,127,128 and genes related to male fertility (ARID4A).129,130

Thirdly, we used in silico sequencing to identify non-synonymous variants in LD (R2>0.7) with the lead SNPs in AFS and AFB-associated loci.131 This yielded 24 genes for AFS and 16 for AFB that may drive the GWAS associations through direct effects on protein function (Supp Table S17A-B).

Fourthly, we used Summary data-based Mendelian Randomization (SMR) and heterogeneity in dependent instruments (HEIDI)132 using eQTL data from brain133 and whole blood.134 This approach provided 39 and 73 genes that showed evidence (P_SMR<5x10^-6) of mediating the association between AFS and GWAS identified loci based on results from brain and blood, respectively, compared with 15 and 29 genes for AFB (Supp Table S18A (AFS), S18B (AFB)).


Finally, we integrated findings across all approaches and retained genes in loci that reached genome-wide significance, and that were located within 1 Mb of a GWAS lead SNP. This resulted in the prioritization of 314 genes in 153 loci for AFS, and 106 genes in 37 loci for AFB (Supp Tables S19A (AFS), S19B (AFB)), or 386 unique genes across the two traits (Supp Table S19C). We next used data from the Human Protein Atlas135 to identify genes amongst these 386 genes that are expressed at a low, medium or high protein level in brain, glands, and/or reproductive organs at a ‘supported’ or ‘enhanced’ degree of reliability. For the 99 genes that fulfilled these criteria, we mapped the brain, glandular and reproductive cell types in which they are highly expressed at the protein level;136 used a text-mining approach to extract functions from entries in Entrez, GeneCards and Uniprot; and identified phenotypes in mutant mice from the Mouse Genome Informatics (MGI) database137 (Figure 4, Main Text).

10.2 Results

Gene prioritization using multiple approaches (see Methods) highlighted genes acting in three broad tissue types: brain, glands and reproductive organs. Some of these have known effects on traits related to cognitive ability, addiction, psychiatric traits and fertility (see below). These results partly mirror and complement the rigorous post-GWAS in silico association analyses we performed for loci identified for age at first sex and age at first birth.

10.2.1 Candidate genes in brain

Twenty-four of the 99 prioritized genes are both highly expressed in central nervous system cell types at the protein level,136 and yield a nervous system or neurological phenotype in mutant mice.138 Within these 24 genes, STRING databases139 highlighted experimentally determined protein-protein interactions of HDAC3 with GTF2I, TOP2B, E2F1 and MEF2C (Supp Fig S16). All five genes are highly expressed at the protein level in neuronal cells of the cerebral cortex, and all but MEF2C are highly expressed in Purkinje cells and the molecular layer of the cerebellum. Histone deacetylase 3 (HDAC3) is essential for Purkinje cell function,140 embryonic brain development and neuro-differentiation,141 long-term memory formation,142 and blood brain barrier permeability;143 loss of general transcription factor IIi (GTF2I) induces increased sociability and anxiety in mice;144,145 DNA topoisomerase II beta (TOP2B) is required for transcription of a set of long neuronal genes in cerebellar granule neurons,146 as well as for development and survival of post-mitotic neurons in the retina;147 E2F transcription factor 1 (E2F1) contributes to Purkinje cell degeneration,148 modulates neuronal apoptosis,149,150 and functions as a cell cycle suppressor in mature neurons;151 and myocyte enhancer factor 2C (MEF2C) plays a key role in cortical network activity by regulating inhibitory vs. excitatory synaptic transmission.152

NCAM1 and NFASC also interact at the protein level (Supp Fig S16). Both are highly expressed at the protein level in neuropils of the cerebral cortex, and both play a role in development of the nervous system.153 Neural cell adhesion molecule 1 (Ncam1) null mice are prone to risk seeking behavior,154 and plasma NCAM1 levels were negatively associated with social motivation, communication, and responsiveness in children with autism spectrum disorders.155 Neurofascin (NFASC) plays a key role in formation of the nodes of Ranvier and function of myelinated axons;156 mutations in this gene cause severe neurodevelopmental disorders.157


10.2.2 Candidate genes in glands

Twelve of the 99 prioritized genes are highly expressed at the protein level in glands,136 and additionally yield an endocrine or exocrine phenotype in mutant mice.138 Of these, experimentally determined protein-protein interactions were observed for SUMO1 with RECQL4 and PML (Supp Fig S16), which are highly expressed at the protein level in glandular cells of the adrenal (SUMO1 and RECQL4), parathyroid (SUMO1) and/or thyroid glands (RECQL4 and PML). Small ubiquitin-related modifier 1 (SUMO1) binds target proteins as part of a post-translational modification system - i.e. sumoylation - and plays a role in nuclear transport, transcriptional regulation, apoptosis and protein stability. RecQ like helicase 4 (RECQL4) plays an essential role in DNA replication during development and is required for viability and fertility,158 while promyelocytic leukemia (PML) is recruited to phosphorylated testis receptor 2 and sumoylated, which in turn suppresses cell proliferation.159

A protein-protein interaction was also observed between CGA and FSHB (Supp Fig S16), which are both highly expressed in anterior pituitary gland cells. CGA encodes the alpha subunit of follicle stimulating hormone (FSH) – as well as three other glycoprotein hormones – and plays a role in ectopic pregnancies through interaction with Forkhead Box L2 (FOXL2).160 FSHB encodes the beta subunit of FSH, which stimulates the growth of ovarian follicles in women, and acts on the Sertoli cells of the testis to stimulate sperm production in men. Mutations in FSHB have been reported in infertile men,161,162 and in women with isolated FSH deficiency and hypogonadism.163

10.2.3 Candidate genes in female reproductive organs

Nine of the 99 prioritized genes are highly expressed at the protein level in female reproductive organs136 and additionally show a reproductive phenotype in mutant mice.138 Of these nine genes, ESR1 interacts at the protein level with SUMO1, ARNT, CAV1 and E2F1 (Supp Fig S16). Small ubiquitin-like modifier 1 (SUMO1) sumoylates estrogen receptor (ER)alpha (ESR1) in the presence of estrogen, which is a requirement for normal ERalpha-induced transcription.164 SUMO-1 and ERalpha are both highly expressed at the protein level in glandular cells of the fallopian tube and endometrium. SUMO-1 also co-localizes with Forkhead Box L2 (FOXL2) in stromal and glandular cells of the endometrium. Forkhead Box L2 has been implicated in the pathogenesis of endometriosis165 and ectopic pregnancies.166

Knockdown of Aryl hydrocarbon receptor nuclear translocator (ARNT) has been shown to suppress key angiogenic genes - including VEGFA, possibly through FSH167 - leading to deficient angiogenesis in placental vasculature; malformed thin villous vessels; elevated fetoplacental vascular resistance; and high morbidity and mortality in fetal growth restriction.168

Caveolin 1 (CAV1) promotes human trophoblast cell proliferation, migration and invasion by activating the focal adhesion kinase signaling pathway. Appropriate differentiation and invasion of trophoblast cells is required for normal implantation and placental development, and caveolin 1 gene expression was lower in placenta of unexplained spontaneous abortions than in placenta from induced abortions.169

Figure S16. Protein-protein interactions identified using STRING for genes that are highly expressed at the protein level in: A) brain and result in a nervous system or neurological behavior phenotype in mutant mice; B) glands and result in an endocrine/exocrine phenotype in mutant mice; C-D) female (C) or male (D) reproductive organs and result in a reproductive phenotype in mutant mice. Pink lines highlight experimentally determined interactions.

In A, HDAC3, TOP2B, GTF2I and E2F1 are all highly expressed at the protein level in Purkinje and molecular layer cells in the cerebellum and neuronal cells in the cerebral cortex; HDAC3 and MEF2C are both highly expressed in neuronal cells of the cerebral cortex; and NCAM1 and NFASC are both highly expressed in neuropils of the cerebral cortex. In B, SUMO1 and RECQL4 are both highly expressed in glandular cells of the adrenal gland, SUMO1 is highly expressed in glandular cells of the parathyroid gland; and RECQL4 and PML are both highly expressed in glandular cells of the thyroid gland. CGA and FSHB are both highly expressed in cells in the anterior pituitary gland. In C, SUMO1 and ESR1 are both highly expressed in glandular cells of the endometrium and fallopian tube; ARNT and ESR1 both are highly expressed in endometrial stroma cells and glandular cells of the fallopian tube; E2F1 and ESR1 are both highly expressed in squamous epithelial cells of the vagina; and CAV1 and ESR1 are both highly expressed in endometrial stroma cells. In D, GTF2I and HDAC3 are both highly expressed in epididymis glandular cells, and POR and CYP17A1 are both highly expressed in Leydig cells of the testis.

E2F transcription factor 1 (E2F1) is one of 11 genes in the PI3K/AKT pathway with a lower expression in cumulus cells from oocytes that went on to produce a pregnancy vs. those that did not. This differential expression was concluded to likely be driven by a downregulation of ERalpha.170 In separate efforts, E2F1 was identified as one of six transcription factors that likely drive pregnancy-induced pancreatic islet expansion;171 and was shown to activate ribonucleotide reductase 2, an important effector of progesterone signalling, to induce cell proliferation and decidualization in mouse uterus.172

10.2.4 Candidate genes in male reproductive organs

Of the 11 genes that are highly expressed at the protein level in male reproductive tissues136 and additionally show a reproductive phenotype in mutant mice,138 a protein-protein interaction was observed of HDAC3 with GTF2I and E2F1 (Supp Fig S16). While the mechanisms by which HDAC3 and GTF2I influence reproductive behavior through male reproductive tissues remain to be established, loss of E2F transcription factor 1 (E2F1) has been shown to induce severe and progressive testicular atrophy and less spermatogonia apoptosis during the first wave of spermatogenesis in young mice; in adult mice, loss of spermatogonial stem cells led to loss of spermatocytes and further exacerbated testicular atrophy.173

A second protein-protein interaction was observed for CYP17A1 and POR (Supp Fig S16), which are both highly expressed at the protein level in the Leydig cells of the testis. Cytochrome P450 family 17 subfamily A member 1 (CYP17A1) is a key enzyme in the steroidogenic pathway that produces progestins, mineralocorticoids, glucocorticoids, androgens and estrogens,174 while cytochrome P450 oxidoreductase (POR) is required for normal steroidogenesis.175

Seven of the 97 prioritized genes were highly expressed at the protein level in spermatogonia, preleptotene spermatocytes, pachytene spermatocytes, round or early spermatids, and/or elongated or late spermatids (Figure 4, Main Text).136 Of these, Krueppel-like factor 17 (KLF17) encodes a germ cell-specific transcription factor that in mice plays important roles in spermatid differentiation and oocyte development;176,177 and zona pellucida binding protein (ZPBP) participates in sperm morphogenesis and binding between acrosome-reacted sperm and the egg-specific extracellular matrix (the zona pellucida).178,179 Furthermore, epigenetic, allele-specific activation of the testis-specific gene nucleoporin 210 like (NUP210L), possibly by changing binding affinity for testis receptor 2, was recently observed in prefrontal cortex neurons of G allele carriers (but not CC carriers) of rs114697636 (MAF 3%).180 The variant rs114697636 is in linkage disequilibrium with the locus’ lead SNP for AFS (rs113142203, D’=0.90). In addition, the DNA methylation state of NUP210L has recently been linked with psychological development disorders,181 and common variants near NUP210L have been identified in GWAS for intelligence and mathematical ability,47,182 providing an elegant example of how a testis-specific gene that is highly expressed at the protein level in developing and mature sperm can influence the brain in some individuals. The roles of ELAVL2, LRRC37A2, C1orf56 and C20orf144 in reproductive behavior through male reproductive organs remain to be established.

11. Sex-specific genetic effects

Sex-specific genetic effects have been proposed and found previously for reproductive behaviour.2,183,184 Sex-specific effects in these behavioural phenotypes are likely driven by biological, behavioural, social normative and cultural factors. First, there are sex differences in biological makeup, processes and diseases related to sexual and fertility behaviour.185,186 A later age at first birth has been related to infertility and reproduction-related traits in women, such as ovulatory problems, tubal damage, endometriosis, cervical cancer and polycystic ovarian syndrome.184 Fecundability is also influenced by sex-specific hormonal processes,187 as well as by a behavioural component, e.g. through educational attainment, personality, risk, or impulsivity.188 In turn, these traits have differential effects on male and female reproduction.189,190

In addition to the pooled GWAS, we also ran sex-specific GWAS meta-analyses for both phenotypes. In doing so, we detected two genome-wide significant (p-value < 5×10⁻⁸) loci for AFS in women, eight for AFS in men and one for AFB in women. Gene prioritization in sex-specific loci resulted in the prioritization of 11 genes for AFB in women, one gene for AFS in women and 23 genes for AFS in men. Of these, 12 genes at three loci were expressed at the protein level in relevant tissues (Figure S18).

11.1 Genetic overlap among sexes: LD score bivariate regression

We used LD score bivariate regression191 to estimate the genetic correlation between men and women based on the sex-specific summary statistics from the meta-analysis results. Figure S17 shows the genetic correlations across the traits by sex. The high correlations indicate a large genetic overlap between the sexes, particularly for AFB (rg=0.95). The overlap between men and women is also high for AFS, but lower, at rg=0.79.
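For readers unfamiliar with the quantity reported here, the genetic correlation is the genetic covariance between two traits scaled by the square root of the product of their SNP heritabilities. The sketch below shows only that final scaling step, treating the trait in men and the trait in women as the two traits. The function name and the input values are hypothetical and are not estimates from this study; estimating the covariance and heritabilities themselves requires the LD score regression software.

```python
import math


def genetic_correlation(gen_cov, h2_men, h2_women):
    """rg = genetic covariance / sqrt(h2_men * h2_women).

    All three inputs are assumed to come from LD score regression output;
    this helper is illustrative only.
    """
    if h2_men <= 0 or h2_women <= 0:
        raise ValueError("heritability estimates must be positive")
    return gen_cov / math.sqrt(h2_men * h2_women)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    print(round(genetic_correlation(0.08, 0.10, 0.10), 2))
```

A value close to 1 means the same variants act in the same direction in both sexes; values below 1 leave room for sex-specific effects of the kind examined in section 11.2.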

Figure S17. Genetic overlap amongst the sexes for AFS and AFB, LD score bivariate regression

Figure S18. Gene prioritization of AFS and AFB by sex

11.2 Sex specific loci

11.2.1 Methods and identification of 10 additional associations for AFS and 1 for AFB

Considering the lower genetic correlation between the sexes, particularly for AFS, we opted to examine sex-specific loci in more detail. A total of 242 unique variants were associated with either AFS or AFB in the sex-stratified analyses. To determine whether there was evidence for sex-specific effects, we compared the allelic effects for these SNPs between men and women and derived a p-value for heterogeneity.192 Based on a multiple testing correction for the number of variants (0.05/242 ≈ 2×10⁻⁴), we identified for: (1) AFS, 2 sex-specific associations in women and 8 in men; and, (2) AFB, 1 sex-specific association in women only. We selected a region of 2Mb around these lead SNPs to identify the genes that may be represented by these lead SNPs. We then conducted gene prioritization as we did for the main AFB and AFS analyses.
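The comparison of allelic effects between men and women can be sketched as follows. This is a minimal illustration that assumes a two-sided z-test on the difference between two effect estimates from independent samples; it is not necessarily the exact heterogeneity statistic of the cited method,192 and the function names and example effect sizes are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def sex_heterogeneity_test(beta_men, se_men, beta_women, se_women):
    """Two-sided z-test for a difference in allelic effect between the sexes.

    Assumes the male and female estimates are independent (an assumption
    of this sketch, not a statement about the study's method).
    """
    z = (beta_men - beta_women) / math.sqrt(se_men ** 2 + se_women ** 2)
    # Two-sided p-value under the standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 242)  # 242 variants, as in the text
    # Hypothetical effect estimates for a single variant.
    z, p = sex_heterogeneity_test(0.10, 0.02, 0.00, 0.02)
    print(f"threshold={threshold:.2e}, z={z:.2f}, p={p:.2e}, "
          f"sex-specific={p < threshold}")
```

A variant is flagged as sex-specific in this sketch only if its heterogeneity p-value falls below the corrected threshold, mirroring the decision rule described above.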

11.2.2 Gene Prioritization Results AFS

AFS: Women. There were two genes prioritized for AFS in women: FANCL and ERBB4. FANCL is a member of the Fanconi anemia complementation group. Variants in this locus have also been associated with smoking, autism and schizophrenia.44,193 Orphanet (84) and DISGENET (umls:C0015625) link the gene to Fanconi anemia and delayed puberty. ERBB4 has been linked to abnormal fear/anxiety-related behavior (HP:0100852 (HPO_PHENOTYPE_GENE)), schizophrenia194 and PCOS.195

AFS: Men. We prioritised genes related to mental development and psychological disorders. Deletions in NRXN1 have been associated with schizophrenia and autism;196 SLC44A1 is a choline transporter, and choline is key in cerebral inhibition, abnormalities of which have been implicated in a number of neuropsychiatric illnesses such as substance abuse and depressive disorders;197 NR1H3 (LXRalpha) may have links to major depressive disorder;198 and H2AFX has been linked to the neurodegenerative disorder ataxia-telangiectasia and to Nijmegen Breakage Syndrome (NBS), with microcephaly, short stature199 and a higher chance of cancer.

Other genes prioritised for AFS in men were related to haematological factors: F2 and HMBS. F2 has been related to miscarriage and heavy menstrual bleeding (OMIM:614390),200 neither of which applies to men. SLC24A5 was also prioritised; it is a major locus for skin pigmentation and is known to be under selection.201

11.2.3 Gene Prioritization Results AFB

AFB: Women. Recall that one sex-specific locus was found for AFB in women. In the region of the lead variant, many of the prioritised genes are related to immunity and, specifically, chemokine receptors. CCR1 protein levels and expression are increased in women with endometriosis.202 CCR5 is decreased in swim-up sperm of infertile men, which may be associated with male infertility.203 CXCR6 is the receptor for CXCL16. Disruption of this pathway in the endometrium has been observed in women who experience spontaneous abortion.204

12 Contributions and Acknowledgments

12.1 Author and Cohort Contributions

MCM and FRD designed and led the study. MCM wrote the paper and supplementary note, with contributions by authors for their respective analyses and comments on the draft by all main authors. DMB conducted the analyses of phenotypic changes, phenotype preparation, LD Score and genetic correlations, Genomic SEM and exploratory factor analysis, and sex-specific effects. NB conducted GWAS meta-analysis, MTAG, PGS prediction, survival models, and Cox models of longevity. FCT and FRD conducted the cohort QC. FCT conducted GREML cohort heritability analysis and phenotype preparation in UKBB. FRD ran Mendelian Randomization and conducted GWAS analyses, and JRBP conducted COJO and X-chromosome analysis. NvZ conducted DEPICT and Phenolyzer analyses. AV and HS conducted in silico sequencing and SMR analyses. TP conducted cell type enrichment analyses. MdH integrated gene prioritization results and performed downstream analyses, e.g. Human Protein Atlas; Entrez, GeneCards and Uniprot mining; and STRING protein-protein interaction analyses. Authors in the Human Reproductive Behaviour Consortium provided data and cohort analyses. The eQTLGen Consortium provided data for additional analyses. All authors critically reviewed and approved the final version of the paper.

Author list

Melinda C. Mills1,2,†,*, Felix C. Tropf1,2,3,4,†, David M. Brazel1,2,†, Natalie van Zuydam5, Ahmad Vaez6,7, eQTLGen Consortium, BIOS Consortium, Tune H. Pers8,9, Harold Snieder6, John R.B. Perry10, Ken K. Ong10,†, Marcel den Hoed5,†, Nicola Barban11,†, and Felix R. Day10,†,* on behalf of the Human Reproductive Behaviour Consortium

1 Leverhulme Centre for Demographic Science, University of Oxford, Oxford, United Kingdom 2 Nuffield College, University of Oxford, Oxford, United Kingdom 3 École Nationale de la Statistique et de L’administration Économique (ENSAE), Paris, France 4 Center for Research in Economics and Statistics (CREST), Paris, France 5 The Beijer Laboratory and Department of Immunology, Genetics and Pathology, Uppsala University and SciLifeLab, Uppsala, Sweden 6 Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands 7 Department of Bioinformatics, Isfahan University of Medical Sciences, Isfahan, Iran 8 The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark 9 Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark 10 MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom 11 Institute of Social and Economic Research, University of Essex, Essex, United Kingdom

† Denotes equal contribution * Correspondence to Melinda C. Mills, [email protected], and Felix R. Day, [email protected]

Human Reproductive Behaviour Consortium - Author information

Evelina T. Akimova1, Sven Bergmann2,3,4, Jason D. Boardman5, Dorret I. Boomsma6, Marco Brumat7, Julie E. Buring8,9, David Cesarini10,11,12, Daniel I. Chasman8,9, Jorge E. Chavarro13,14,15, Massimiliano Cocca16, Maria Pina Concas16, George Davey-Smith17, Gail Davies18, Ian J. Deary18, Tõnu Esko19,20, Oscar Franco21, Audrey J. Gaskins14,15,22, Eco J.C. de Geus6, Christian Gieger23, Giorgia Girotto7,16, Hans Jörgen Grabe24, Erica P. Gunderson25, Kathleen Mullan Harris26, Fernando P. Hartwig17,27, Chunyan He28,29, Diana van Heemst30, W. David Hill18, Georg Homuth31, Bernando Lessa Horta27, Jouke Jan Hottenga6, Hongyang Huang13, Elina Hyppӧnen32,33, M. Arfan Ikram21, Rick Jansen34, Magnus Johannesson35, Zoha Kamali36, Maryam Kavousi21, Peter Kraft13,37, Brigitte Kühnel23, Claudia Langenberg38, Lifelines Cohort Study39,40, Penelope A. Lind41, Jian’an Luan38, Reedik Mägi19, Patrik K.E. Magnusson42, Anubha Mahajan43,44, Nicholas G. Martin45, Hamdi Mbarek6,46, Mark I. McCarthy43,44, George McMahon47, Matthew B. McQueen48, Sarah E. Medland41, Thomas Meitinger49, Andres Metspalu19,50, Evelin Mihailov19, Lili Milani19, Stacey A. Missmer13,51,52, Stine Møllegaard53, Dennis O. Mook-Kanamori54,55, Anna Morgan16, Peter J. van der Most39, Renée de Mutsert54, Matthias Nauck56, Ilja M. Nolte39, Raymond Noordam30, Brenda W.J.H. Penninx57, Annette Peters58, Chris Power59, Paul Redmond18, Janet W. Rich-Edwards13,15,60, Paul M. Ridker8,9, Cornelius A. Rietveld61,62, Susan M. Ring17, Lynda M. Rose8, Rico Rueedi2,3, Kári Stefánsson63, Doris Stöckl58, Konstantin Strauch64,65,66, Morris A. Swertz40, Alexander Teumer67, Gudmar Thorleifsson63, Unnur Thorsteinsdottir63, A. Roy Thurik61,62,68, Nicholas J. Timpson17, Constance Turman13, André G. Uitterlinden61,69, Melanie Waldenberger23,58, Nicholas J. Wareham38, Gonneke Willemsen6, and Jing Hau Zhao38

Author list ordered alphabetically.

1 Leverhulme Centre for Demographic Science, Department of Sociology, St. Antony’s College, University of Oxford, Oxford, United Kingdom 2 Department of Computational Biology, University of Lausanne, Lausanne, Switzerland 3 Swiss Institute of Bioinformatics, Lausanne, Switzerland 4 Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa 5 Department of Sociology and Institute of Behavioral Science, University of Colorado at Boulder, Boulder, CO, United States of America 6 Department of Biological Psychology, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands 7 Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy 8 Brigham and Women’s Hospital, Boston, MA, United States of America 9 Harvard Medical School, Boston, MA, United States of America 10 Department of Economics, New York University, New York, NY, United States of America 11 Research Institute for Industrial Economics, Stockholm, Sweden 12 National Bureau of Economic Research, Cambridge, MA, United States of America 13 Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America 14 Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America 15 Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America 16 Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste, Italy 17 MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom 18 Lothian Birth Cohorts, Department of Psychology, University of Edinburgh, Edinburgh, United Kingdom 19 Estonian Genome Center, University of Tartu, Tartu, Estonia

20 Broad Institute of the Massachusetts Institute of Technology and Harvard University, Cambridge, MA, United States of America 21 Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands 22 Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States of America 23 Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany 24 Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany 25 Division of Research, Kaiser Permanente Northern California, Oakland, CA, United States of America 26 Department of Sociology, Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America 27 Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas, Brazil 28 University of Kentucky Markey Cancer Center, Lexington, KY, United States of America 29 Department of Internal Medicine, Division of Medical Oncology, University of Kentucky College of Medicine, Lexington, KY, United States of America 30 Department of Internal Medicine, Section of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, The Netherlands 31 Interfaculty Institute for Genetics and Functional , University of Greifswald, Greifswald, Germany 32 Australian Centre for Precision Health, University of South Australia Cancer Research Institute, Adelaide, Australia 33 South Australian Health and Medical Research Institute, Adelaide, Australia 34 Department of Psychiatry, Amsterdam Public Health and Amsterdam Neuroscience, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands 35 Department of Economics, Stockholm School of Economics, Stockholm, Sweden 36 Department of Bioinformatics, Isfahan University of Medical Sciences, Isfahan, Iran 37 Department of Biostatistics, Harvard T.H. 
Chan School of Public Health, Boston, MA, United States of America 38 MRC Epidemiology Unit, Institute of Metabolic Science, Cambridge Biomedical Campus, University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom 39 Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands 40 Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands 41 Psychiatric Genetics, QIMR Berghofer Medical Research Institute, Herston Brisbane, Queensland, Australia 42 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden 43 Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom 44 Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom 45 , QIMR Berghofer Medical Research Institute, Herston Brisbane, Queensland, Australia 46 Qatar Genome Programme, Qatar Foundation, Doha, Qatar 47 School of Social and Community Medicine University of Bristol, Bristol, United Kingdom 48 Department of Integrative Physiology, University of Colorado at Boulder, Boulder, CO, United States of America 49 Institute of Human Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany 50 Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia 51 Division of Adolescent and Young Adult Medicine, Department of Medicine, Boston Children’s Hospital and Harvard Medical School, Boston, MA, United States of America

52 Department of Obstetrics, Gynecology, and Reproductive Biology, College of Human Medicine, Michigan State University, Grand Rapids, MI, United States of America 53 Department of Sociology, University of Copenhagen, Copenhagen, Denmark 54 Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands 55 Department of Public Health and Primary Care, Leiden University Medical Center, Leiden, The Netherlands 56 Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany 57 Department of Psychiatry, EMGO Institute for Health and Care Research and Neuroscience Campus Amsterdam, VU University Medical Center/GGZ inGeest, Amsterdam, The Netherlands 58 Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany 59 Population, Policy and Practice Research and Teaching Department, UCL Great Ormond Street Institute of Child Health, London, United Kingdom 60 Division of Women’s Health, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, United States of America 61 Erasmus University Rotterdam Institute for Behavior and Biology, Rotterdam, The Netherlands 62 Department of Applied Economics, Erasmus School of Economics, Rotterdam, The Netherlands 63 deCODE Genetics/Amgen Inc., Reykjavik, Iceland 64 Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany 65 Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany 66 Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Germany 67 Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany 68 Montpellier Business School, Montpellier, France 69 Department of Internal Medicine, Erasmus University Medical Center, Rotterdam, The Netherlands

eQTLGen Consortium – Author information

Mawussé Agbessi1, Habibul Ahsan2, Isabel Alves1, Anand Kumar Andiappan3, Wibowo Arindrarto4, Philip Awadalla1, Alexis Battle5,6, Frank Beutner7, Marc Jan Bonder8,9, Dorret I. Boomsma10, Mark W. Christiansen11, Annique Claringbould8,12, Patrick Deelen8,13,12,14, Tõnu Esko15, Marie-Julie Favé1, Lude Franke8,12, Timothy Frayling16, Sina A. Gharib11,17, Greg Gibson18, Bastiaan T. Heijmans4, Gibran Hemani19, Rick Jansen20, Mika Kähönen21, Anette Kalnapenkis15, Silva Kasela15, Johannes Kettunen22, Yungil Kim23,5, Holger Kirsten24, Peter Kovacs25, Knut Krohn26, Jaanika Kronberg15, Viktorija Kukushkina15, Zoltan Kutalik27, Bernett Lee3, Terho Lehtimäki28, Markus Loeffler24, Urko M. Marigorta18,29,30, Hailang Mei31, Lili Milani15, Grant W. Montgomery32, Martina Müller-Nurasyid33,34,35, Matthias Nauck36,37, Michel G. Nivard38, Brenda Penninx20, Markus Perola39, Natalia Pervjakova15, Brandon L. Pierce2, Joseph Powell40, Holger Prokisch41,42, Bruce M. Psaty11,43, Olli T. Raitakari44, Samuli Ripatti45, Olaf Rotzschke3, Sina Rüeger27, Ashis Saha5, Markus Scholz24, Katharina Schramm46,34, Ilkka Seppälä28, Eline P. Slagboom4, Coen D.A. Stehouwer47, Michael Stumvoll48, Patrick Sullivan49, Peter A.C. ‘t Hoen50, Alexander Teumer51, Joachim Thiery52, Lin Tong2, Anke Tönjes48, Jenny van Dongen10, Maarten van Iterson4, Joyce van Meurs53, Jan H. Veldink54, Joost Verlouw53, Peter M. Visscher32, Uwe Völker55, Urmo Võsa8,15, Harm-Jan Westra8,12, Cisca Wijmenga8, Hanieh Yaghootkar16,56,57, Jian Yang32,58, Biao Zeng18, Futao Zhang32

Author list is ordered alphabetically.

1. Computational Biology, Ontario Institute for Cancer Research, Toronto, Canada 2. Department of Public Health Sciences, University of Chicago, Chicago, United States of America

3. Singapore Immunology Network, Agency for Science, Technology and Research, Singapore, Singapore 4. Leiden University Medical Center, Leiden, The Netherlands 5. Department of Computer Science, Johns Hopkins University, Baltimore, United States of America 6. Departments of Biomedical Engineering, Johns Hopkins University, Baltimore, United States of America 7. Heart Center Leipzig, Universität Leipzig, Leipzig, Germany 8. Department of Genetics, University Medical Centre Groningen, Groningen, The Netherlands 9. European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany 10. Netherlands Twin Register, Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam Public Health research institute and Amsterdam Neuroscience, the Netherlands 11. Cardiovascular Health Research Unit, University of Washington, Seattle, United States of America 12. Oncode Institute 13. Genomics Coordination Center, University Medical Centre Groningen, Groningen, The Netherlands 14. Department of Genetics, University Medical Centre Utrecht, P.O. Box 85500, 3508 GA, Utrecht, The Netherlands 15. Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu 51010, Estonia 16. Genetics of Complex Traits, University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, United Kingdom 17. Department of Medicine, University of Washington, Seattle, United States of America 18. School of Biological Sciences, Georgia Tech, Atlanta, United States of America 19. MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom 20. Amsterdam UMC, Vrije Universiteit, Department of Psychiatry, Amsterdam Public Health research institute and Amsterdam Neuroscience, The Netherlands 21. Department of Clinical Physiology, Tampere University Hospital and Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland 22. University of Helsinki, Helsinki, Finland 23. 
Genetics and Genomic Science Department, Icahn School of Medicine at Mount Sinai, New York, United States of America 24. Institut für Medizinische InformatiK, Statistik und Epidemiologie, LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig, Leipzig, Germany 25. IFB Adiposity Diseases, Universität Leipzig, Leipzig, Germany 26. Interdisciplinary Center for Clinical Research, Faculty of Medicine, Universität Leipzig, Leipzig, Germany 27. Lausanne University Hospital, Lausanne, Switzerland 28. Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland 29. Integrative Genomics Lab, CIC bioGUNE, Bizkaia Science and Technology Park, Derio, Bizkaia, Basque Country, Spain 30. IKERBASQUE, Basque Foundation for Science, Bilbao, Spain 31. Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands 32. Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia 33. Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany 34. Department of Medicine I, University Hospital Munich, Ludwig Maximilian’s University, München, Germany 35. DZHK (German Centre for Cardiovascular Research), partner site Munich Heart Alliance, Munich, Germany 36. Institute of Clinical Chemistry and Laboratory Medicine, Greifswald University Hospital, Greifswald, Germany

37. German Center for Cardiovascular Research (partner site Greifswald), Greifswald, Germany 38. Department of Biological Psychology, Faculty of Behaviour and Movement Sciences, VU, Amsterdam, The Netherlands 39. National Institute for Health and Welfare, University of Helsinki, Helsinki, Finland 40. Garvan Institute of Medical Research, Garvan-Weizmann Centre for Cellular Genomics, Sydney, Australia 41. Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Germany 42. Institute of Human Genetics, Technical University Munich, Munich, Germany 43. Kaiser Permanente Washington Health Research Institute, Seattle, WA, United States of America 44. Centre for Population Health Research, Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital and University of Turku, Turku, Finland 45. Statistical and Translational Genetics, University of Helsinki, Helsinki, Finland 46. Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany 47. Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, The Netherlands 48. Department of Medicine, Universität Leipzig, Leipzig, Germany 49. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden 50. Center for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center Nijmegen, Nijmegen, The Netherlands 51. Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany 52. Institute for Laboratory Medicine, LIFE – Leipzig Research Center for Civilization Diseases, Universität Leipzig, Leipzig, Germany 53. Department of Internal Medicine, Erasmus Medical Centre, Rotterdam, The Netherlands 54. 
UMC Utrecht Brain Center, University Medical Center Utrecht, Department of Neurology, Utrecht University, Utrecht, The Netherlands 55. Interfaculty Institute for Genetics and , University Medicine Greifswald, Greifswald, Germany 56. School of Life Sciences, College of Liberal Arts and Science, University of Westminster, 115 New Cavendish Street, London, United Kingdom 57. Division of Medical Sciences, Department of Health Sciences, Luleå University of Technology, Luleå, Sweden 58. Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang 325027, China

BIOS Consortium (Biobank-based Integrative Study) – Author information

Management Team Bastiaan T. Heijmans (chair)1, Peter A.C. ’t Hoen2, Joyce van Meurs3, Aaron Isaacs4, Rick Jansen5, Lude Franke6.

Cohort collection Dorret I. Boomsma7, René Pool7, Jenny van Dongen7, Jouke J. Hottenga7 (Netherlands Twin Register); Marleen MJ van Greevenbroek8, Coen D.A. Stehouwer8, Carla J.H. van der Kallen8, Casper G. Schalkwijk8 (Cohort study on Diabetes and Atherosclerosis Maastricht); Cisca Wijmenga6, Lude Franke6, Sasha Zhernakova6, Ettje F. Tigchelaar6 (LifeLines Deep); P. Eline Slagboom1, Marian Beekman1, Joris Deelen1, Diana van Heemst9 (Leiden Longevity Study); Jan H. Veldink10, Leonard H. van den Berg10 (Prospective ALS Study Netherlands); Cornelia M. van Duijn4, Bert A. Hofman11, Aaron Isaacs4, André G. Uitterlinden3 (Rotterdam Study).

Data Generation Joyce van Meurs (Chair)3, P. Mila Jhamai3, Michael Verbiest3, H. Eka D. Suchiman1, Marijn Verkerk3, Ruud van der Breggen1, Jeroen van Rooij3, Nico Lakenberg1.

Data management and computational infrastructure Hailiang Mei (Chair)12, Maarten van Iterson1, Michiel van Galen2, Jan Bot13, Dasha V. Zhernakova6, Rick Jansen5, Peter van ’t Hof12, Patrick Deelen6, Irene Nooren13, Peter A.C. ’t Hoen2, Bastiaan T. Heijmans1, Matthijs Moed1.

Data Analysis Group Lude Franke (Co-Chair)6, Martijn Vermaat2, Dasha V. Zhernakova6, René Luijk1, Marc Jan Bonder6, Maarten van Iterson1, Patrick Deelen6, Freerk van Dijk14, Michiel van Galen2, Wibowo Arindrarto12, Szymon M. Kielbasa15, Morris A. Swertz14, Erik W. van Zwet15, Rick Jansen5, Peter-Bram ’t Hoen (Co-Chair)2, Bastiaan T. Heijmans (Co-Chair)1.

1. Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
2. Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
3. Department of Internal Medicine, ErasmusMC, Rotterdam, The Netherlands
4. Department of Genetic Epidemiology, ErasmusMC, Rotterdam, The Netherlands
5. Department of Psychiatry, VU University Medical Center, Neuroscience Campus Amsterdam, Amsterdam, The Netherlands
6. Department of Genetics, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
7. Department of Biological Psychology, VU University Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, The Netherlands
8. Department of Internal Medicine and School for Cardiovascular Diseases (CARIM), Maastricht University Medical Center, Maastricht, The Netherlands
9. Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, The Netherlands
10. Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands
11. Department of Epidemiology, ErasmusMC, Rotterdam, The Netherlands
12. Sequence Analysis Support Core, Leiden University Medical Center, Leiden, The Netherlands
13. SURFsara, Amsterdam, The Netherlands
14. Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
15. Medical Statistics Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands

12.2 Acknowledgements

We thank Evelina T. Akimova and Stine Møllegaard for administrative work in organizing the cohort information and author list. The research leading to these results received funding awarded to PI M.C. Mills from the European Research Council (ERC) Consolidator Grant SOCIOGENOME (615603, www.sociogenome.org), ERC Advanced Grant CHRONO (835079), Economic & Social Research Council (ESRC) UK, National Centre for Research Methods (NCRM) grant SOCGEN (ES/N011856/1), Wellcome Trust ISSF and a large Centre grant from the Leverhulme Trust for the Leverhulme Centre for Demographic Science.

1958BC-T1DGC and 1958BC-WTCCC2

This work made use of data and samples generated by the 1958 Birth Cohort (NCDS), which is managed by the Centre for Longitudinal Studies at the UCL Institute of Education, funded by the Economic and Social Research Council (grant number ES/M001660/1). Data governance was provided by the METADAC data access committee, funded by ESRC, Wellcome, and MRC (2015-2018: Grant Number MR/N01104X/1; 2018-2020: Grant Number ES/S008349/1). Access to these resources was enabled via the Wellcome Trust & MRC: 58FORWARDS grant [108439/Z/15/Z] (The 1958 Birth Cohort: Fostering new Opportunities for Research via Wider Access to Reliable Data and Samples). Before 2015 biomedical resources were maintained under the Wellcome Trust and Medical Research Council 58READIE Project (grant numbers WT095219MA and G1001799). Genotyping was undertaken as part of the Wellcome Trust Case-Control Consortium (WTCCC) under Wellcome Trust award 076113, and a full list of the investigators who contributed to the generation of the data is available at www.wtccc.org.uk. This research used resources provided by the Type 1 Diabetes Genetics Consortium, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases, National Human Genome Research Institute, National Institute of Child Health and Human Development, and Juvenile Diabetes Research Foundation International (JDRF) and supported by U01 DK062418. The 1958 birth cohort data can be accessed via the UK Data Service (http://ukdataservice.ac.uk/). Funding: NIDDK U01-DK105535; Wellcome: 090532, 098381, 106130, 203141, 212259; MMcC was a Wellcome Investigator and an NIHR Senior Investigator.

1982 Pelotas Birth Cohort Study

The 1982 Pelotas Birth Cohort Study is conducted by the Postgraduate Program in Epidemiology at Universidade Federal de Pelotas with the collaboration of the Brazilian Public Health Association (ABRASCO). From 2004 to 2013, the Wellcome Trust supported the study. The International Development Research Center, World Health Organization, Overseas Development Administration, European Union, National Support Program for Centers of Excellence (PRONEX), the Brazilian National Research Council (CNPq), and the Brazilian Ministry of Health supported previous phases of the study. Genotyping of 1982 Pelotas Birth Cohort Study participants was supported by the Department of Science and Technology (DECIT, Ministry of Health) and National Fund for Scientific and Technological Development (FNDCT, Ministry of Science and Technology), Funding of Studies and Projects (FINEP, Ministry of Science and Technology, Brazil), Coordination of Improvement of Higher Education Personnel (CAPES, Ministry of Education, Brazil).

AddHealth: The National Longitudinal Study of Adolescent to Adult Health

The National Longitudinal Study of Adolescent to Adult Health (Add Health) is supported by grant P01 HD031921 to Kathleen Mullan Harris from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with cooperative funding from 23 other federal agencies and foundations. Add Health GWAS data were funded by NICHD grants to Harris (R01 HD073342) and to Harris, Boardman, and McQueen (R01 HD060726). Add Health gratefully acknowledges the assistance of Yun Li, Qing Duan, Heather Highland and Christy Avery who conducted quality control of the genotype data. For information about access to the data from this study, contact [email protected].

ALSPAC: Avon Longitudinal Study of Parent and Children

We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors, who will serve as guarantors for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). GDS works in the Medical Research Council Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/1).

CoLaus: Cohorte Lausannoise

The CoLaus study was and is supported by research grants from GlaxoSmithKline (GSK), the Faculty of Biology and Medicine of Lausanne, and the Swiss National Science Foundation (grants 3200B0-105993, 3200B0-118308, 33CSCO-122661, and 33CS30-139468). We thank all participants, involved physicians and study nurses of the CoLaus cohort.

deCODE

We thank the study subjects for their valuable participation. All deCODE collaborators in this study are employees of deCODE Genetics/Amgen, Inc. External researchers who wish to obtain access to data may contact Gudmar Thorleifsson [email protected].

EGCUT: Estonian Genome Center, University of Tartu

EGCUT received funding from the Estonian Research Council Grant IUT20-60 and PUT1660, EU H2020 grant 692145, and European Union through the European Regional Development Fund (Project No. 2014-2020.4.01.15-0012) GENTRANSMED. For more information, please contact Tõnu Esko ([email protected]).

EPIC Norfolk: The European Prospective Investigation in Cancer and Nutrition Norfolk study

The authors would like to acknowledge the contribution of the staff and participants of the EPIC-Norfolk Study. EPIC-Norfolk is supported by the Medical Research Council (programme grants G0401527, G1000143) and Cancer Research UK (programme grant C864/A8257). This work was supported by the Medical Research Council (Unit Programme numbers MC_UU_12015/1 and MC_UU_12015/2). For inquiries about access to this data, please contact Ken Ong (Ken.Ong@mrc-epid.cam.ac.uk).

INGI-FVG: Friuli Venezia Giulia Genetic Park

We would like to thank the people of the Friuli Venezia Giulia Region for their everlasting support. The research was supported by the Italian Ministry of Health - RC 35/17.

InterAct-GWAS and InterAct-Exome

We thank all EPIC participants and staff for their contribution to the study. We thank Nicola Kerrison (MRC Epidemiology Unit, Cambridge) for managing the data for the InterAct Project. Funding for the InterAct project was provided by the EU FP6 programme (grant number LSHM_CT_2006_037197).

KORA F3 and F4

The KORA study was initiated and financed by the Helmholtz Zentrum München – German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria. Furthermore, KORA research was supported within the Munich Center of Health Sciences (MC-Health), Ludwig-Maximilians-Universität, as part of LMUinnovativ. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank all the study participants, all members of staff of the Institute of Epidemiology II and the field staff in Augsburg who planned and conducted the study.

LBC1921 and LBC1936: The Lothian Birth Cohort

We thank the cohort participants and team members who contributed to these studies. Phenotype collection in the Lothian Birth Cohort 1921 was supported by the UK’s Biotechnology and Biological Sciences Research Council (BBSRC), The Royal Society, and The Chief Scientist Office of the Scottish Government. Phenotype collection in the Lothian Birth Cohort 1936 was supported by Age UK (The Disconnected Mind project). Genotyping of the cohorts was funded by the BBSRC. The work was undertaken by The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross-council Lifelong Health and Wellbeing Initiative (MR/K026992/1). Funding from the BBSRC and Medical Research Council (MRC) is gratefully acknowledged. WDH is supported by a grant from Age UK (The Disconnected Mind Project).

Lifelines Cohort Study

We wish to acknowledge the services of the Lifelines Cohort Study, the contributing research centers delivering data to Lifelines, and all the study participants. The Lifelines Cohort Study, and generation and management of GWAS genotype data for the Lifelines Cohort Study is supported by the Netherlands Organization of Scientific Research NWO (grant 175.010.2007.006), the Economic Structure Enhancing Fund (FES) of the Dutch government, the Ministry of Economic Affairs, the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the Northern Netherlands Collaboration of Provinces (SNN), the Province of Groningen, University Medical Center Groningen, the University of Groningen, Dutch Kidney Foundation and Dutch Diabetes Research Foundation. We thank Behrooz Alizadeh, Annemieke Boesjes, Marcel Bruinenberg, Noortje Festen, Pim van der Harst, Ilja Nolte, Lude Franke, Mitra Valimohammadi for their help in creating the GWAS database, and Rob Bieringa, Joost Keers, René Oostergo, Rosalie Visser, Judith Vonk for their work related to data-collection and validation. The authors are grateful to the study participants, the staff from the LifeLines Cohort Study and the contributing research centers delivering data to LifeLines and the participating general practitioners and pharmacists.

NEO: Netherlands Epidemiology of Obesity

The authors of the NEO study thank all individuals who participated in the Netherlands Epidemiology of Obesity study, all participating general practitioners for inviting eligible participants and all research nurses for collection of the data. We thank the NEO study group, Pat van Beelen, Petra Noordijk and Ingeborg de Jonge for the coordination, lab and data management of the NEO study. The genotyping in the NEO study was supported by the Centre National de Génotypage (Paris, France), headed by Jean-Francois Deleuze. The NEO study is supported by the participating Departments, the Division and the Board of Directors of the Leiden University Medical Center, and by the Leiden University, Research Profile Area Vascular and Regenerative Medicine. Dennis Mook-Kanamori is supported by Dutch Science Organization (ZonMW-VENI Grant 916.14.023).

NESDA: The Netherlands Study of Depression and Anxiety

Funding was obtained from the Netherlands Organization for Scientific Research (Geestkracht program grant 10-000-1002); the Center for Medical Systems Biology (CSMB, NWO Genomics), Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL), VU University’s Institutes for Health and Care Research (EMGO+) and Neuroscience Campus Amsterdam, University Medical Center Groningen, Leiden University Medical Center, National Institutes of Health (NIH, R01D0042157-01A, MH081802, Grand Opportunity grants 1RC2 MH089951 and 1RC2 MH089995). Part of the genotyping and analyses were funded by the Genetic Association Information Network (GAIN) of the Foundation for the National Institutes of Health. Computing was supported by BiG Grid, the Dutch e-Science Grid, which is financially supported by NWO. We would like to thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high performance computing cluster.

NHS: The Nurses’ Health Study

Supported by grants UM1 CA186107, UM1 CA167552, DK091718, HL071981, HL073168, CA87969, CA49449, CA055075, HL34594, HL088521, U01HG004399, DK080140, 5P30DK46200, U54CA155626, DK58845, U01HG004728-02, EY015473, DK70756 and DK46200 from the National Institutes of Health, with additional support for genotyping from Merck Research Laboratories, North Wales, PA.

NTR: Netherlands Twin Register

Funding was obtained from the Netherlands Organization for Scientific Research (NWO) and The Netherlands Organisation for Health Research and Development (ZonMW) grants 904-61-090, 985-10-002, 912-10-020, 904-61-193, 480-04-004, 463-06-001, 451-04-034, 400-05-717, Addiction-31160008, 016-115-035, 481-08-011, 400-07-080, 056-32-010, Middelgroot-911-09-032, OCW_NWO Gravity program –024.001.003, NWO-Groot 480-15-001/674, Center for Medical Systems Biology (CSMB, NWO Genomics), NBIC/BioAssist/RK(2008.024), Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL, 184.021.007 and 184.033.111), X-Omics 184-034-019; Spinozapremie (NWO-56-464-14192), KNAW Academy Professor Award (PAH/6635) and University Research Fellow grant (URF) to DIB; Amsterdam Public Health research institute (former EMGO+), Neuroscience Amsterdam research institute (former NCA); the European Community's Fifth and Seventh Framework Program (FP5-LIFE QUALITY-CT-2002-2006, FP7-HEALTH-F4-2007-2013, grant 01254: GenomEUtwin, grant 01413: ENGAGE and grant 602768: ACTION); the European Research Council (ERC Starting 284167, ERC Consolidator 771057, ERC Advanced 230374), Rutgers University Cell and DNA Repository (NIMH U24 MH068457-06), the National Institutes of Health (NIH, R01D0042157-01A1, R01MH58799-03, MH081802, DA018673, R01 DK092127-04, Grand Opportunity grants 1RC2 MH089951, and 1RC2 MH089995); the Avera Institute for Human Genetics, Sioux Falls, South Dakota (USA). Part of the genotyping and analyses were funded by the Genetic Association Information Network (GAIN) of the Foundation for the National Institutes of Health. Computing was supported by NWO through grant 2018/EW/00408559, BiG Grid, the Dutch e-Science Grid and SURFSARA.

RPGEH: Research Program on Genes, Environment and Health/Genetic Epidemiology Research on Aging (RPGEH/GERA)

Data used in this study were provided by the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH): Genetic Epidemiology Research on Adult Health and Aging (GERA), funded by the National Institutes of Health [RC2 AG036607 (Schaefer and Risch)], the Robert Wood Johnson Foundation, the Wayne and Gladys Valley Foundation, The Ellison Medical Foundation, and the Kaiser Permanente Community Benefits Program.

Access to RPGEH data used in this study may be obtained by application via the RPGEH Research portal: https://rpgehportal.kaiser.org. A subset of the GERA cohort consented for public use can be found at NIH/dbGaP: phs000674.v1.p1.

RS-I (Rotterdam Study Baseline), RS-II (Rotterdam Study Extension of Baseline) and RS-III (Rotterdam Study Young)

The generation and management of GWAS genotype data for the Rotterdam Study is supported by the Netherlands Organisation of Scientific Research NWO Investments (nr. 175.010.2005.011, 911-03-012). This study is funded by the Research Institute for Diseases in the Elderly (014-93-015; RIDE2), the Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) project nr. 050-060-810. We thank Pascal Arp, Mila Jhamai, Marijn Verkerk, Lizbeth Herrera and Marjolein Peters for their help in creating the GWAS database, and Karol Estrada and Maksim V. Struchalin for their support in creation and analysis of imputed data. The Rotterdam Study is funded by Erasmus Medical Center and Erasmus University, Rotterdam, Netherlands Organization for the Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. The authors are grateful to the study participants, the staff from the Rotterdam Study and the participating general practitioners and pharmacists. C.A. Rietveld gratefully acknowledges funding from the Netherlands Organization for Scientific Research (NWO Veni grant 016.165.004).

SHIP: Study of Health in Pomerania

SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania, and the network ‘Greifswald Approach to Individualized Medicine (GANI_MED)’ funded by the Federal Ministry of Education and Research (grant 03IS2061A). Genome-wide data have been supported by the Federal Ministry of Education and Research (grant no. 03ZIK012) and a joint grant from Siemens Healthineers, Erlangen, Germany and the Federal State of Mecklenburg-West Pomerania. The University of Greifswald is a member of the Caché Campus program of the InterSystems GmbH. HJG has received travel grants and speakers honoraria from Fresenius Medical Care, Neuraxpharm, Servier and Janssen Cilag as well as research funding from Fresenius Medical Care.

STR: Swedish Twin Registry

Funding was provided by the Jan Wallander and Tom Hedelius Foundation (P2015-0001:1), the Ragnar Soderberg Foundation (E9/11, E42/15), and the Swedish Research Council (421-2013-1061). STR is financially supported by Karolinska Institutet. Researchers interested in using STR data must obtain approval from a Swedish Ethical Review Board and from the Steering Committee of the Swedish Twin Registry. Researchers using the data are required to follow the terms of an Assistance Agreement containing a number of clauses designed to ensure protection of privacy and compliance with relevant laws. For further information, contact Patrik Magnusson ([email protected]). C.A. Rietveld gratefully acknowledges funding from the Netherlands Organization for Scientific Research (NWO Veni grant 016.165.004).

TwinsUK: St Thomas’ UK Adult Twin Registry

The Twins UK study was funded by the Wellcome Trust, European Community's Seventh Framework Program (FP7/2007-2013)/grant agreement HEALTH-F2-2008-201865-GEFOS and (FP7/2007-2013), ENGAGE project grant agreement HEALTH-F4-2007-201413, and the FP-5 GenomEUtwin Project (QLG2-CT-2002-01254). The Twins UK study also receives support from the Department of Health via the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy's and St. Thomas' NHS Foundation Trust in partnership with King's College London. TDS is an NIHR Senior Investigator. The Twins UK study also received support from a Biotechnology and Biological Sciences Research Council (BBSRC) project grant (G20234) and a U.S. National Institutes of Health (NIH)/National Eye Institute (NEI) grant (1RO1EY018246), and genotyping was supported by the NIH Center for Inherited Disease Research. The Twins UK study also received support from the National Institute for Health Research (NIHR) comprehensive Biomedical Research Centre award to Guy's and St. Thomas' National Health Service Foundation Trust partnering with King's College London.

UK Biobank

This research has also been conducted using the UK Biobank Resource under Application Numbers 11425, 12514, 9797 and 22276. Informed consent was obtained from UK Biobank subjects.

UKHLS: Understanding Society – The UK Household Longitudinal Study

The UK Household Longitudinal Study, led by the Institute for Social and Economic Research at the University of Essex, is funded by the Economic and Social Research Council (Grant Number: ES/M008592/1). Data were collected by NatCen and the genome-wide scan data were analysed by the Wellcome Trust Sanger Institute. The data can be accessed at https://www.understandingsociety.ac.uk/

WGHS: Women’s Genome Health Study

The WGHS is supported by the National Heart, Lung, and Blood Institute (HL043851, HL080467, HL09935) and the National Cancer Institute (CA047988 and UM1CA182913) with collaborative scientific support and funding for genotyping provided by Amgen.

WLS: Wisconsin Longitudinal Study

This research uses data from the Wisconsin Longitudinal Study (WLS) of the University of Wisconsin-Madison. Since 1991, the WLS has been supported principally by the National Institute on Aging (AG-9775, AG-21079, AG-033285, and AG-041868, R01 AG041868-01A1), with additional support from the Vilas Estate Trust, the National Science Foundation, the Spencer Foundation, and the Graduate School of the University of Wisconsin-Madison. Since 1992, data have been collected by the University of Wisconsin Survey Center. The opinions expressed herein are those of the authors. A public use file of data from the Wisconsin Longitudinal Study is available from the Wisconsin Longitudinal Study, University of Wisconsin-Madison, 1180 Observatory Drive, Madison, Wisconsin 53706 and at http://www.ssc.wisc.edu/WLSresearch/data/

QIMR: Queensland Institute of Medical Research

Funding was provided by the Australian National Health and Medical Research Council (241944, 339462, 389927, 389875, 389891, 389892, 389938, 442915, 442981, 496739, 552485, 552498), the Australian Research Council (A7960034, A79906588, A79801419, DP0770096, DP0212016, DP0343921), the FP-5 GenomEUtwin Project (QLG2-CT-2002-01254), and the U.S. National Institutes of Health (NIH grants AA07535, AA10248, AA13320, AA13321, AA13326, AA14041, DA12854, MH66206). A portion of the genotyping on which the QIMR study was based (Illumina 370K scans) was carried out at the Center for Inherited Disease Research, Baltimore (CIDR), through an access award to the authors' late colleague Dr. Richard Todd (Psychiatry, Washington University School of Medicine, St Louis). S.E.M. is supported by an Australian National Health and Medical Research Council Fellowship APP1103623. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Researchers interested in using QIMR data can contact Nick Martin ([email protected]).

References

1. Tropf, F. C. et al. Human Fertility, Molecular Genetics, and Natural Selection in Modern Societies. PLoS One 10, e0126821 (2015). 2. Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat. Genet. 48, (2016). 3. Day, F. R. et al. Physical and neurobehavioral determinants of reproductive onset and

65

success. Nat. Genet. doi:10.1038/ng.3551 (2016). doi:10.1038/ng.3551 4. Mills, M. C. et al. Why do people postpone parenthood? Reasons and social policy incentives. Hum. Reprod. Update 17, 848–860 (2011). 5. Leridon, H. A new estimate of permanent sterility by age: sterility defined as the inability to conceive. Popul. Stud. (NY). 62, 15–24 (2008). 6. Balbo, N., Billari, F. C. & Mills, M. Fertility in Advanced Societies: A Review of Research. Eur. J. Popul. / Rev. Eur. Démographie 29, 1–38 (2013). 7. Skinner, S. R. et al. Childhood Behavior Problems and Age at First Sexual Intercourse: A Prospective Birth Cohort Study. Pediatrics 135, 255–263 (2015). 8. Schvaneveldt, P. L., Miller, B. C., Berry, E. H. & Lee, T. R. Academic goals, achievement, and age at first sexual intercourse: longitudinal, bidirectional influences. Adolescence 36, 767–87 (2001). 9. Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat. Hum. Behav. 1, 757–765 (2017). 10. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010). 11. Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012). 12. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011). 13. Visscher, P. M., Yang, J. & Goddard, M. E. A commentary on ‘common SNPs explain a large proportion of the heritability for human height’by Yang et al.(2010). Twin Res. Hum. Genet. 13, 517–524 (2010). 14. Visscher, P. M. et al. Statistical power to detect genetic (co) variance of complex traits using SNP data in unrelated samples. PLoS Genet. 10, e1004269 (2014). 15. Loh, P. R. et al. 
Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. (2015). doi:10.1038/ng.3431 16. Boardman, J. D., Blalock, C. L. & Pampel, F. C. Trends in the Genetic Influences on Smoking. J. Health Soc. Behav. 51, 108–123 (2010). 17. Domingue, B. W., Conley, D., Fletcher, J. & Boardman, J. D. Cohort Effects in the Genetic Influence on Smoking. Behav. Genet. (2016). doi:10.1007/s10519-015-9731-9 18. van der Most, P. J. et al. QCGWAS: A flexible R package for automated quality control of genome-wide association results. Bioinformatics 30, 1185–1186 (2014). 19. Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014). 20. Mills, M. C. & Rahal, C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020). 21. Chen, X.-K. et al. Teenage pregnancy and adverse birth outcomes: a large population based retrospective cohort study. Int. J. Epidemiol. 36, 368–373 (2007). 22. Bongaarts, J., Mensch, B. S. & Blanc, A. K. Trends in the age at reproductive transitions in the developing world: The role of education. Popul. Stud. (NY). 71, 139–154 (2017). 23. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. (2014). 24. Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–212 (2014). 25. Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science (80-. ). 340, 1467–1471 (2013). 26. Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016). 27. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: Fast and efficient meta-analysis of genomewide

66

association scans. Bioinformatics 26, 2190–2191 (2010). 28. Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018). 29. Euesden, J., Lewis, C. M. & O’Reilly, P. F. PRSice: Polygenic Risk Score software. Bioinformatics 31, btu848-1468 (2014). 30. Vilhjálmsson, B. J. et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet. 97, 576–592 (2015). 31. Harris, K. M. & et al. The National Longitudinal Study of Adolescent to Adult Health: Research Design. (2009). Available at: http://www.cpc.unc.edu/projects/addhealth/design. 32. Buck, N. & McFall, S. Understanding Society: design overview. Longit. Life Course Stud. 3, 5– 17 (2012). 33. Mills, M. C., Barban, N. & Tropf, F. C. An Introduction to Statistical Genetic Data Analysis. (The MIT Press, 2020). 34. Mills, M. C. Introducing survival and event history analysis. (Sage Publications, 2011). 35. Cox, D. R. Regression Models and Life-Tables. J. R. Stat. Soc. Ser. B 34, 187–220 (1972). 36. Bulik-Sullivan, B. et al. An Atlas of Genetic Correlations across Human Diseases and Traits. bioRxiv 47, 1–44 (2015). 37. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015). 38. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015). 39. Mostafavi, H. et al. Variable prediction accuracy of polygenic scores within an ancestry group. bioRxiv (2019). doi:https://doi.org/10.1101/629949 40. Perry, J. R. B. et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514, 92–97 (2014). 41. 
Day, Felix R., Katherine S Ruth, Deborah J Thompson, Kathryn L Lunetta, Natalia Pervjakova, Daniel I Chasman, Lisette Stolk, Hilary K Finucane, Patrick Sulem, Brendan Bulik-Sullivan, Tõnu Esko, Andrew D Johnson, Cathy E Elks, Nora Franceschini, Chunyan He, L. M. R. et al. et al. Large-scale genomic analyses link reproductive aging to hypothalamic signaling, breast cancer susceptibility and BRCA1-mediated DNA repair. Nat. Genet. 47, 1294–1303 (2015). 42. Day, F. R. & Al., E. Shared genetic aetiology of puberty timing between sexes and with health- related outcomes. Nat. Commun. 6, 8842 (2015). 43. Mathieson, I., et al., Perry, J. R. B. & Mills, M. C. Genome-wide analysis for reproductive success highlights contemporary natural selection. Under Rev. (2020). 44. Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet. 51, 245–257 (2019). 45. Neale, B. M. & Al., E. GWAS round 2 Analysis of the UK Biobank. Website & Github repository (2018). Available at: http://www.nealelab.is/uk-biobank. 46. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017). 47. Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018). 48. Baselmans, B. M. L. et al. Multivariate genome-wide analyses of the well-being spectrum. Nat. Genet. 51, 445–451 (2019). 49. Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019). 50. Martin, J. et al. A Genetic Investigation of Sex Bias in the Prevalence of Attention- Deficit/Hyperactivity Disorder. Biol. Psychiatry 83, 1044–1053 (2018). 51. Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. 
Nature 511, 421–427 (2014).


52. Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. 51, 793–803 (2019). 53. Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018). 54. Watson, H. J. et al. Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa. Nat. Genet. 51, 1207–1214 (2019). 55. Liu, M. et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 51, 237–244 (2019). 56. de Moor, M. H. M. et al. Meta-analysis of genome-wide association studies for personality. Mol. Psychiatry 17, 337–349 (2012). 57. Day, F. R., Ong, K. K. & Perry, J. R. B. Elucidating the genetic basis of social interaction and isolation. Nat. Commun. 9, 2457 (2018). 58. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014). 59. Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015). 60. Hoffmann, T. J. et al. A Large Multiethnic Genome-Wide Association Study of Adult Body Mass Index Identifies Novel Loci. Genetics 210, 499–515 (2018). 61. Hill, W. D. et al. A combined analysis of genetically correlated traits identifies 187 loci and a role for neurogenesis and myelination in intelligence. Mol. Psychiatry (2018). doi:10.1038/s41380-017-0001-5 62. Day, F. R. et al. Genomic analyses identify hundreds of variants associated with age at menarche and support a role for puberty timing in cancer risk. Nat. Genet. 49, 834–841 (2017). 63. Barban, N. et al. Genome-wide analysis identifies 12 loci influencing human reproductive behavior. Nat. Genet. 48, 1–7 (2016). 64. Halpern, C. T. et al. Smart teens don’t have sex (or kiss much either). J. Adolesc. 
Heal. 26, 213–225 (2000). 65. Neiss, M., Rowe, D. C. & Rodgers, J. L. Does education mediate the relationship between IQ and age of first birth? A behavioural genetic analysis. J. Biosoc. Sci. 34, 259–276 (2002). 66. Iacono, W. G., Malone, S. M. & McGue, M. Behavioral Disinhibition and the Development of Early-Onset Addiction: Common and Specific Influences. Annu. Rev. Clin. Psychol. 4, 325–348 (2008). 67. Lee, S. H. et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–94 (2013). 68. Alvergne, A., Jokela, M. & Lummaa, V. Personality and reproductive success in a high-fertility human population. Proc. Natl. Acad. Sci. U. S. A. 107, 11745–11750 (2010). 69. Ni, G., Gratten, J., Wray, N. R. & Lee, S. H. Age at first birth in women is genetically associated with increased risk of schizophrenia. Sci. Rep. 8, 10168 (2018). 70. Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015). 71. Stulp, G., Barrett, L., Tropf, F. C. & Mills, M. Does natural selection favour taller stature among the tallest people on earth? Proc. R. Soc. B Biol. Sci. 282, 20150211 (2015). 72. Belsky, J., Steinberg, L., Houts, R. M. & Halpern-Felsher, B. L. The development of reproductive strategy in females: Early maternal harshness → earlier menarche → increased sexual risk taking. Dev. Psychol. 46, 120–128 (2010). 73. Thomas, F., Renaud, F., Benefice, E., Meeus, T. de & Guegan, J.-F. International Variability of Ages at Menarche and Menopause: Patterns and Main Determinants. Hum. Biol. 73, 271–290 (2001). 74. Valois, R., Oeltmann, J., Waller, J. & Hussey, J. Relationship between number of sexual intercourse partners and selected health risk behaviors among public high school


adolescents. J. Adolesc. Heal. 25, 328–335 (1999). 75. Finer, L. B. & Philbin, J. M. Sexual Initiation, Contraceptive Use, and Pregnancy Among Young Adolescents. Pediatrics 131, 886–891 (2013). 76. Weghofer, A. et al. Age at menarche: a predictor of diminished ovarian function? Fertil. Steril. 100, 1039–1043 (2013). 77. Sarver, D. E., McCart, M. R., Sheidow, A. J. & Letourneau, E. J. ADHD and risky sexual behavior in adolescents: Conduct problems and substance use as mediators of risk. J. Child Psychol. Psychiatry 55, 1345–1353 (2014). 78. Friederich, H.-C. & Herzog, W. Cognitive-Behavioral Flexibility in Anorexia Nervosa. in 111–123 (2010). doi:10.1007/7854_2010_83 79. Howe, G., Westhoff, C., Vessey, M. & Yeates, D. Effects Of Age, Cigarette Smoking, And Other Factors On Fertility: Findings In A Large Prospective Study. Br. Med. J. 290, 1697–1700 (1985). 80. Cooper, A. R. & Moley, K. H. Maternal tobacco use and its preimplantation effects on fertility: more reasons to stop smoking. Semin. Reprod. Med. 26, 204–212 (2008). 81. Omurtag, K. et al. Modeling the effect of cigarette smoke on hexose utilization in spermatocytes. Reprod. Sci. 22, 94–101 (2015). 82. Laaksonen, M. et al. Socioeconomic status and smoking: Analysing inequalities with multiple indicators. Eur. J. Public Health 15, 262–269 (2005). 83. Jokela, M. Birth-Cohort Effects in the Association Between Personality and Fertility. Psychol. Sci. 23, 835–841 (2012). 84. Jokela, M. et al. Lower fertility associated with obesity and underweight: the US National Longitudinal Survey of Youth. Am. J. Clin. Nutr. 88, 886–893 (2008). 85. Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019). 86. Sullivan, P. & Kendler, K. The genetic epidemiology of smoking. Nicotine Tob. Res. 1, 51–57 (1999). 87. Mills, M. C. & Rahal, C. Tracking GWAS diversity by disease requires real-time monitoring. Nat.
Genet. (2020). 88. Mills, M. C. & Rahal, C. A Scientometric Review of Genome-Wide Association Studies. Commun. Biol. 2, (2019). 89. Sironi, M. Fertility histories and chronic conditions later in life in Europe. Eur. J. Ageing 16, 259–272 (2019). 90. Mishra, G. D., Cooper, R. & Kuh, D. A life course approach to reproductive health: theory and methods. Maturitas 65, 92–7 (2010). 91. Pandeya, N. et al. Female reproductive history and risk of type 2 diabetes: A prospective analysis of 126 721 women. Diabetes. Obes. Metab. 20, 2103–2112 (2018). 92. Rosendaal, N. T. A. et al. Adolescent Childbirth Is Associated With Greater Framingham Risk Scores for Cardiovascular Disease Among Participants of the IMIAS (International Mobility in Aging Study). J. Am. Heart Assoc. 6, (2017). 93. Verbanck, M., Chen, C. Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. (2018). doi:10.1038/s41588-018-0099-7 94. Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high- density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018). 95. Nelson, C. P. et al. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat. Genet. 49, 1385–1391 (2017). 96. Schulkind, L. & Sandler, D. H. The Timing of Teenage Births: Estimating the Effect on High School Graduation and Later-Life Outcomes. Demography 56, 345–365 (2019). 97. Eisenberg, D. T. A., Hayes, M. G. & Kuzawa, C. W. Delayed paternal age of reproduction in humans is associated with longer telomeres across two generations of descendants. Proc. Natl. Acad. Sci. 109, 10251–10256 (2012).


98. Abrams, P. A. & Ludwig, D. OPTIMALITY THEORY, GOMPERTZ’ LAW, AND THE DISPOSABLE SOMA THEORY OF SENESCENCE. Evolution (N. Y). 49, 1055–1066 (1995). 99. Westendorp, R. G. J. & Kirkwood, T. B. L. Human longevity at the cost of reproductive success. Nature 396, 743–746 (1998). 100. Smith, K. R., Mineau, G. P. & Bean, L. L. Fertility and post-reproductive longevity. Biodemography Soc. Biol. 49, 185–205 (2002). 101. Shadyab, A. H. et al. Maternal Age at Childbirth and Parity as Predictors of Longevity Among Women in the United States: The Women’s Health Initiative. Am. J. Public Health 107, 113– 119 (2017). 102. Mostafavi, H. et al. Identifying genetic variants that affect viability in large cohorts. PLOS Biol. 15, e2002458 (2017). 103. Woods, L. M. Geographical variation in life expectancy at birth in England and Wales is largely explained by deprivation. J. Epidemiol. Community Heal. 59, 115–120 (2005). 104. Day, F. R., Loh, P.-R., Scott, R. A., Ong, K. K. & Perry, J. R. B. A Robust Example of Collider Bias in a Genetic Association Study. Am. J. Hum. Genet. 98, 392–393 (2016). 105. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015). 106. Zeisel, A. et al. Molecular Architecture of the Mouse Nervous System. Cell 174, 999-1014.e22 (2018). 107. Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018). 108. Yang, H., Robinson, P. N. & Wang, K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat. Methods 12, 841–3 (2015). 109. Villanueva, C. & de Roux, N. FGFR1 Mutations in Kallmann Syndrome. in Kallmann Syndrome and Hypogonadotropic Hypogonadism 51–61 (KARGER, 2010). doi:10.1159/000312693 110. Kim, S. H. et al. Estrogen receptor dinucleotide repeat polymorphism is associated with minimal or mild endometriosis. Fertil. Steril. 84, 774–777 (2005). 111. Stavrou, I. 
Association of polymorphisms of the oestrogen receptoralpha gene with the age of menarche. Hum. Reprod. 17, 1101–1105 (2002). 112. Larriba, S., Muñoz, X., Navarro, M., Mata, A. & Bassas, L. Association of PIWIL4 genetic variants with germ cell maturation arrest in infertile Spanish men. Asian J. Androl. 16, 931 (2014). 113. Treutlein, J. et al. Genome-wide association study of alcohol dependence. Arch. Gen. Psychiatry 66, 773–84 (2009). 114. Westberg, L. et al. Association between a dinucleotide repeat polymorphism of the estrogen receptor alpha gene and personality traits in women. Mol. Psychiatry 8, 118–22 (2003). 115. Petrunak, E. M., DeVore, N. M., Porubsky, P. R. & Scott, E. E. Structures of human steroidogenic cytochrome P450 17A1 with substrates. J. Biol. Chem. 289, 32952–64 (2014). 116. Feigelson, H. S. et al. Cytochrome P450c17alpha gene (CYP17) polymorphism is associated with serum estrogen and progesterone concentrations. Cancer Res. 58, 585–7 (1998). 117. Hsieh, Y.-Y., Chang, C.-C., Tsai, F.-J., Lin, C.-C. & Tsai, C.-H. Estrogen receptor alpha dinucleotide repeat and cytochrome P450c17alpha gene polymorphisms are associated with susceptibility to endometriosis. Fertil. Steril. 83, 567–72 (2005). 118. Sata, F. et al. A polymorphism in the CYP17 gene relates to the risk of recurrent pregnancy loss. Mol. Hum. Reprod. 9, 725–8 (2003). 119. Gorai, I. et al. Estrogen-metabolizing gene polymorphisms, but not estrogen receptor-alpha gene polymorphisms, are associated with the onset of menarche in healthy postmenopausal Japanese women. J. Clin. Endocrinol. Metab. 88, 799–803 (2003). 120. Ben-Menahem, D., Hyde, R., Pixley, M., Berger, P. & Boime, I. Synthesis of multi-subunit domain gonadotropin complexes: a model for alpha/beta heterodimer formation. Biochemistry 38, 15070–7 (1999).


121. Keutmann, H. T. & Rubin, D. A. A subunit interaction site in human luteinizing hormone: identification by photoaffinity cross-linking. Endocrinology 132, 1305–12 (1993). 122. Hong, S., Ji, I. & Ji, T. H. The alpha-subunit of human choriogonadotropin interacts with the exodomain of the luteinizing hormone/choriogonadotropin receptor. Endocrinology 140, 2486–93 (1999). 123. Tong, Y., Liao, W. X., Roy, A. C. & Ng, S. C. Association of AccI polymorphism in the follicle- stimulating hormone beta gene with polycystic ovary syndrome. Fertil. Steril. 74, 1233–6 (2000). 124. Mukhopadhyay, S. & Howlett, A. C. CB1 receptor-G protein association. Subtype selectivity is determined by distinct intracellular domains. Eur. J. Biochem. 268, 499–505 (2001). 125. Herrlich, A. et al. Involvement of Gs and Gi proteins in dual coupling of the luteinizing hormone receptor to adenylyl cyclase and phospholipase C. J. Biol. Chem. 271, 16764–72 (1996). 126. Davies, G. et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat. Commun. 9, 2098 (2018). 127. Cattane, N. et al. Altered gene expression in schizophrenia: findings from transcriptional signatures in fibroblasts and blood. PLoS One 10, e0116686 (2015). 128. Weiss, C. et al. JNK phosphorylation relieves HDAC3-dependent suppression of the transcriptional activity of c-Jun. EMBO J. 22, 3686–95 (2003). 129. Wu, R.-C., Jiang, M., Beaudet, A. L. & Wu, M.-Y. ARID4A and ARID4B regulate male fertility, a functional link to the AR and RB pathways. Proc. Natl. Acad. Sci. U. S. A. 110, 4616–21 (2013). 130. Lai, A. et al. RBP1 recruits both histone deacetylase-dependent and -independent repression activities to retinoblastoma family proteins. Mol. Cell. Biol. 19, 6632–41 (1999). 131. Vaez, A. et al. In Silico Post Genome-Wide Association Studies Analysis of C-Reactive Protein Loci Suggests an Important Role for Interferons. Circ. Cardiovasc. Genet. 8, 487–497 (2015). 132. Vosa, U. 
et al. Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. bioRxiv (2018). doi:10.1101/447367 133. Qi, T. et al. Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood. Nat. Commun. 9, 2282 (2018). 134. van der Wijst, M. G. P. et al. Single-cell RNA sequencing identifies cell type-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50, 493–497 (2018). 135. Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015). 136. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015). 137. Bult, C. J. et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 47, D801–D806 (2019). 138. Bult, C. J. et al. Mouse Genome Database (MGD) 2019. Nucleic Acids Res. 47, D801–D806 (2019). 139. Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019). 140. Venkatraman, A. et al. The histone deacetylase HDAC3 is essential for Purkinje cell function, potentially complicating the use of HDAC inhibitors in SCA1. Hum. Mol. Genet. 23, 3733–45 (2014). 141. Večeřa, J. et al. HDAC1 and HDAC3 underlie dynamic H3K9 acetylation during embryonic neurogenesis and in schizophrenia-like animals. J. Cell. Physiol. 233, 530–548 (2018). 142. Alaghband, Y. et al. Distinct roles for the deacetylase domain of HDAC3 in the hippocampus and medial prefrontal cortex in the formation and extinction of memory. Neurobiol. Learn. Mem. 145, 94–104 (2017). 143. Zhao, Q. et al. HDAC3 inhibition prevents blood-brain barrier permeability through Nrf2


activation in type 2 diabetes male mice. J. Neuroinflammation 16, 103 (2019). 144. Young, E. J. et al. Reduced fear and aggression and altered serotonin metabolism in Gtf2ird1- targeted mice. Genes. Brain. Behav. 7, 224–34 (2008). 145. Barak, B. et al. Neuronal deletion of Gtf2i, associated with Williams syndrome, causes behavioral and myelin alterations rescuable by a remyelinating drug. Nat. Neurosci. 22, 700– 708 (2019). 146. Feng, W. et al. Chd7 is indispensable for mammalian brain development through activation of a neuronal differentiation programme. Nat. Commun. 8, 14758 (2017). 147. Li, Y. et al. Topoisomerase IIbeta is required for proper retinal development and survival of postmitotic cells. Biol. Open 3, 172–84 (2014). 148. Athanasiou, M. C. et al. The transcription factor E2F-1 in SV40 T antigen-induced cerebellar Purkinje cell degeneration. Mol. Cell. Neurosci. 12, 16–28 (1998). 149. Hou, S. T. et al. The transcription factor E2F1 modulates apoptosis of neurons. J. Neurochem. 75, 91–100 (2000). 150. Yuan, Z. et al. Opposing roles for E2F1 in survival and death of cerebellar granule neurons. Neurosci. Lett. 499, 164–9 (2011). 151. Wang, L., Wang, R. & Herrup, K. E2F1 works as a cell cycle suppressor in mature neurons. J. Neurosci. 27, 12555–64 (2007). 152. Harrington, A. J. et al. MEF2C regulates cortical inhibitory and excitatory synapses and behaviors relevant to neurodevelopmental disorders. Elife 5, (2016). 153. Davis, J. Q. & Bennett, V. Ankyrin binding activity shared by the neurofascin/L1/NrCAM family of nervous system cell adhesion molecules. J. Biol. Chem. 269, 27163–6 (1994). 154. Matzel, L. D., Babiarz, J., Townsend, D. A., Grossman, H. C. & Grumet, M. Neuronal cell adhesion molecule deletion induces a cognitive and behavioral phenotype reflective of impulsivity. Genes. Brain. Behav. 7, 470–80 (2008). 155. Yang, X. et al. The association between NCAM1 levels and behavioral phenotypes in children with autism spectrum disorder. Behav. 
Brain Res. 359, 234–238 (2019). 156. Zhang, A. et al. Neurofascin 140 is an embryonic neuronal neurofascin isoform that promotes the assembly of the node of Ranvier. J. Neurosci. 35, 2246–54 (2015). 157. Smigiel, R. et al. Homozygous mutation in the Neurofascin gene affecting the glial isoform of Neurofascin causes severe neurodevelopment disorder with hypotonia, amimia and areflexia. Hum. Mol. Genet. 27, 3669–3674 (2018). 158. Wu, J., Capp, C., Feng, L. & Hsieh, T. Drosophila homologue of the Rothmund-Thomson syndrome gene: essential function in DNA replication during development. Dev. Biol. 323, 130–42 (2008). 159. Gupta, P. et al. Retinoic acid-stimulated sequential phosphorylation, PML recruitment, and SUMOylation of nuclear receptor TR2 to suppress Oct4 expression. Proc. Natl. Acad. Sci. U. S. A. 105, 11424–9 (2008). 160. Ellsworth, B. S. et al. FOXL2 in the pituitary: molecular, genetic, and developmental analysis. Mol. Endocrinol. 20, 2796–805 (2006). 161. Zheng, J. et al. Novel FSHβ mutation in a male patient with isolated FSH deficiency and infertility. Eur. J. Med. Genet. 60, 335–339 (2017). 162. Huhtaniemi, I. A short evolutionary history of FSH-stimulated spermatogenesis. Hormones (Athens). 14, 468–78 163. Kottler, M.-L. et al. A new FSHbeta mutation in a 29-year-old woman with primary amenorrhea and isolated FSH deficiency: functional characterization and ovarian response to human recombinant FSH. Eur. J. Endocrinol. 162, 633–41 (2010). 164. Sentis, S., Le Romancer, M., Bianchin, C., Rostan, M.-C. & Corbo, L. Sumoylation of the estrogen receptor alpha hinge region regulates its transcriptional activity. Mol. Endocrinol. 19, 2671–84 (2005). 165. Governini, L. et al. FOXL2 in human endometrium: hyperexpressed in endometriosis. Reprod.


Sci. 21, 1249–55 (2014). 166. Ellsworth, B. S. et al. FOXL2 in the pituitary: molecular, genetic, and developmental analysis. Mol. Endocrinol. 20, 2796–805 (2006). 167. Rico, C. et al. HIF1 activity in granulosa cells is required for FSH-regulated Vegfa expression and follicle survival in mice. Biol. Reprod. 90, 135 (2014). 168. Su, E. J. et al. Impaired fetoplacental angiogenesis in growth-restricted fetuses with abnormal umbilical artery doppler velocimetry is mediated by aryl hydrocarbon receptor nuclear translocator (ARNT). J. Clin. Endocrinol. Metab. 100, E30-40 (2015). 169. Dai, Z. et al. Caveolin-1 promotes trophoblast cell invasion through the focal adhesion kinase (FAK) signalling pathway during early human placental development. Reprod. Fertil. Dev. (2019). doi:10.1071/RD18296 170. Artini, P. G. et al. Cumulus cells surrounding oocytes with high developmental competence exhibit down-regulation of phosphoinositol 1,3 kinase/protein kinase B (PI3K/AKT) signalling genes involved in proliferation and survival. Hum. Reprod. 32, 2474–2484 (2017). 171. Horn, S. et al. Research Resource: A Dual Proteomic Approach Identifies Regulated Islet Proteins During β-Cell Mass Expansion In Vivo. Mol. Endocrinol. 30, 133–43 (2016). 172. Lei, W. et al. Progesterone and DNA damage encourage uterine cell proliferation and decidualization through up-regulating ribonucleotide reductase 2 expression during early pregnancy in mice. J. Biol. Chem. 287, 15174–92 (2012). 173. Rotgers, E., Nurmio, M., Pietilä, E., Cisneros-Montalvo, S. & Toppari, J. E2F1 controls germ cell apoptosis during the first wave of spermatogenesis. Andrology 3, 1000–14 (2015). 174. Petrunak, E. M., DeVore, N. M., Porubsky, P. R. & Scott, E. E. Structures of human steroidogenic cytochrome P450 17A1 with substrates. J. Biol. Chem. 289, 32952–64 (2014). 175. Idkowiak, J., Cragun, D., Hopkin, R. J. & Arlt, W. Cytochrome P450 Oxidoreductase Deficiency. GeneReviews® (1993). 176. Yan, W., Burns, K. H., Ma, L. 
& Matzuk, M. M. Identification of Zfp393, a germ cell-specific gene encoding a novel zinc finger protein. Mech. Dev. 118, 233–9 (2002). 177. van Vliet, J. et al. Human KLF17 is a new member of the Sp/KLF family of transcription factors. Genomics 87, 474–82 (2006). 178. McLeskey, S. B., Dowds, C., Carballada, R., White, R. R. & Saling, P. M. Molecules involved in mammalian sperm-egg interaction. Int. Rev. Cytol. 177, 57–113 (1998). 179. Lin, Y.-N., Roy, A., Yan, W., Burns, K. H. & Matzuk, M. M. Loss of zona pellucida binding proteins in the acrosomal matrix disrupts acrosome biogenesis and sperm morphogenesis. Mol. Cell. Biol. 27, 6794–805 (2007). 180. Gusev, F. E. et al. Epigenetic-genetic chromatin footprinting identifies novel and subject- specific genes active in prefrontal cortex neurons. FASEB J. 33, 8161–8173 (2019). 181. Starnawska, A. et al. Differential DNA methylation at birth associated with mental disorder in individuals with 22q11.2 deletion syndrome. Transl. Psychiatry 7, e1221 (2017). 182. Savage, J. E. et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat. Genet. 50, 912–919 (2018). 183. Verweij, R. M. et al. Sexual dimorphism in the genetic influence on human childlessness. Eur. J. Hum. Genet. (2017). doi:10.1038/ejhg.2017.105 184. Blundell, R. Causes of infertility.pdf. Int. J. Mol. Med. Adv. Sci. 3, 63–65 (2007). 185. Verweij, R. M. et al. Sexual dimorphism in the genetic influence on human childlessness. Eur. J. Hum. Genet. 25, 1067–1074 (2017). 186. Ober, C., Loisel, D. A. & Gilad, Y. Sex-specific genetic architecture of human disease. Nat Rev Genet 9, 911–922 (2008). 187. Montgomery, G. W., Zondervan, K. T. & Nyholt, D. R. The future for genetic studies in reproduction. Mol. Hum. Reprod. 20, 1–14 (2014). 188. Briley, D. A., Tropf, F. C. & Mills, M. C. What Explains the Heritability of Completed Fertility? Evidence from Two Large Twin Studies. Behav. Genet. 
47, 36–51 (2017).


189. Jokela, M., Kivimaki, M., Elovainio, M. & Keltikangas-Jarvinen, L. Personality and having children: A two-way relationship. J. Pers. Soc. Psychol. 96, 208–230 (2009). 190. Nisén, J., Martikainen, P., Kaprio, J. & Silventoinen, K. Educational Differences in Completed Fertility: A Behavioral Genetic Study of Finnish Male and Female Twins. Demography 1–22 (2013). 191. Finucane, H. K. et al. Partionining heritability by functional category using GWAS summary statistics. bioRxiv. (2015). 192. Altman, D. G. & Bland, J. M. Interaction revisited: the difference between two estimates. BMJ 326, 219 (2003). 193. Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium. Meta- analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol. Autism 8, 21 (2017). 194. Benzel, I. et al. Interactions among genes in the ErbB-Neuregulin signalling network are associated with increased susceptibility to schizophrenia. Behav. Brain Funct. 3, 31 (2007). 195. Day, F. R. et al. Causal mechanisms and balancing selection inferred from genetic associations with polycystic ovary syndrome. Nat. Commun. 6, 8464 (2015). 196. Kirov, G. et al. Neurexin 1 (NRXN1) deletions in schizophrenia. Schizophr. Bull. 35, 851–4 (2009). 197. Baumgartner, H. K. et al. Characterization of choline transporters in the human placenta over gestation. Placenta 36, 1362–9 (2015). 198. Peng, Z. et al. Liver X receptor β in the hippocampus: A potential novel target for the treatment of major depressive disorder? Neuropharmacology 135, 514–528 (2018). 199. Boder, E. Ataxia-telangiectasia: an overview. Kroc Found. Ser. 19, 1–63 (1985). 200. Goodman, C. S., Coulam, C. B., Jeyendran, R. S., Acosta, V. A. & Roussev, R. Which thrombophilic gene mutations are risk factors for recurrent pregnancy loss? Am. J. Reprod. Immunol. 56, 230–6 (2006). 201. Sturm, R. A. & Duffy, D. L. 
Human pigmentation genes under environmental selection. Genome Biol. 13, 248 (2012). 202. Wieser, F. et al. Expression and regulation of CCR1 in peritoneal macrophages from women with and without endometriosis. Fertil. Steril. 83, 1878–1881 (2005). 203. Jedrzejczak, P. et al. Quantitative analysis of CCR5 chemokine receptor and cytochrome P450 aromatase transcripts in swim-up spermatozoa isolated from fertile and infertile men. Arch. Androl. 52, 335–41 204. Mei, J. et al. CXCL16/CXCR6 interaction Promotes Endometrial Decidualization via the PI3K/ AKT Pathway. Reproduction (2019). doi:10.1530/REP-18-0417
