Genome-wide analysis of risk-taking behaviour and cross-disorder genetic correlations in 116 255 individuals from the UK Biobank cohort.

Supplementary Information contents

Supplementary Methods: Alcohol use disorder cohort descriptions

Supplementary Figure 1: LD structure of the chr3 locus.

Supplementary Figure 2: LD structure of the chr6 locus.

Supplementary Figure 3: Global tissue expression of CADM2 (A) and the related CADM1 (B).

Supplementary Figure 4: Genotype-specific expression of CADM2 in specific brain regions.

Supplementary Figure 5: Global tissue expression of (A) POM121L2, (B) VN1R10P, (C) ZNF192P2, (D) PRSS16 and (E) ZNF204P.

Supplementary Figure 6: Genotype-specific gene expression of (A and B) PRSS16 and (C) ZNF204P in specific brain regions.

Supplementary Table 1: Genome-wide significant loci associated with risk-taking in UK Biobank: basic and conditional analyses.

Supplementary Table 2: Variant Effect Predictor summary of impact of GWAS-significant SNPs on nearby proteins.

Supplementary Table 3: Genes in GWAS significant loci.

Supplementary Table 4: Previously reported signals in the CADM2 and chr 6 loci and results in the risk-taking GWAS.

Supplementary Methods

Description of cohorts contributing to the alcohol use disorder meta-analysis (unpublished as of 20170608)

COGEND

Recruitment Sites: Washington University (St. Louis, MO), Henry Ford Health System (Detroit, MI)
Recruitment Dates: 2002-2007
Instrument: Semi-Structured Assessment of Nicotine Dependence (SSAND)
Inclusion Criteria: age 25-44, English speaking, smoked 100 cigarettes lifetime, current FTND ≥ 4 for case, lifetime FTND < 1 for control

The Collaborative Genetic Study of Nicotine Dependence (COGEND) was initiated to detect and characterize genes that alter risk for heavy tobacco consumption, nicotine dependence, and related phenotypes. Community-based recruitment was used to enroll nicotine dependent cases and non-dependent smoking controls in St. Louis, Missouri and Detroit, Michigan between 2002 and 2007. Institutional Review Board approval was obtained at all data collection sites prior to enrolling participants, and all participants provided informed consent. Participants were recruited using the Missouri Family Registry in St. Louis and Health Maintenance Organizations in Detroit. Potential participants were screened to determine eligibility. All participants had to be between the ages of 25-44 years and speak English. Nicotine dependent cases were defined as current smokers with a Fagerström Test for Nicotine Dependence (FTND) score of 4 or greater 1. Control status was defined as smoking at least 100 cigarettes lifetime, but never being nicotine dependent (lifetime FTND score < 1). Other substance dependence diagnoses or comorbid disorders were not exclusionary criteria. Those who qualified as a nicotine dependent case or non-dependent control completed an in-person comprehensive interview and donated a blood sample for genetic analysis.
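The case and control definitions in this paragraph can be written as a small screening rule. The following is a minimal sketch only: the function name `classify_cogend` and its argument names are hypothetical and are not taken from the study protocol, and the case threshold follows the prose definition (FTND score of 4 or greater).

```python
from typing import Optional


def classify_cogend(age: int, speaks_english: bool, current_smoker: bool,
                    current_ftnd: int, lifetime_ftnd: int,
                    lifetime_cigarettes: int) -> Optional[str]:
    """Return 'case', 'control', or None if the person is not eligible.

    Thresholds follow the text: age 25-44, English speaking, cases are
    current smokers with FTND >= 4, controls smoked at least 100 cigarettes
    lifetime but were never dependent (lifetime FTND < 1).
    """
    # Shared eligibility criteria for cases and controls.
    if not (25 <= age <= 44 and speaks_english):
        return None
    # Nicotine dependent case: current smoker with FTND of 4 or greater.
    if current_smoker and current_ftnd >= 4:
        return "case"
    # Non-dependent smoking control.
    if lifetime_cigarettes >= 100 and lifetime_ftnd < 1:
        return "control"
    return None


print(classify_cogend(30, True, True, 6, 6, 5000))   # case
print(classify_cogend(30, True, False, 0, 0, 150))   # control
```

People who meet neither definition (for example, a lifetime FTND of 2 with a current FTND below 4) return `None`, which mirrors the screening step that excluded them before interview.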

Following the informed consent process, participants were assessed for baseline demographics, psychiatric disorders, and substance use history using a Computer Assisted Personal Interview (CAPI) version of the Semi-Structured Assessment of Nicotine Dependence (SSAND), which was developed specifically for COGEND. The SSAND is modeled after the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) 2, 3. The SSAGA is a validated instrument developed to provide a detailed evaluation of alcohol, nicotine and other substance use disorders. The SSAGA has been utilized in many large-scale genetic studies of substance dependence 4, 5 and has high test-retest reliability in short-term (one week) and long-term (5 year) studies 6, 7. Substance use disorder was assessed based on DSM-IV 8 and DSM5 9 criteria. Nicotine dependence was also assessed using the Fagerström Test for Nicotine Dependence (FTND) 1. Participants also completed CAPI versions of the NEO Five-Factor Inventory 10, the Nicotine Dependence Syndrome Scale 11, and the Wisconsin Inventory of Smoking Dependence Motives 12.

Participants provided a blood sample for genetic analysis.

COGEND2

Recruitment Site: Washington University (St. Louis, MO)
Recruitment Dates: 2011-2014
Instrument: Semi-Structured Assessment of Nicotine Dependence-version 2 (SSAND-II)
Inclusion Criteria: age 25-44, English speaking, current FTND ≥ 4 for case, lifetime FTND < 1 for control

The Collaborative Genetic Study of Nicotine Dependence (COGEND) was initiated to detect and characterize genes that alter risk for heavy tobacco consumption, nicotine dependence, and related phenotypes. Additional recruitment was undertaken to expand the sample. Institutional Review Board approval was obtained at Washington University prior to enrolling participants, and all participants provided informed consent. Participants were recruited from the St. Louis metropolitan area between 2011 and 2014 via marketing lists, flyers, and street recruitment. Potential participants were screened to determine eligibility. All participants had to be between the ages of 25-44 years and speak English. Nicotine dependent cases were defined as current smokers with a Fagerström Test for Nicotine Dependence (FTND) score of 4 or greater 1. Control status was defined as smoking at least 100 cigarettes lifetime, but never being nicotine dependent (lifetime FTND score < 1). Other substance dependence diagnoses or comorbid disorders were not exclusionary criteria. Those who qualified as a nicotine dependent case or non-dependent control completed an in-person comprehensive interview and donated a blood sample for genetic analysis.

Following the informed consent process, participants were assessed for baseline demographics, psychiatric disorders, and substance use history using a modified Computer Assisted Personal Interview (CAPI) version of the Semi-Structured Assessment of Nicotine Dependence (SSAND). The SSAND is modeled after the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) 2, 3. The SSAGA is a validated instrument developed to provide a detailed evaluation of alcohol, nicotine and other substance use disorders. The SSAGA has been utilized in many large-scale genetic studies of substance dependence 4, 5 and has high test-retest reliability in short-term (one week) and long-term (5 year) studies 6, 7. Substance use disorder was assessed based on DSM-IV 8 and DSM5 9 criteria. Nicotine dependence was also assessed using the Fagerström Test for Nicotine Dependence (FTND) 1. Participants also completed CAPI versions of the NEO Five-Factor Inventory 10, the Nicotine Dependence Syndrome Scale 11, and the Wisconsin Inventory of Smoking Dependence Motives 12.

Participants provided a blood sample for genetic analysis.

COGEND-23andMe

Recruitment Site: Washington University (St. Louis, MO)
Recruitment Dates: 2014-2015 (recruitment is ongoing)
Instruments: Semi-Structured Assessment of Nicotine Dependence-short version (SSAND-Short), used in 2014; Semi-Structured Assessment of Nicotine Dependence-short version 2015 (SSAND-Short 2015), used in 2015
Inclusion Criteria: age 25-44, English speaking, smoked 100 cigarettes lifetime, exhaled carbon monoxide level > 7 parts per million, smoked > 15 days during the past month

The Collaborative Genetic Study of Nicotine Dependence (COGEND) was initiated to detect and characterize genes that alter risk for heavy tobacco consumption, nicotine dependence, and related phenotypes. We undertook a further extension to study biomarkers of smoking. Institutional Review Board approval was obtained at Washington University prior to enrolling participants, and all participants provided informed consent. Participants were recruited from the St. Louis metropolitan area between 2014 and 2015 via internet advertising, Facebook, flyers, and word of mouth. All participants were current smokers as demonstrated by an exhaled carbon monoxide level > 7 parts per million and self-reported smoking on > 15 days during the past month. Participants were required to have smoked 100 cigarettes lifetime, be between the ages of 25-44 years and speak English. Other substance use or comorbid conditions were not exclusionary criteria. Those who qualified completed an in-person comprehensive interview, provided measures of exhaled carbon monoxide, and donated a saliva sample for genetic analysis.

Following the informed consent process, participants were assessed for baseline demographics and substance use history using a modified Computer Assisted Personal Interview (CAPI) version of the Semi-Structured Assessment of Nicotine Dependence (SSAND). The SSAND is modeled after the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) 2, 3. The SSAGA is a validated instrument developed to provide a detailed evaluation of alcohol, nicotine and other substance use disorders. The SSAGA has been utilized in many large-scale genetic studies of substance dependence 4, 5 and has high test-retest reliability in short-term (one week) and long-term (5 year) studies 6, 7. Substance use disorder was assessed based on DSM-IV 8 and DSM5 9 criteria. Nicotine dependence was also assessed using the Fagerström Test for Nicotine Dependence (FTND) 1. Healthcare literacy was assessed using the Rapid Estimate of Adult Literacy in Medicine, Revised (REALM-R) 13.

Participants provided saliva samples for genetic analysis using 23andMe DNA collection kits. 23andMe is a privately held personal genomics and biotechnology company that produces high quality genetic data in CLIA-certified laboratories. The success rate of genotyping submitted saliva samples was 97%. In addition to data cleaning performed by 23andMe, we performed additional checks including individual sample quality, SNP quality, Hardy-Weinberg Equilibrium (HWE), duplicates, and relatedness across participants. We required at least a 98% call rate across all SNPs for a sample to be included in analyses. At a SNP level, we required at least a 98% call rate for each SNP in the sample. Relatedness across participants was examined to make sure that our participants were independent. Setting thresholds of per sample call rate of 98%, minor allele frequency of 1% or greater, and HWE p value greater than 10^-10, we had a final SNP set of 488,487 variants with a mean call rate of 99.89% per sample and average 99.79% call rate per SNP, demonstrating the high quality genetic data generated by 23andMe services.
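The quality-control thresholds above (98% call rate per sample and per SNP, minor allele frequency of 1% or greater, HWE p value greater than 10^-10) could be applied roughly as follows. This is an illustrative sketch, not the pipeline used for COGEND-23andMe: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 allele counts with NaN for missing calls, and a 1-degree-of-freedom chi-square HWE test is assumed because the text does not state which HWE test was used.

```python
import math

import numpy as np

SAMPLE_CALL_RATE = 0.98   # minimum per-sample call rate
SNP_CALL_RATE = 0.98      # minimum per-SNP call rate
MIN_MAF = 0.01            # minimum minor allele frequency
MIN_HWE_P = 1e-10         # SNPs with HWE p value at or below this are dropped


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p value for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def qc_filter(genotypes: np.ndarray):
    """Return boolean masks (keep_samples, keep_snps) for a samples x SNPs array."""
    called = ~np.isnan(genotypes)
    keep_samples = called.mean(axis=1) >= SAMPLE_CALL_RATE
    g = genotypes[keep_samples]
    called = ~np.isnan(g)
    keep_snps = np.zeros(g.shape[1], dtype=bool)
    for j in range(g.shape[1]):
        col = g[called[:, j], j]
        if col.size == 0 or col.size / g.shape[0] < SNP_CALL_RATE:
            continue
        freq = col.mean() / 2.0
        maf = min(freq, 1.0 - freq)
        counts = [int((col == k).sum()) for k in (0, 1, 2)]
        keep_snps[j] = maf >= MIN_MAF and hwe_chi2_p(*counts) > MIN_HWE_P
    return keep_samples, keep_snps
```

Samples are filtered first so that per-SNP call rate, allele frequency and HWE are computed only on retained samples; checks for duplicates and relatedness, which the text also mentions, are not covered by this sketch.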

COGA

The Collaborative Study on the Genetics of Alcoholism (COGA) was initiated in 1989 and is a large-scale family study that has had as its primary aim the identification of genes that contribute to alcoholism susceptibility and related characteristics 4. Subjects were recruited from 7 cities across the U.S. and Institutional Review Boards at all sites approved the protocols. Alcohol dependent probands were recruited from treatment facilities and were required to meet criteria for DSM-III-R Alcohol Dependence 14 and Feighner Alcoholism at the Definite level 15. A set of comparison families was drawn from the same communities as the families recruited through the alcohol dependent probands. After obtaining informed consent, participants were assessed using a comprehensive personal interview developed for this project, the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), which gathers detailed information on alcoholism-related symptoms along with other drugs and psychiatric symptoms 2, 3. The SSAGA has high test-retest reliability in short-term (one week) and long-term (5 year) studies 6, 7. Families ascertained through an alcohol dependent proband with three or more first-degree relatives who were alcohol dependent were invited for more extensive testing, including neurophysiology evaluations and a battery of neuropsychological assessments. Participants also provided a blood sample for genetic analysis.

From the family study, unrelated cases and controls were selected for association testing.

FSCD

The Family Study of Cocaine Dependence (FSCD) was initiated in 2000 with the primary goal of increasing understanding of the familial and non-familial antecedents and consequences of cocaine dependence 16. Cocaine dependent subjects were recruited from publicly and privately funded inpatient and outpatient chemical dependency treatment centers in the St. Louis, Missouri region. Eligibility requirements included DSM-IV cocaine dependence 8, being at least 18 years of age, speaking fluent English, and having a full sibling within five years of their age who was willing to participate. Community-based comparison subjects were recruited through driver’s license records from the Missouri Family Registry maintained by Washington University in St. Louis for research purposes. Community-based comparison participants were matched to cocaine dependent subjects based on date of birth (within 1 year), ethnicity, gender, and zip code. The Institutional Review Board at Washington University approved the protocol.

Following the informed consent process, participants were assessed for baseline demographics, psychiatric disorders, and substance use history using the Semi-Structured Assessment of Cocaine Dependence, which was developed specifically for this study. The interview is modeled after the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) 2, 3. The SSAGA is a validated instrument developed to provide a detailed evaluation of alcohol, nicotine and other substance use disorders. The SSAGA has been utilized in many large-scale genetic studies of substance dependence 4, 5 and has high test-retest reliability in short-term (one week) and long-term (5 year) studies 6, 7. Participants also provided a blood sample for genetic analysis.

References

1. Heatherton TF, Kozlowski LT, Frecker RC, Fagerstrom KO. The Fagerstrom Test for Nicotine Dependence: a revision of the Fagerstrom Tolerance Questionnaire. Br J Addict 1991; 86(9): 1119-1127.

2. Bucholz KK, Cadoret R, Cloninger CR, Dinwiddie SH, Hesselbrock VM, Nurnberger JI, Jr. et al. A new, semi-structured psychiatric interview for use in genetic linkage studies: a report on the reliability of the SSAGA. J Stud Alcohol 1994; 55(2): 149-158.

3. Hesselbrock M, Easton C, Bucholz KK, Schuckit M, Hesselbrock V. A validity study of the SSAGA--a comparison with the SCAN. Addiction 1999; 94(9): 1361-1370.

4. Begleiter H, Reich T, Hesselbrock V, Porjesz B, Li TK, Schuckit M, Edenberg HJ, Rice JP. The Collaborative Study on the Genetics of Alcoholism. Alcohol Health Res World 1995; 19(3): 228-236.

5. Gelernter J, Liu X, Hesselbrock V, Page GP, Goddard A, Zhang H. Results of a genomewide linkage scan: support for chromosomes 9 and 11 loci increasing risk for cigarette smoking. Am J Med Genet B Neuropsychiatr Genet 2004; 128B(1): 94-101.

6. Bucholz KK, Hesselbrock VM, Shayka JJ, Nurnberger JI, Jr., Schuckit MA, Schmidt I et al. Reliability of individual diagnostic criterion items for psychoactive substance dependence and the impact on diagnosis. J Stud Alcohol 1995; 56(5): 500-505.

7. Culverhouse R, Bucholz KK, Crowe RR, Hesselbrock V, Nurnberger JI, Jr., Porjesz B et al. Long-term stability of alcohol and other substance dependence diagnoses and habitual smoking: an evaluation after 5 years. Arch Gen Psychiatry 2005; 62(7): 753-760.

8. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th edn. American Psychiatric Association: Washington DC, 1994.

9. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders: DSM5. American Psychiatric Association: Washington, D.C, 2013.

10. Costa PT Jr, McCrae RR. Manual for the Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI). Psychological Assessment Resources: Odessa, FL, 1992.

11. Shiffman S, Waters A, Hickcox M. The nicotine dependence syndrome scale: a multidimensional measure of nicotine dependence. Nicotine Tob Res 2004; 6(2): 327-348.

12. Piper ME, Piasecki TM, Federman EB, Bolt DM, Smith SS, Fiore MC et al. A multiple motives approach to tobacco dependence: the Wisconsin Inventory of Smoking Dependence Motives (WISDM-68). J Consult Clin Psychol 2004; 72(2): 139-154.

13. Bass PF, 3rd, Wilson JF, Griffith CH. A shortened instrument for literacy screening. J Gen Intern Med 2003; 18(12): 1036-1038.

14. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R). American Psychiatric Association: Washington, D.C, 1987.

15. Feighner JP, Robins E, Guze SB, Woodruff RA, Jr., Winokur G, Munoz R. Diagnostic criteria for use in psychiatric research. Arch Gen Psychiatry 1972; 26(1): 57-63.

16. Bierut LJ, Strickland JR, Thompson JR, Afful SE, Cottler LB. Drug use and dependence in cocaine dependent subjects, community-based individuals, and their siblings. Drug Alcohol Depend 2008; 95(1-2): 14-22.

Supplementary Figure 1: Linkage disequilibrium (LD) structure of the chr 3 locus. Colours and numbers reflect R2 measures of LD.

Supplementary Figure 2: LD structure of the chr 6 locus. Colours and numbers reflect R2 measures of LD.

Supplementary Figure 3: Global tissue expression of CADM2 (A) and the related CADM1 (B). [Panels A and B: expression (RPKM, y-axis) by tissue (x-axis).]

Supplementary Figure 4: Genotype-specific expression of CADM2 in specific brain regions. [Panels A to D.]

Supplementary Figure 5: Global tissue expression of (A) POM121L2, (B) VN1R10P, (C) ZNF192P2, (D) PRSS16 and (E) ZNF204P. [Panels A to E: expression (RPKM, y-axis) by tissue (x-axis).]

Supplementary Figure 6: Genotype-specific gene expression of (A and B) PRSS16 and (C) ZNF204P in specific brain regions. [Panels A to C.]

Supplementary Table 1: Genome-wide significant loci associated with risk-taking in UK Biobank: basic and conditional analyses

Basic analysis: N = 115 770. Conditional analysis: N = 115 770.

SNP  CHR  BP  A1  MAF  NMISS  BETA (basic)  SE (basic)  P (basic)  BETA (conditional)  SE (conditional)  P (conditional)
rs10511087  3  85 439 136  A  0.379  115770  0.055  0.010  3.44E-08  0.032  0.014  2.48E-02
rs1368742  3  85 442 208  A  0.380  115693  0.055  0.010  3.44E-08  0.032  0.014  2.63E-02
rs1991872  3  85 443 793  G  0.379  115764  0.055  0.010  3.51E-08  0.032  0.014  2.51E-02
rs12637798  3  85 446 776  A  0.380  115778  0.055  0.010  2.97E-08  0.032  0.014  2.37E-02
rs4308294  3  85 447 830  A  0.379  115787  0.055  0.010  3.18E-08  0.032  0.014  2.46E-02
rs12638482  3  85 454 815  C  0.380  115915  0.055  0.010  4.66E-08  0.031  0.014  3.15E-02
rs11918899  3  85 458 577  A  0.381  115484  0.055  0.010  4.52E-08  0.031  0.014  3.15E-02
rs4856569  3  85 464 125  A  0.380  115979  0.055  0.010  3.42E-08  0.030  0.014  3.39E-02
rs80285517  3  85 467 372  C  0.378  115471  0.056  0.010  1.87E-08  0.030  0.014  3.49E-02
rs2053108  3  85 467 972  C  0.380  116180  0.055  0.010  3.61E-08  0.029  0.014  4.40E-02
rs6808400  3  85 468 059  T  0.380  116180  0.055  0.010  3.61E-08  0.029  0.014  4.40E-02
rs62250686  3  85 468 446  T  0.379  115807  0.055  0.010  2.81E-08  0.029  0.014  4.65E-02
rs6780968  3  85 469 041  C  0.380  116188  0.055  0.010  3.51E-08  0.028  0.014  4.68E-02
rs62250687  3  85 469 353  T  0.380  116157  0.055  0.010  4.11E-08  0.028  0.014  4.87E-02
rs60311538  3  85 470 687  T  0.380  115882  0.055  0.010  3.90E-08  0.028  0.014  5.02E-02
rs144059553  3  85 471 325  AAC  0.380  116142  0.055  0.010  4.43E-08  0.028  0.014  4.91E-02
rs7636243  3  85 472 033  C  0.380  116158  0.055  0.010  3.68E-08  0.028  0.014  4.73E-02
rs7650284  3  85 472 227  T  0.380  116177  0.055  0.010  3.92E-08  0.028  0.014  4.76E-02
rs7638953  3  85 472 462  G  0.380  116254  0.055  0.010  4.36E-08  0.028  0.014  4.74E-02
rs4856571  3  85 474 868  G  0.381  115911  0.055  0.010  4.19E-08  0.027  0.014  6.24E-02
rs35894540  3  85 475 647  T  0.380  115911  0.055  0.010  3.71E-08  0.027  0.014  5.81E-02
rs11716233  3  85 481 659  T  0.225  115967  0.064  0.012  3.01E-08  -0.008  0.102  9.37E-01
rs35827242  3  85 482 661  G  0.380  115737  0.055  0.010  3.59E-08  0.028  0.014  5.07E-02
rs35489310  3  85 486 209  G  0.225  116028  0.064  0.012  2.59E-08  -0.012  0.102  9.08E-01
rs12629036  3  85 488 299  T  0.380  115857  0.054  0.010  4.77E-08  0.026  0.014  6.52E-02
rs9874491  3  85 489 907  T  0.225  116038  0.064  0.012  2.62E-08  -0.008  0.102  9.40E-01
rs2082556  3  85 490 947  G  0.381  115857  0.054  0.010  4.97E-08  0.026  0.014  6.78E-02
rs9820228  3  85 493 753  T  0.225  115898  0.063  0.012  3.90E-08  -0.074  0.104  4.76E-01
rs2033526  3  85 496 758  A  0.384  113867  0.057  0.010  1.34E-08  0.030  0.014  3.62E-02
rs9828679  3  85 497 198  C  0.225  115993  0.064  0.012  3.10E-08  -0.068  0.104  5.16E-01
rs145394945  3  85 503 999  C  0.379  115451  0.055  0.010  4.74E-08  0.027  0.014  6.44E-02
rs2196098  3  85 505 246  G  0.225  116004  0.064  0.012  3.65E-08  -0.095  0.106  3.71E-01
rs13068434  3  85 508 814  G  0.225  115983  0.064  0.012  3.42E-08  -0.095  0.106  3.71E-01
rs78867021  3  85 510 809  TATG  0.379  115111  0.055  0.010  3.68E-08  0.027  0.014  6.27E-02
rs6762267  3  85 513 115  C  0.381  115777  0.055  0.010  4.37E-08  0.026  0.014  7.33E-02
rs62250712  3  85 513 716  C  0.381  115738  0.055  0.010  4.67E-08  0.026  0.014  7.45E-02
rs62250713  3  85 513 793  A  0.363  115747  0.056  0.010  2.13E-08  0.027  0.015  7.30E-02
rs112911909  3  85 517 112  A  0.364  115429  -0.057  0.010  1.66E-08  -0.044  0.012  1.26E-04
rs13070166  3  85 517 507  A  0.229  114949  0.065  0.012  1.71E-08  -0.013  0.156  9.32E-01
rs34133544  3  85 517 748  G  0.352  115756  0.056  0.010  4.02E-08  0.025  0.016  1.16E-01
rs62250716  3  85 517 752  G  0.366  114441  -0.058  0.010  1.24E-08  -0.045  0.012  1.02E-04
rs2875907  3  85 518 580  A  0.352  115685  0.056  0.010  3.16E-08  0.025  0.016  1.16E-01
rs960986  3  85 519 305  T  0.364  115672  -0.056  0.010  2.87E-08  -0.043  0.012  1.91E-04
rs144888873  3  85 520 967  T  0.226  115689  0.065  0.012  1.44E-08  -0.357  0.516  4.88E-01
rs62250717  3  85 521 990  G  0.369  116254  -0.055  0.010  4.82E-08  -0.041  0.011  3.19E-04
rs9841144  3  85 531 199  T  0.227  116147  0.064  0.011  2.46E-08  0.146  0.333  6.62E-01
rs1379778  3  85 536 091  AT  0.355  111797  -0.058  0.010  2.77E-08  -0.043  0.012  2.25E-04
rs9849399  3  85 537 665  C  0.227  116142  0.064  0.011  2.76E-08  0.026  0.343  9.41E-01
rs56088977  3  85 538 811  G  0.218  113908  0.064  0.012  4.55E-08  0.259  0.187  1.66E-01
rs34418561  3  85 538 846  C  0.227  116139  0.064  0.011  2.63E-08  0.026  0.343  9.41E-01
rs138246680  3  85 539 217  C  0.354  113348  0.056  0.010  4.86E-08  0.027  0.016  8.04E-02
rs9844512  3  85 544 070  A  0.227  116072  0.064  0.011  2.59E-08  -0.662  0.663  3.19E-01
rs76395182  3  85 547 337  G  0.341  108695  -0.061  0.011  1.06E-08  -0.046  0.012  1.12E-04
rs4637303  3  85 552 787  C  0.351  115832  0.056  0.010  4.42E-08  0.026  0.016  9.77E-02
rs6809805  3  85 553 096  A  0.227  116094  0.064  0.012  2.13E-08  0.440  0.495  3.74E-01
rs13084531  3  85 553 994  G  0.225  115264  0.067  0.012  8.75E-09
rs12495758  3  85 554 262  G  0.351  116170  0.055  0.010  4.62E-08  0.024  0.016  1.20E-01
rs62250759  3  85 569 026  G  0.368  115471  -0.055  0.010  4.90E-08  -0.041  0.011  3.11E-04
rs9841829  3  85 569 361  G  0.227  116056  0.064  0.012  3.34E-08  0.080  0.302  7.92E-01
rs9851444  3  85 576 706  C  0.229  115054  0.065  0.012  2.06E-08  0.018  0.167  9.14E-01
rs9873400  3  85 590 097  A  0.227  115973  0.063  0.012  4.61E-08  -0.232  0.213  2.77E-01
rs12492753  3  85 601 440  T  0.351  115953  0.056  0.010  3.73E-08  0.023  0.016  1.33E-01
rs72585634  3  85 602 434  ATC  0.354  115188  0.057  0.010  2.28E-08  0.025  0.016  1.05E-01
rs7652808  3  85 603 643  T  0.351  116254  0.055  0.010  4.90E-08  0.024  0.016  1.29E-01
rs11713902  3  85 604 425  T  0.225  115393  0.063  0.012  4.00E-08  0.032  0.171  8.50E-01
rs13077660  3  85 616 260  C  0.226  115563  0.063  0.012  4.73E-08  0.011  0.173  9.47E-01
rs13353478  3  85 621 395  T  0.226  115473  0.064  0.012  3.68E-08  -0.045  0.186  8.07E-01
rs6790699  3  85 624 189  A  0.372  115930  0.055  0.010  4.91E-08  0.026  0.015  7.33E-02
rs9379971  6  27 259 308  A  0.349  109344  -0.063  0.011  2.31E-09
rs7746841  6  27 261 700  G  0.451  116026  -0.054  0.010  3.94E-08  -0.018  0.019  3.44E-01
rs35316606  6  27 267 230  CA  0.451  116143  -0.054  0.010  3.32E-08  -0.020  0.019  2.91E-01
rs2393923  6  27 270 261  C  0.321  116144  -0.059  0.010  1.73E-08  -0.028  0.016  7.42E-02
rs7771953  6  27 271 343  A  0.437  116254  -0.055  0.010  1.88E-08  -0.018  0.020  3.45E-01
rs6456772  6  27 273 365  A  0.451  116098  -0.054  0.010  2.69E-08  -0.021  0.019  2.49E-01
rs3800319  6  27 280 407  G  0.435  116010  -0.055  0.010  2.38E-08  -0.015  0.020  4.59E-01
rs2893911  6  27 283 303  A  0.436  116051  -0.054  0.010  3.25E-08  -0.017  0.020  4.02E-01
rs9379973  6  27 288 419  C  0.453  116074  -0.055  0.010  2.11E-08  -0.022  0.018  2.28E-01
rs112356720  6  27 289 497  ACTTTT  0.453  115962  -0.055  0.010  1.63E-08  -0.023  0.018  2.12E-01
rs6923811  6  27 289 776  C  0.320  116254  -0.059  0.010  1.92E-08  -0.025  0.016  1.18E-01
rs2092114  6  27 290 082  C  0.459  116117  -0.055  0.010  2.04E-08  -0.025  0.018  1.55E-01
rs9368501  6  27 295 082  G  0.453  116254  -0.055  0.010  2.16E-08  -0.021  0.018  2.51E-01

Where: Basic analysis included age, sex, 8 principal components and array; conditional analysis included age, sex, 8 principal components, array and the index SNP for each locus. The index SNP in each locus (rs13084531 for chr 3 and rs9379971 for chr 6; the rows without conditional results) is highlighted in grey. SNPs in the suggestive secondary signal in chr 3 are in bold.
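The difference between the basic and conditional analyses can be illustrated with a small simulation. This is a sketch under stated assumptions, not the software or model used for the table: it uses ordinary least squares on simulated data, the helper `snp_beta` is hypothetical, and three random covariates stand in for age, sex, principal components and array. A SNP that is only correlated with the index SNP shows an effect in the basic model that shrinks towards zero once the index SNP is added as a covariate.

```python
import numpy as np


def snp_beta(y, snp, covariates, conditioning_snp=None):
    """OLS effect estimate for `snp`, adjusting for covariates and,
    optionally, for a conditioning (index) SNP."""
    cols = [np.ones_like(y), snp, covariates]
    if conditioning_snp is not None:
        cols.append(conditioning_snp)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # coefficient of the tested SNP


# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
index_snp = rng.binomial(2, 0.3, n).astype(float)
# Tag SNP: copies the index SNP in ~90% of individuals, so the two are in high LD.
tag_snp = np.where(rng.random(n) < 0.9, index_snp, rng.binomial(2, 0.3, n))
covariates = rng.normal(size=(n, 3))
# Only the index SNP has a true effect on the phenotype.
y = 0.5 * index_snp + rng.normal(size=n)

basic = snp_beta(y, tag_snp, covariates)
conditional = snp_beta(y, tag_snp, covariates, conditioning_snp=index_snp)
print(f"basic beta = {basic:.3f}, conditional beta = {conditional:.3f}")
```

A SNP whose effect remains clearly non-zero after conditioning would instead point to a signal that is partly independent of the index SNP, which is the pattern described for the suggestive secondary signal on chr 3.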

Supplementary Table 2: Variant Effect Predictor summary of impact of GWAS-significant SNPs on nearby proteins

SNP  Chr  BP  Allele  Consequence  IMPACT  Gene  Gene type  Feature
rs10511087  3  85389986  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs1368742  3  85393058  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs1991872  3  85394643  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs12637798  3  85397626  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs4308294  3  85398680  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs12638482  3  85405665  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs4856569  3  85414975  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs80285517  3  85418222  ATT  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs2053108  3  85418822  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs6808400  3  85418909  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs62250686  3  85419296  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs6780968  3  85419891  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs62250687  3  85420203  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs60311538  3  85421537  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs144059553  3  85422175  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs7636243  3  85422883  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs7650284  3  85423077  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs7638953  3  85423312  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs4856571  3  85425718  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs35894540  3  85426497  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs11716233  3  85432509  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs35827242  3  85433511  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs35489310  3  85437059  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs12629036  3  85439149  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9874491  3  85440757  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs2082556  3  85441797  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9820228  3  85444603  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs2033526  3  85447608  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9828679  3  85448048  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs145394945  3  85454849  AT  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs2196098  3  85456096  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs13068434  3  85459664  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs78867021  3  85461659  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs6762267  3  85463965  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs62250712  3  85464566  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs62250713  3  85464643  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs112911909  3  85467962  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs13070166  3  85468357  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs34133544  3  85468598  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs62250716  3  85468602  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs2875907  3  85469430  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs960986  3  85470155  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs144888873  3  85471817  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs62250717  3  85472840  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9841144  3  85482049  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9849399  3  85488515  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs56088977  3  85489661  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs34418561  3  85489696  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs138246680  3  85490067  ATTAT  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9844512  3  85494920  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs76395182  3  85498187  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs4637303  3  85503637  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs6809805  3  85503946  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs13084531  3  85504844  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs12495758  3  85505112  A  5 prime UTR variant  MODIFIER  CADM2  protein coding  NM_001256502.1
rs62250759  3  85519876  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9841829  3  85520211  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9851444  3  85527556  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9873400  3  85540947  A  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs72585634  3  85553284  -  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs7652808  3  85554493  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs11713902  3  85555275  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs13077660  3  85567110  C  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs13353478  3  85572245  T  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs6790699  3  85575039  G  intron variant  MODIFIER  CADM2  protein coding  ENST00000383699
rs9379971  6  27291529  A  intron variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs7746841  6  27293921  G  intron variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs35316606  6  27299451  A  intron variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs2393923  6  27302482  C  intron variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs7771953  6  27303564  A  intron variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs6456772  6  27305586  A  intron variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs3800319  6  27312628  G  intron variant  MODIFIER  LOC105375107  protein coding  XM_011515048.2
rs2893911  6  27315524  A  upstream gene variant  MODIFIER  POM121L2  protein coding  ENST00000429945
rs9379973  6  27320640  C  upstream gene variant  MODIFIER  VN1R10P  unprocessed pseudogene  ENST00000447106
rs112356720  6  27321718  CTTTT  upstream gene variant  MODIFIER  VN1R10P  unprocessed pseudogene  ENST00000447106
rs6923811  6  27321997  C  upstream gene variant  MODIFIER  VN1R10P  unprocessed pseudogene  ENST00000447106
rs2092114  6  27322303  C  upstream gene variant  MODIFIER  VN1R10P  unprocessed pseudogene  ENST00000447106
rs9368501  6  27327303  G  downstream gene variant  MODIFIER  VN1R10P  unprocessed pseudogene  ENST00000447106

Supplementary Table 3: Genes in GWAS significant loci

Locus | Gene | EntrezID | Name (GeneCards) | Localisation (GeneCards) | (GeneCards) | Gene summary () | Gene expression pattern (GTEx)
chr 3 | CADM2*^ | 253559 | cell adhesion molecule 2 | plasma membrane | Cell adhesion binding, receptor activity, receptor binding, protein homodimerisation activity | synaptic cell adhesion molecule/immunoglobulin superfamily | low expression, except in brain
chr 3 | CADM2-AS2 | 100874037 | CADM2 Antisense RNA 2 | NA | NA | NA | NA
chr 3 | miR5688 | 100847077 | NA | NA | NA | NA | NA
chr 6 | HIST1H2AG | 8969 | Histone Cluster 1 H2A Family Member G | Nuclear and extracellular | DNA binding, protein binding, enzyme binding, protein heterodimerisation activity | intronless replication-dependent histone | low expression, except in testis
chr 6 | HIST1H2AH | 85235 | Histone Cluster 1 H2A Family Member H | Nuclear and extracellular | molecular function, DNA binding, protein heterodimerisation | intronless replication-dependent histone | generally low expression
chr 6 | HIST1H2BJ | 8970 | Histone Cluster 1 H2B Family Member J | Nuclear and extracellular | DNA binding, protein heterodimerisation | intronless replication-dependent histone | generally low expression
chr 6 | HIST1H2BK | 85236 | Histone Cluster 1 H2B Family Member K | Nuclear and extracellular | DNA binding, protein heterodimerisation | A replication-dependent histone with antibacterial and antifungal properties | generally low expression
chr 6 | HIST1H4I | 8294 | Histone Cluster 1 H4 Family Member I | Nuclear and extracellular | DNA binding, protein binding, histone binding, polyA RNA binding | replication-dependent histone | generally low expression
chr 6 | LOC105375107^ | 105375107 | uncharacterized protein | NA | NA | NA | NA
chr 6 | miR3143 | 100422934 | NA | NA | NA | NA | NA
chr 6 | POM121L2^ | 94026 | POM121 Transmembrane Nucleoporin Like 2 | nuclear | nucleocytoplasmic transporter activity, nuclear localisation sequence binding, structural constituent of nuclear pore | NA | exclusively expressed in the testis
chr 6 | PRSS16* | 10279 | Protease, Serine 16 | lysosome and endosome | serine carboxypeptidase activity, dipeptidyl-peptidase activity, hydrolase activity | exclusively expressed by the thymus, involved in alternative antigen presentation pathway during T cell selection | mixed expression
chr 6 | ZSCAN4, RP5-874C20.3 | 7741 | Zinc Finger And SCAN Domain Containing 26 | nuclear | transcription factor activity, protein binding, sequence specific DNA binding, metal ion binding | NA | mixed expression
chr 6 | VN1R10P^ | 387316 | Vomeronasal 1 Receptor 10 Pseudogene | NA | NA | related to long non coding RNAs | low expression, except in testis
chr 6 | ZNF184 | 7738 | Zinc Finger Protein 184 | Nuclear (and cytosolic) | DNA binding, Zinc ion binding, Metal ion binding | Predicted to contain Krüppel boxes and a zinc finger domain | mixed expression, highest in testis and brain
chr 6 | ZNF204P* | 7754 | Zinc Finger Protein 204, Pseudogene | NA | NA | transcribed pseudogene, N terminal truncation lacking Krüppel and zinc finger domains | mixed expression
chr 6 | ZNF391 | 346157 | Zinc Finger Protein 391 | Nuclear | DNA binding, TF activity and sequence specific DNA binding, metal ion binding | NA | mixed expression, highest in testis
chr 6 | ZNF192P2 | 222701 | Zinc Finger Protein 192 Pseudogene 2 | NA | NA | related to long non coding RNAs | low expression, except in testis

Where: NA, not available; *, genes highlighted by eQTL analysis in brain; ^, genes highlighted by VEP.

Supplementary Table 4: Previously reported signals in the CADM2 and chr 6 loci and results in the risk-taking GWAS

GWAS catalogue columns: locus, trait, SNP-allele, PMID. UK Biobank risk-taking behaviour GWAS columns: CHR, SNP, BP, A1, A2, MAF, NMISS, BETA, SE, P.

locus | trait | SNP-allele | PMID | CHR | SNP | BP | A1 | A2 | MAF | NMISS | BETA | SE | P
3p12.1 | ADHD | rs7642644-T | 26174813 | 3 | rs7642644 | 84920368 | C | T | 0.135 | 114901 | -0.025 | 0.014 | 8.57E-02
3p12.1 | visceral adiposity | rs13323436-A | 22589738 | 3 | rs13323436 | 85502845 | A | T | 0.075 | 112775 | 0.069 | 0.018 | 1.92E-04
3p12.1 | longevity | rs9841144-? | 25199915 | 3 | rs9841144 | 85531199 | T | A | 0.227 | 116147 | 0.064 | 0.011 | 2.46E-08
3p12.1 | cognitive function, information processing | rs17518584-T | 25644384, 25869804 | 3 | rs17518584 | 85604923 | C | T | 0.376 | 114005 | 0.053 | 0.010 | 1.17E-07
3p12.1 | educational attainment | rs55686445-C | 27046643 | 3 | rs55686445 | 85671676 | C | T | 0.349 | 113498 | -0.046 | 0.010 | 6.76E-06
3p12.1 | BMI | rs13078960-G | 25673413 | 3 | rs13078960 | 85807590 | G | T | 0.200 | 114963 | 0.027 | 0.012 | 2.73E-02
3p12.1 | temperament | rs12494658-T | 22832960 | 3 | rs12494658 | 85874476 | C | T | 0.237 | 116254 | -0.012 | 0.011 | 3.12E-01
3p12.1 | BMI, obesity | rs13078807-G | 20935630, 23563607 | 3 | rs13078807 | 85884150 | G | A | 0.200 | 116254 | 0.027 | 0.012 | 2.49E-02
3p12.1 | Alzheimer's | rs71316816-C | 25778476 | 3 | rs71316816 | 85905034 | T | C | 0.068 | 115133 | 0.020 | 0.019 | 2.89E-01
3p12.1 | educational attainment | rs112374913-? | 27046643 | 3 | rs112374913 | 85946506 | A | G | 0.395 | 114005 | -0.014 | 0.010 | 1.70E-01
3p12.1 | subq adipose | rs2324999-T | 22589738 | 3 | rs2324999 | 86158885 | T | C | 0.196 | 116254 | 0.019 | 0.012 | 1.27E-01
3p12.1 | educational attainment | rs56262138-A | 27225129 | 3 | rs56262138 | 86183716 | A | T | 0.296 | 110158 | -0.025 | 0.011 | 2.10E-02
3p12.1 | educational attainment | rs62263923-A | 27225129 | na | na
3p12.1 | age first sexual intercourse | rs12714592-A | 27089180 | 3 | rs12714592 | 84387950 | C | A | 0.271 | 115754 | 0.015 | 0.011 | 1.72E-01
3p12.1 | age first sexual intercourse | rs57401290-GGTGTGT | 27089180 | na | na
3p12.1 | risk propensity | rs4856591 | 27089180 | na | na
6p22.1 | SCZ | rs55834529-A | 26198764 | 6 | rs55834529 | 27072542 | G | A | 0.111 | 114699 | -0.075 | 0.016 | 2.28E-06
6p22.1 | SCZ | rs13194053-T | 19571811, 19571809 | 6 | rs13194053 | 27143883 | C | T | 0.189 | 116254 | -0.047 | 0.012 | 1.89E-04
6p22.1 | SCZ | rs6932590-T | 19571808 | 6 | rs6932590 | 27248931 | C | T | 0.262 | 114951 | -0.049 | 0.011 | 1.08E-05
6p22.1 | SCZ | rs16897515-C | 23894747 | 6 | rs16897515 | 27278020 | A | C | 0.190 | 116254 | -0.051 | 0.012 | 4.00E-05
6p22.1 | SCZ | rs17693963-? | 22688191, 24166486, 24280982 | 6 | rs17693963 | 27710165 | C | A | 0.124 | 116254 | -0.054 | 0.015 | 2.87E-04
6p22.1 | SCZ | rs34706883-A | 26198764 | 6 | rs34706883 | 27805255 | C | A | 0.118 | 116254 | -0.060 | 0.015 | 7.20E-05
6p22.1 | SCZ | rs1635-? | 22197929 | 6 | rs1635 | 28227604 | A | C | 0.044 | 116254 | -0.036 | 0.024 | 1.35E-01
6p22.1 | depression | rs853679-A | 27089181 | 6 | rs853679 | 28296863 | A | C | 0.164 | 116254 | -0.051 | 0.013 | 1.06E-04
6p22.1 | daytime sleep | rs150548387-G | 27126917 | 6 | rs150548387 | 28800281 | G | A | 0.038 | 116215 | -0.030 | 0.026 | 2.35E-01
6p22.1 | neuroticism | rs114304113-C | 27089181 | 6 | rs114304113 | 28916835 | T | G | 0.038 | 116236 | -0.032 | 0.026 | 2.10E-01
6p22.1 | social communications problems | rs9257616-G | 24047820 | 6 | rs9257616 | 29180721 | A | G | 0.446 | 116254 | 0.041 | 0.010 | 2.17E-05
6p22.1 | migraine without aura | rs3095267-? | 23793025 | 6 | rs3095267 | 29607046 | C | G | 0.226 | 116108 | -0.023 | 0.012 | 4.36E-02
6p22.1 | SCZ | rs385492-T | 26198764 | 6 | rs385492 | 29649547 | T | C | 0.452 | 116254 | 0.011 | 0.010 | 2.53E-01
6p22.1 | SCZ | rs3131888-C | 26198764 | 6 | rs3131888 | 29663889 | C | T | 0.427 | 116254 | -0.018 | 0.010 | 6.87E-02
6p22.1 | parental longevity | rs138847126-? | 27015805 | 6 | rs138847126 | 29909278 | A | AT | 0.015 | 108615 | -0.026 | 0.042 | 5.41E-01
6p22.1 | mood disorders | rs8321-? | 23453885 | 6 | rs8321 | 30032522 | C | A | 0.133 | 116254 | -0.042 | 0.014 | 3.20E-03
6p22.1 | SCZ | rs2523722-G | 22883433 | 6 | rs2523722 | 30165273 | T | C | 0.230 | 116254 | -0.037 | 0.012 | 1.27E-03
6p22.1 | SCZ, mood disorders | rs2021722-C | 21926974, 23453885 | 6 | rs2021722 | 30174131 | T | C | 0.233 | 116249 | -0.036 | 0.012 | 1.85E-03
6p22.1 | anger | rs2844775-A | 24489884 | 6 | rs2844775 | 30179422 | A | G | 0.250 | 116254 | 0.008 | 0.011 | 4.53E-01
6p22.1 | SCZ | rs9272219-G | 19571809 | 6 | rs9272219 | 32602269 | T | G | 0.257 | 116254 | -0.016 | 0.011 | 1.53E-01
6p22.1 | daytime sleep | rs114080364-G | 27126917 | na | na
6p22.1 | SCZ | rs114200269-T | 26198764 | na | na
6p22.1 | daytime sleep | rs114683528-T | 27126917 | na | na
6p22.1 | SCZ | rs115329265-A | 25056061 | na | na
6p22.1 | daytime sleep | rs116330539-G | 27126917 | na | na
6p22.1 | subjective wellbeing | rs144077837-A | 27089181 | na | na
6p22.1 | early aggressive behaviour | rs2015436-T | 26087016 | na | na
6p22.1 | cognitive performance | rs34704616-? | 20125193 | na | na
6p22.1 | neuroticism | rs9468186-A | 27089181 | na | na

Where: na, not available.