The Genetics of Autism and Related Traits

Varun Warrier St. John’s College

Autism Research Centre, Department of Psychiatry University of Cambridge

University Notes

This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration except as declared below and specified in the text.

It is not substantially the same as any that I have submitted, or is being concurrently submitted, for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared below and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or is being concurrently submitted, for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared below.

It does not exceed the prescribed word limit of 60,000 words.

Part of Chapter 1 is a part of a review on the genetics of autism co-written with Simon Baron-Cohen. I wrote the first draft of the review which was edited by Simon. This review is titled: Genetics of Autism. Warrier V and Baron-Cohen S. eLS [In Press].

Chapter 2 was conducted in close collaboration with Vivienne Chee who was an intern from Imperial College London at the Autism Research Centre. I co-supervised Vivienne Chee. I designed the study, and conducted the literature review and meta-analysis with Vivienne. I wrote the first draft of the manuscript, which was edited by all the other authors. This study has been published as: A comprehensive meta-analysis of common genetic variants in autism spectrum conditions. Warrier V, Chee V, Smith P, Chakrabarti B, Baron-Cohen S. Mol Autism. 2015 Aug 28;6:49.

Chapters 3, 4, and 7 were conducted in collaboration with a team of scientists. The GWAS for the three datasets were conducted by 23andMe and I conducted all the subsequent analyses using the summary statistics. In addition, for Chapter 4, the GWAS for the BLTS cohort was conducted by Katrina Grasby, and I conducted the meta-analyses using the summary statistics from the 23andMe cohort and the BLTS cohort. Katrina Grasby also conducted the twin heritability analyses. I co-designed the study with the lead authors on the papers, Simon Baron-Cohen and Thomas Bourgeron. I wrote the first draft of the manuscripts. Chapter 3 is under review in Translational Psychiatry as: Genome-wide analyses of self-reported empathy: correlations with autism, schizophrenia, and anorexia nervosa. Warrier V, Toro R, Chakrabarti B, the iPSYCH-Broad autism consortium, the 23andMe Research Team, Hinds DA, Bourgeron T, and Baron-Cohen S. Chapter 4 has been published in Molecular Psychiatry as: A genome-wide meta-analysis of cognitive empathy: heritability and correlates of the ‘Reading the Mind in the Eyes’ test with psychiatric conditions, psychological traits and cognition. Warrier V, Grasby K, Uzefovsky F, Toro R, Smith P, Chakrabarti B, Khadake J, Mawbey-Adamson E, Litterman N, Hottenga J, Lubke G, Boomsma D, Martin NG, Hatemi PK, Medland SE, Hinds DA, Bourgeron T, and Baron-Cohen S. Mol Psychiatry [Epub Ahead of Print]. Chapter 7 has been written for publication.

I designed and conducted the analyses for Chapters 5 and 6. For Chapter 5, the genotyping, imputation, and primary quality control were conducted by ALSPAC. For Chapter 6, the genotyping, imputation, and primary quality control were conducted by the UK Biobank. I conducted the GWAS, subsequent quality control, and downstream analyses. I wrote the first drafts of the manuscripts, which were edited by the other authors involved in the studies. I was supervised by Simon Baron-Cohen on both chapters and, additionally, by Thomas Bourgeron for Chapter 6. Chapter 5 is under review and Chapter 6 has been written for publication.

Given the highly collaborative nature of this work, I have used the term ‘we’ for all the chapters in this thesis.

This thesis also contains several appendices. Where possible, we have provided the full table. However, for the larger tables containing gene-based and pathway-based analyses, we have provided only the top 100 most significant results. The full tables can be downloaded from here: https://www.dropbox.com/sh/9xu6zvtmsqyjf2y/AAAUTAt33dVXFKcPu7hjxr-ka?dl=0.

Acknowledgements

The last three years have been tremendously exciting, filled with adventure, joy, and bad puns, and I have very many people to thank for this.

First, Simon. Simon, thank you for being ever so patient, kind, generous and filled with positivity. You gave me so many opportunities that a PhD student could only have dreamed of. You allowed me to form my own ideas, start collaborations with researchers across the globe, get my hands wet with grant writing, and even laughed encouragingly at all my terrible jokes. I could not have asked for a better mentor. I’ve been told that a PhD is like a roller coaster with ups and downs. I seemed to have gotten a ticket to a different amusement park - this PhD was an unbelievable thrill ride that just got better and better. Thank you for this golden ticket!

Second, in equal measure, my two fantastic advisors. Bhisma, I remember the late-evening Skype conference calls with patchy internet at my end. You walked me through the very basics of genetics - from using PLINK to how to correct for multiple testing. I decided to stay on in human and statistical genetics for my PhD because of you. You encouraged me to learn statistics and programming. Thank you for all of this and more! Thomas, I am so fortunate that Simon suggested that you should examine my MPhil thesis. Thank you for always being encouraging, for always being happy to Skype, for being perennially excited about any ideas that I suggest even as I doubted them, for the countless letters of reference, and more! Here’s hoping that I get more opportunities to visit you in Paris - a city that taught me that broccoli and snails can be tasty.

Third, to my amazing group of friends and colleagues at the Autism Research Centre. Richard, thanks for being a pillar of common-sense and support. It’s so exciting and enriching to talk to someone working on something entirely different, and yet come up with new ideas to collaborate on. Here’s to more such exciting collaborations, to fun trips (Georgia, next?), board game nights, winning the weekly pub-quiz at Alma, and imaging genetics! I still don’t understand how you survived without coffee in academia, but that’s a discussion for another day. Owen and Ezra, it’s been great sharing an office with you. Ezra, thank you for always allowing me to switch on the lights, and make it as bright as summer’s noon in the tropics. I’m still on the fence about peeling the figs before eating them, but I’m keeping an open mind.

Owen, thanks for all the lovely banter, the help with R and data analysis, the cheeky Nando’s trips, and for coining the name Bristol Feminists. Never forget, cheeky is not cheeky in Turkey.

Clara, Dori, Helena, Aicha, Florina, Amandine, Amber, Sarah, Gareth, Jack, Alex, Alexa, Jan, Arko, and Rosie, thanks for all the wonderful company, for filling the three-something years with laughter, for the numerous trips to exciting and not exciting places: the local Co-op, Morocco, the Cambridge Oktoberfest, San Francisco, the Waitrose in Trumpington, Israel, the Sainsbury’s by the railway station, and the Grand Tetons to name a few. Alexa, if it were not for your vigilance, we would certainly have been mauled by a bear or an angry grouse. Gareth, I hope you enjoyed the concert but I’m still waiting for my finest wine in the land. And, Clara, I will forever be a Real Madrid supporter. And, oh, Helena, I will forever be a Barcelona supporter.

Becky, Paula, and Carrie, thank you for being the go-to persons for almost everything. Thank you for helping me wade through the tricky waters of project proposals, grant applications, ethics applications, research agreements...the list is long!

My time in Cambridge has been enriched by so very many friends. Thank you, Gavin, Courtlin, Dave, Kweku, Devyani, and Elijah for all the hikes, the board game nights, the cook-offs, the beekeeping sessions, and all the excellent shenanigans we got up to. Good luck trying to best me at the next chili cook-off.

I would also like to express my sincere gratitude to St John’s College, Cambridge and the Cambridge University Trusts for generously funding my PhD. I am also extremely thankful to the numerous collaborators who have been so generous with their data, expertise, and time. It’s great to be in a field that is so openly collaborative, happy to share data, and answer tricky statistical questions.

And most of all, to my lovely, lovely, parents who figured out early on that instilling in me a love for jokes and puns is the best survival skill they can give me. Words will not wrap themselves around how grateful I am for your never-ending support and love.

Abstract

Autism Spectrum Conditions (henceforth, autism) refers to a group of neurodevelopmental conditions characterized by difficulties in social interaction and communication, difficulties in adjusting to unexpected change, alongside unusually narrow interests and repetitive behaviour, and sensory hyper-sensitivity. Twin and family-based studies have consistently identified high heritabilities for autism and autistic traits, with recent studies converging at 60 – 90% heritability. Common genetic variants are thought to additively contribute to as much as 50% of the total risk for autism. In this thesis, I investigate the contribution of common genetic variants (including SNPs and InDels) to autism and related traits. In Chapter 1, I discuss the recent advances in the field of autism genetics, focussing on the contribution of common genetic variants to the risk for autism. Chapters 2 – 7 report the results of various studies investigating the genetic correlates of autism and related traits. In Chapter 2, I surveyed the evidence for 552 candidate genes associated with autism, and conducted a meta-analysis for 58 common variants in 27 genes, investigated in at least 3 independent cohorts. Meta-analysis did not identify any SNPs that were replicably associated with autism in the Psychiatric Genomics Consortium genome-wide association study (PGC-GWAS) dataset after Bonferroni correction, suggesting that candidate gene association studies are not statistically well-powered. In Chapters 3 – 7, I conducted genome-wide association studies (GWAS) for 6 traits associated with autism: self-reported empathy (N = 46,861, Chapter 3), cognitive empathy (N = 89,553, Chapter 4), theory of mind in adolescents (N = 4,577, Chapter 5), friendship satisfaction (N_effective = 158,116) and family relationship satisfaction (N_effective = 164,112, both Chapter 6), and systemizing (N = 51,564, Chapter 7). GWAS identified significant loci for self-reported empathy, systemizing, friendship and family relationship satisfaction, and cognitive empathy. Genetic correlation analyses replicably identified a significant negative genetic correlation between autism and family relationship satisfaction and friendship satisfaction, and a significant positive genetic correlation between autism and systemizing. In addition, there was a negative genetic correlation between one of the autism GWAS datasets (iPSYCH) and self-reported empathy. Chapter 8 draws all of these studies together, concluding that there may be at least two independent sources of genetic risk for autism: one stemming from social traits and another from non-social traits. I discuss some future directions for how this can be leveraged, using polygenic scores from multiple phenotypes to potentially stratify individuals within the autism spectrum, and consider both the strengths and limitations of the reported studies.

Index

Contents

1. Introduction
1.1 Autism: definition, prevalence, and co-morbidities
1.2 Heritability of autism: Twin studies and familial recurrence
1.3 Molecular genetics: Linkage, syndromic forms of autism and candidate genes
1.4 Copy number variants and de novo loss of function mutations
1.5 Transcriptional dysregulation in autism
1.6 The role of common genetic variants in autism

2. Meta-analysis of candidate gene association studies in autism
2.1 Introduction
2.2 Methods
2.2.1 Literature search and inclusion criteria
2.2.2 Cohorts from the Autism Research Centre
2.2.3 Statistical analyses
2.2.4 Analysis of the PGC dataset
2.3 Results
2.3.1 Literature review
2.3.2 Mean effect sizes
2.3.3 Subgroup analyses
2.3.4 Publication bias and sensitivity analyses
2.3.5 Analysis of the PGC dataset
2.3.6 Previous meta-analyses
2.4 Discussion

3. Genome-wide association study of self-reported empathy
3.1 Introduction
3.2 Methods
3.2.1 Participants
3.2.2 Measures
3.2.3 Genotyping, imputation and quality control
3.2.4 Genetic association
3.2.5 Genomic inflation factor, heritability, and functional enrichment
3.2.6 Genetic correlations
3.2.7 Gene-based analysis
3.2.8 Genome-wide colocalization
3.2.9 Data Availability
3.3 Results
3.3.1 Phenotype description
3.3.2 Genome-wide association analyses
3.3.3 Gene-based association, heritability, and enrichment in functional categories
3.3.4 Sex differences
3.3.5 Genetic correlations
3.3.6 Bayesian genomic colocalization
3.4 Discussion

4. Genome-wide association meta-analysis of Cognitive Empathy
4.1 Introduction
4.2 Methods
4.2.1 Participants
4.2.2 Measures
4.2.3 Genotyping, imputation and quality control
4.2.4 Association analyses
4.2.5 Heritability and genetic correlation
4.2.6 Twin Heritability
4.2.7 Gene-based analyses and sex difference analyses
4.2.8 Data Availability
4.3 Results
4.3.1 Study overview
4.3.2 Phenotypic properties of the short version of the Eyes Test
4.3.3 Genome-wide association meta-analyses
4.3.4 Heritability analyses
4.3.5 Genetic correlation
4.3.6 Sex differences
4.4 Discussion

5. Genome-wide association study of theory of mind in adolescents
5.1 Introduction
5.2 Methods
5.2.1 Phenotype and participants
5.2.2 Genotyping and Imputation
5.2.3 Genome-wide association analyses and gene based analyses
5.2.4 Heritability and Polygenic risk scores
5.2.5 Data availability
5.3 Results
5.3.1 Phenotypic distribution
5.3.2 Genome-wide association analyses and heritability
5.3.3 Gene-based and pathway analysis
5.3.4 Polygenic risk score
5.4 Discussion

6. Genome-wide association meta-analysis of social relationship satisfaction
6.1 Introduction
6.2 Methods
6.2.1 Phenotypes and participants
6.2.2 Genetic analyses
6.3.3 Genetic correlations
6.3.4 Heritability analyses
6.3.5 Functional annotation
6.3.6 Polygenic regression analyses
6.3.7 ALSPAC cohort
6.3 Results
6.3.1 Phenotypic distributions
6.3.2 Genetic correlation
6.3.3 SNP heritability
6.3.4 Genetic association analyses
6.3.5 Functional annotation
6.3.6 Polygenic score analyses
6.4 Discussion

7. Genetics of Systemizing
7.1 Introduction
7.2 Methods
7.2.1 Participants
7.2.2 Phenotype
7.2.3 Genotyping, imputation, and quality control
7.2.4 Genetic association
7.2.5 GWAS of SCDC
7.2.6 Genomic inflation factor, heritability, and functional enrichment
7.2.7 Functional annotations
7.2.8 GWIS
7.2.9 Phenotypic regression analyses
7.2.10 Data Availability
7.3 Results
7.3.1 Phenotypic distribution and correlates of the SQ-R in the CARD database
7.3.2 Phenotypic distribution and genome-wide association analyses
7.3.3 Genetic correlation
7.3.4 Heritability, enrichment analysis, and gene-based analyses
7.4 Discussion

8. Discussion

References

Appendix 1: Studies included and study characteristics (Chapter 2)
Appendix 2: Studies excluded (Chapter 2)
References for the studies included in Appendices 1 and 2
Appendix Table 3: Genes with sex-specific expression (Chapter 3)
Appendix Table 4: Gene based analysis of the non-stratified EQ GWAS (Chapter 3)
Appendix Table 5: Pathway based analysis for the non-stratified EQ (Chapter 3)
Appendix Table 6: Gene based analysis of the non-stratified Eyes Test (Chapter 4)
Appendix Table 7: Gene based association for the Eyes Test - males (Chapter 4)
Appendix Table 8: Gene based association for the Eyes Test - females (Chapter 4)
Appendix Table 9: List of sex-differentially expressed genes in the adult cortex (Chapter 4)
Appendix Table 10: Results of the gene-based analyses for the Triangles Task (Chapter 5)
Appendix Table 11: Results of the pathway analyses for the Triangles Task (Chapter 5)
Appendix Table 12: Results of the eQTL analyses for social relationship satisfaction (Chapter 6)
Appendix Table 13: Results of the chromatin interactions for social relationship satisfaction (Chapter 6)
Appendix Table 14: Gene based analyses for family relationship satisfaction (Chapter 6)
Appendix Table 15: Gene based analyses for friendship satisfaction (Chapter 6)
Appendix Table 16: Results of the chromatin interactions for the SQ-R (Chapter 7)
Appendix Table 17: Gene based analyses for the SQ-R (Chapter 7)
Appendix Table 18: Pathway analyses for the SQ-R (Chapter 7)

List of Figures

1. Introduction

2. Meta-analysis of candidate gene association studies in autism
Figure 1: Schematic diagram of the study protocol
Figure 2: Forest plot for rs7794745 (CNTNAP2)
Figure 3: Forest plot for rs167771 (DRD3)
Figure 4: Forest plot for rs362691 (RELN)
Figure 5: Forest plot for rs2268491 (OXTR)
Figure 6: Forest plot for rs2292813 (SLC25A12)
Figure 7: Forest plot for rs2056202 (SLC25A12)
Figure 8: Forest plot for rs1801133 (MTHFR)
Figure 9: Forest plot for rs1861972 (EN2)
Figure 10: STin2 VNTR (SLC6A4), Caucasian only
Figure 11: rs362691 (RELN), Case-control only
Figure 12: rs2292813 (SLC25A12), TDT only
Figure 13: rs2056202 (SLC25A12), TDT only
Figure 14: rs1861973 (EN2), TDT only
Figure 15: rs1861973 (EN2), Caucasian only
Figure 16: rs1861972 (EN2), Caucasian only/ TDT only
Figure 17: Sensitivity analysis for rs4446909 (ASMT)
Figure 18: Sensitivity analysis for rs736707 (RELN)
Figure 19: Sensitivity analysis for rs1801133 (MTHFR)
Figure 20: Sensitivity analysis for rs2056202 (SLC25A12)
Figure 21: Sensitivity analysis for rs1861972 (EN2)

3. Genome-wide association study of self-reported empathy
Figure 1: Schematic diagram of the study protocol
Figure 2: Mean scores and heritability estimates for the EQ
Figure 3: Manhattan Plot (A) and QQ Plot (B) of the non-stratified GWAS analysis
Figure 4: Manhattan Plot (A) and QQ Plot (B) of the females-only GWAS analysis
Figure 5: Manhattan Plot (A) and QQ Plot (B) of the males-only GWAS analysis
Figure 6: Regional association plot for rs4882760 (Non-stratified GWAS)
Figure 7: Genetic correlations between the EQ and other conditions

4. Genome-wide association meta-analysis of Cognitive Empathy
Figure 1: Schematic diagram of the study protocol
Figure 2: Frequency histogram (left) and Quantile-quantile plot of the scores on the short version (V2) of the Eyes Test
Figure 3: Manhattan plot and regional association plot for the Eyes Test (females) meta-analysis GWAS
Figure 4: QQ-plots for all the GWAMAs
Figure 5: Manhattan plot of Eyes Test meta-analysis
Figure 6: Locus zoom plots for the most significant loci in the non-stratified and the males-only loci
Figure 7: Mean scores and SNP heritability
Figure 8: Genetic correlations between the Eyes Test and psychiatric conditions, psychological traits and subcortical brain volumes
Figure 9: Effect direction for independent suggestive SNPs (P < 1x10^-6)
Figure 10: Overlap of top genes in males and females (Eyes Test)
Figure 11: Sex-difference enrichment analyses

5. Genome-wide association study of theory of mind in adolescents
Figure 1: Frequency histogram and Quantile-quantile plot of the scores on the Triangles Task
Figure 2: Manhattan plot and quantile-quantile plot of the GWAS of the Triangles Task
Figure 3: Polygenic score results at various P-value thresholds for the Triangles Task

6. Genome-wide association meta-analysis of social relationship satisfaction
Figure 1: Phenotypic distributions of family and friendship relationship satisfaction
Figure 2: Spearman’s rank correlation between phenotypic distributions of friendship and family relationship satisfaction
Figure 3: Difference in scores for friendship and family relationship satisfaction based on sex
Figure 4: Difference in scores for friendship and family relationship satisfaction based on age
Figure 5: Genetic correlations
Figure 6: Additive SNP heritability for family relationship and friendship satisfaction
Figure 7: Manhattan and QQ-plot for pre-MTAG GWAS
Figure 8: Direction and P-values of all independent SNPs with P < 10^-6 in the pre-MTAG family relationship satisfaction and friendship satisfaction GWAS
Figure 9: Manhattan and QQ-plots
Figure 10: Circos plots showing interactions and eQTLs
Figure 11: Regional LD plot for family relationship satisfaction
Figure 12: Regional LD plot for the friendship satisfaction
Figure 13: General tissue enrichment (FUMA)
Figure 14: Specific tissue enrichment (FUMA)
Figure 15: Correlation in Polygenic scores
Figure 16: Distribution of phenotypes tested in polygenic regression analysis
Figure 17: Polygenic regression analyses

7. Genetics of Systemizing
Figure 1: Distribution of the SQ-R scores in different groups
Figure 2: Correlation between the SQ-R and the AQ scores
Figure 3: Mean scores and standard deviations of the SQ-R
Figure 4: Manhattan and QQ-plots for the three GWAS
Figure 5: Circos plots for the three significant loci demonstrating eQTLs and chromatin interactions
Figure 6: Regional LD plot for the three significant loci
Figure 7: Genetic correlation between the SQ-R and other phenotypes
Figure 8: Genetic correlations between the SQ-R and SQminEdu GWIS estimates
Figure 9: Additive heritability for the three GWAS
Figure 10: Tissue specific heritability

8. Discussion
Figure 1: Genetic correlation heatmap of the phenotypes investigated in this thesis and other relevant phenotypes

References

List of Tables

1. Introduction

2. Meta-analysis of candidate gene association studies in autism
Table 1: Mean effect and 95% confidence intervals for all SNPs with P < 0.01
Table 2: Mean effect and standard error for SNPs with P > 0.01
Table 3: Results of the subgroup analyses

3. Genome-wide association study of self-reported empathy
Table 1: Independent SNPs with P < 1x10^-6 from the GWAS studies
Table 2: Variance explained by the top SNPs
Table 3: Additive SNP heritability for the three GWAS
Table 4: Results of the partitioned heritability analyses for the EQ
Table 5: Male-female heterogeneity in effect sizes and direction
Table 6A: Genetic correlation for the non-stratified GWAS
Table 6B: Sex stratified genetic correlations

4. Genome-wide association meta-analysis of Cognitive Empathy
Table 1: Questions used in the three different versions of the Eyes Test
Table 2: All SNPs with P < 1x10^-6 from the females-only meta-analysis
Table 3: Independent SNPs with P < 1x10^-6 from the GWAS studies
Table 4: Partitioned heritability results for the Eyes Test GWAS
Table 5A: Twin heritability analyses of the Eyes test short version using the BLTS cohort (14 items)
Table 5B: Standardised variance components with 95% CIs for the ACE, ADE, and AE models
Table 6: Genetic correlations

5. Genome-wide association study of theory of mind in adolescents
Table 1: Results of the Polygenic score analyses

6. Genome-wide association meta-analysis of social relationship satisfaction
Table 1: Genetic correlation for the two phenotypes
Table 2: Additive SNP heritability
Table 3: 13 independent loci in the pre-MTAG GWAS
Table 4: Significant SNPs in the MTAG GWAS
Table 5: Partitioned heritability analyses for the Family GWAS
Table 6: Partitioned heritability analyses for the Friendship GWAS
Table 7: Partitioned heritability analyses for tissue-specific expression in the brain
Table 8: Cell type specific enrichment partitioned heritability
Table 9: Cell type specific enrichment MAGMA
Table 10: Polygenic score regression analyses

7. Genetics of Systemizing
Table 1: List of SNPs with P < 1x10^-6
Table 2: Variance explained by the top SNPs
Table 3: Genetic correlations
Table 4: Genetic correlations between SQminEdu and other GWAS
Table 5: Additive SNP heritability estimates
Table 6: Partitioned heritability for the non-stratified SQ-R

8. Discussion

References

1. Introduction

1.1 Autism: definition, prevalence, and co-morbidities

The term autism refers to a group of heterogeneous neurodevelopmental conditions typically characterized by difficulties in social interaction and communication alongside repetitive and stereotyped behaviours, and is usually accompanied by difficulties in verbal communication. First described by Leo Kanner and Hans Asperger in 1943 and 1944 respectively¹, the term encompasses Classic Autism and Asperger’s Syndrome. The condition is extremely heterogeneous but at its core consists of impaired social and communicative development which, in almost all cases, persists into adulthood. The latest Diagnostic and Statistical Manual of Mental Disorders (DSM-V) defines autism (officially, autism spectrum disorder) as having difficulties in social communication and interaction, and having restrictive, repetitive, and stereotyped behaviour (American Psychiatric Association, 2013). In addition, difficulties in the two areas must be present in early childhood, even if they were identified later, and must together impair and limit everyday functioning. The latest International Statistical Classification of Diseases and Related Health Problems (ICD-10) (World Health Organization, 1992) classifies the group of conditions into childhood autism and Asperger syndrome. Childhood autism, in addition to the dyad of difficulties mentioned in DSM-V, also incorporates difficulties in verbal communication. Asperger syndrome differs from childhood autism in that there is no delay or retardation of language or cognitive development.

While initially thought to be an extremely rare condition, autism has risen steadily in prevalence over the years. Between 2000 and 2012, the Centers for Disease Control and Prevention’s Autism and Developmental Disabilities Monitoring Network identified an increase in prevalence from 67 in 10,000 in 2000 to 146 in 10,000 in 2012 (see: http://www.cdc.gov/ncbddd/autism/data.html), calculated using a survey of 8-year-old children in the United States. A more recent study (2015), using a modified National Health Interview Survey of children aged 3-17 in the United States, identified a prevalence of 224 in 10,000 (2.24%) (see: http://www.cdc.gov/nchs/data/nhsr/nhsr087.pdf).
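As a quick arithmetic check on the figures above, the following minimal Python sketch converts the quoted per-10,000 prevalences into percentages and approximate "1 in N" rates, and computes the relative increase between the two network estimates. The function names and dictionary labels are illustrative assumptions, not taken from the cited reports.

```python
def per_10k_to_percent(cases_per_10k: float) -> float:
    """Convert a prevalence expressed per 10,000 into a percentage."""
    return cases_per_10k / 100.0


def one_in_n(cases_per_10k: float) -> int:
    """Approximate '1 in N' rate for a prevalence expressed per 10,000."""
    return round(10_000 / cases_per_10k)


# Prevalence figures quoted in the text (cases per 10,000 children).
# The labels are shorthand chosen for this sketch.
quoted = {
    "Monitoring network, 2000": 67,
    "Monitoring network, 2012": 146,
    "Health survey, 2015 report": 224,
}

for label, rate in quoted.items():
    print(f"{label}: {per_10k_to_percent(rate):.2f}% (about 1 in {one_in_n(rate)})")

# Relative increase between the two monitoring-network estimates.
fold_change = quoted["Monitoring network, 2012"] / quoted["Monitoring network, 2000"]
print(f"Fold change, 2000 to 2012: {fold_change:.2f}")
```

Note that the two sources use different sampling designs and age ranges, so the survey figure is not directly comparable with the monitoring-network series.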

In addition to the difficulties in the core diagnostic criteria, individuals with autism often have other comorbid conditions. For example, many individuals have atypical language development, sensory hypo- or hypersensitivity, and difficulties in motor coordination, including dyspraxia. Approximately 38% of all individuals with autism have intellectual disability, as estimated by the CDC. Similarly, many autistic individuals also have comorbid ADHD, depression, and suicidal ideation (Lai, Lombardo, & Baron-Cohen, 2013). Aside from clinical comorbidities, on average, individuals with autism tend to perform better on measures of ‘systemizing’ (Baron-Cohen et al. 2003), that is, the drive to analyse and build systems, based on identifying the laws that govern the particular system, in order to predict how that system will work. Systems may be abstract, mechanical, natural, collectible, or motoric. They also perform better on tests of attention to detail (Jolliffe & Baron-Cohen, 1997), a prerequisite for systemizing. Individuals who are in the science-technology-engineering-maths (STEM) fields, or relatives of these individuals, are more likely to be diagnosed with autism or have higher levels of autistic traits (Baron-Cohen, Wheelwright, Stott, Bolton, & Goodyer, 1997; Ruzich et al., 2015; Wheelwright & Baron-Cohen, 2001). Autistic individuals also, on average, tend to have difficulties in eye contact (Jones & Klin, 2013), attention to social stimuli (Dawson, Meltzoff, Osterling, Rinaldi, & Brown, 1998), and interpreting emotions (Baron-Cohen et al. 2001), which contribute to persistent difficulties in social interaction and communication. This may be because the social domain is less amenable to systemizing, as it does not reduce to a set of rules.

¹ Though it was initially believed that Kanner and Asperger had independently identified and reported the two conditions, newer evidence suggests that Kanner was aware of Asperger’s work on the condition.

1.2 Heritability of autism: Twin studies and familial recurrence

Since the first description in 1943 by Leo Kanner (Kanner, 1943), autism was known to be a condition that manifested in early childhood, leading to the hypothesis that the condition is at least partly genetic. Establishing heritability is important, as it provides evidence for a causal role of genes in autism risk, which can then inform molecular genetic studies. The first twin study to report evidence for familiality in autism, by Folstein and Rutter in 1977, investigated the concordance of autism in a small sample of 11 monozygotic (MZ) twin pairs and 10 dizygotic (DZ) twin pairs (Folstein & Rutter, 1977). The concordance for autism was 36% in the MZ twins and 0% in the DZ twins. Expanding the criteria to include associated cognitive and social impairment, or the Broad Autism Phenotype (BAP), showed that 82% of the MZ twins were concordant whereas only 10% of the DZ twins were concordant. Since then, several studies have investigated the heritability of autism in twin samples in different populations. Heritability estimates have largely been comparable and high across twin studies, regardless of the ascertainment criteria (Ronald & Hoekstra, 2011).
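The concordance figures reported for the Folstein and Rutter sample are simple proportions of twin pairs. A minimal Python sketch of that calculation follows. The concordant-pair counts (4 and 9 of 11 MZ pairs, 0 and 1 of 10 DZ pairs) are assumptions back-calculated from the quoted percentages rather than values stated in the text, and the function name is illustrative.

```python
def pairwise_concordance(concordant_pairs: int, total_pairs: int) -> float:
    """Proportion of twin pairs in which both twins meet the criteria."""
    if total_pairs <= 0:
        raise ValueError("total_pairs must be positive")
    if not 0 <= concordant_pairs <= total_pairs:
        raise ValueError("concordant_pairs must lie between 0 and total_pairs")
    return concordant_pairs / total_pairs


# (concordant pairs, total pairs). The counts are ASSUMED: they are inferred
# from the percentages quoted in the text (36%, 0%, 82%, 10%).
samples = {
    "MZ, autism": (4, 11),
    "DZ, autism": (0, 10),
    "MZ, broad autism phenotype": (9, 11),
    "DZ, broad autism phenotype": (1, 10),
}

for label, (concordant, total) in samples.items():
    print(f"{label}: {pairwise_concordance(concordant, total):.0%}")
```

With only 11 and 10 pairs, a single pair shifts each estimate by roughly nine to ten percentage points, which is one reason the later, larger twin studies matter.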

A recent meta-analysis of seven twin studies identified a high twin heritability of 64–91% (Tick, Bolton, Happé, Rutter, & Rijsdijk, 2016). In parallel, twin studies of ‘autistic traits’ have identified moderate to high heritabilities of 60–90%, although this varies depending on the type of measure used and the age of the participants (de Zeeuw, van Beijsterveldt, Hoekstra, Bartels, & Boomsma, 2017; Ronald & Hoekstra, 2011). A few studies have also conducted multivariate co-heritability analyses of autism and related phenotypes, suggesting a significant shared genetic influence. The genetic correlation between autism/autistic traits and ADHD/ADHD traits, in particular, is high, reflecting the phenotypic co-morbidity (Ronald & Hoekstra, 2011).
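As a rough illustration of how twin resemblance translates into a heritability estimate, the classical Falconer approximation contrasts MZ and DZ twin correlations. The sketch below is only that approximation, not the structural equation modelling used in the studies cited above; the function name and the example correlations are hypothetical and are not taken from any of those studies.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximation of variance components from twin correlations.

    r_mz, r_dz: trait (or liability) correlations in MZ and DZ twin pairs.
    Assumes additive genetic effects only and equal shared environments
    for MZ and DZ pairs.
    """
    h2 = 2 * (r_mz - r_dz)   # additive genetic variance (heritability)
    c2 = 2 * r_dz - r_mz     # shared (common) environment
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical twin correlations, chosen purely for illustration.
estimates = falconer_ace(r_mz=0.80, r_dz=0.45)
for component, value in estimates.items():
    print(f"{component} = {value:.2f}")
```

The three components sum to one by construction, which is why a larger gap between the MZ and DZ correlations implies a larger heritability estimate.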

Family recurrence rates have also provided evidence for the heritability of autism. Multiple studies from Scandinavian countries have identified similar risk ratios for siblings of individuals with autism (~10%) (Grønborg, Schendel, & Parner, 2013; Jokiranta-Olkoniemi et al., 2016; Sandin et al., 2014). A recent study investigating insurance records has identified a heritability of ~90% (Kanix Wang, Gaitsch, Poon, Cox, & Rzhetsky, 2017). Family recurrence rates also offer other clues into the underlying genetics of autism. One interesting observation is that siblings of female probands have a higher risk for autism than siblings of male probands, an observation that is called the Carter Effect (the effect was originally described in pyloric stenosis (Carter & Evans, 1996), but has subsequently been applied to other conditions). The Carter Effect suggests that females are relatively protected, such that a greater mutation burden is required for a clinical diagnosis of autism in females (this has been confirmed using gene sequencing studies, see below). If the genetic risk is partly familial (i.e. not de novo), this higher genetic risk can be inherited by siblings, thus increasing their risk for autism. Investigations of twin and multiplex family samples have identified a higher relative risk for autism in siblings of female probands (Werling and Geschwind 2015; Robinson et al. 2013). In contrast, large population studies have failed to find support for the Carter Effect (Sandin et al., 2014).

Another interesting finding is the identification of autistic traits or the broad autism phenotype in family members of probands (Wheelwright et al. 2010). This is also supported by the identification of higher relative risk for other psychiatric conditions in family members of probands compared to the general population (Constantino, Zhang, Frazier, Abbacchi, & Law, 2010; Frazier et al., 2015; Jokiranta-Olkoniemi et al., 2016).

A third interesting finding is that of parental age. A few studies have demonstrated that increased paternal age increases the risk for autism (Frans et al., 2013; McGrath et al., 2014; Sandin et al., 2015). Indeed, a considerable proportion of de novo mutations are paternal in origin (Gratten et al., 2016; Kong et al., 2012). Earlier studies have noted that increased paternal age also increases the risk for de novo mutations in the sperm as the number of cell replications increases with age (Gratten, Wray, Keller, & Visscher, 2014; Kong et al., 2012).

Finally, another hypothesis that has received considerable support is that higher autistic traits or psychiatric liability in individuals is likely to delay fatherhood (Gratten et al., 2014; McGrath et al., 2014). However, a recent study has shown that, in addition to increased paternal age, the difference in ages between the parents also contributes to risk for autism, with the risk increasing as the difference in ages increases (Sandin et al., 2015). The mechanism underlying the increased risk with larger differences in parental age is unclear, though it is possible that subthreshold psychiatric risk can both advance and delay parenthood in comparison to the mean age in the general population.

1.3 Molecular genetics: Linkage, syndromic forms of autism and candidate genes

With evidence of considerable heritability for autism from twin studies, early genetic studies focused on linkage in multiplex autism families to identify loci associated with the condition. Linkage studies investigate the inheritance of regions of the genome in family pedigrees. The first autism linkage study was reported by IMGSAC in 99 families (International Molecular Genetic Study of Autism Consortium, 1998). Early linkage studies were non-parametric and were conducted in relatively small samples of 100–200 families. These studies used relatively sparse linkage genotyping, and had limited success.

Many genes currently associated with autism were first identified through specific syndromic forms of autism. For example, four early genes associated with autism – FMR1, TSC1 and TSC2, and MECP2 – are all associated with syndromic forms of autism (Fragile X Syndrome, Tuberous Sclerosis, and Rett Syndrome respectively). Early studies also investigated large chromosomal abnormalities by karyotyping. Because of the low resolution of these studies, it was impossible to identify specific genes associated with the condition. Today, the prevalence of chromosomal abnormalities is estimated to be less than 2% in autism (Bourgeron, 2016). However, early studies identified genes by sequencing candidate genes in loci with frequent deletions and/or duplications in autism. For example, Bourgeron and colleagues used this approach to identify three early genes associated with autism: SHANK3, NLGN3, and NLGN4X (Durand et al., 2007; Jamain et al., 2003). Subsequent gene sequencing efforts in large cohorts have provided further support for NLGN3 and SHANK3 in autism (Sanders et al., 2015).

Candidate gene approaches have also produced several false positives. Several genes have been investigated in autism using a candidate gene association approach (for a list, see: https://gene.sfari.org/autdb/HG_Home.do). These studies were typically, but not always, conducted in relatively small samples, investigating a small number of genetic variants. In Chapter 2, I report the results of the first systematic meta-analysis of candidate gene association studies in autism, where we reviewed the evidence for 552 genes that have been included in association studies in autism (Warrier, Chee, Smith, Chakrabarti, & Baron-Cohen, 2015). Common genetic variants in only 27 of these genes had been investigated in three or more independent cohorts, suggesting a scarcity of well-replicated genetic associations for autism. None of the variants included in the meta-analysis were significant in a larger genome-wide association cohort. It is thus clear that common genetic variants have very small effects, and that the previous candidate gene association studies that reported associations with autism were statistically underpowered, with the reported associations likely to be false positives. This work sets the stage for the genetic investigations conducted in the remaining chapters; the genetic association studies have been conducted in relatively large samples, affording different degrees of statistical power. Further, we move away from candidate associations to perform ‘hypothesis-naïve’ genome-wide association studies (GWAS).

1.4 Copy number variants and de novo loss of function mutations

Copy number variants (CNVs) are submicroscopic genomic deletions or duplications that are larger than 1,000 nucleotides (an arbitrary cut-off). These are frequent in the genome and alter gene dosage (Zarrei, MacDonald, Merico, & Scherer, 2015). Several de novo and inherited CNVs have been identified in autism. In 2007, the first study investigating the role of CNVs in autism used data from 118 simplex families, 44 multiplex families, and 99 control families, and identified a significant excess of de novo CNVs in simplex probands (Sebat et al., 2007). Since then, several studies have investigated the role of CNVs in autism and replicated the initial results (reviewed in Chung, Tao, & Tso, 2014; Geschwind & State, 2015).

Several well-validated results have emerged. CNVs are found at a significantly higher frequency in probands compared to unaffected siblings (approximately 2 – 3 times more CNVs than siblings) (Sanders et al., 2015). Further, CNVs in probands affect a larger number of genes, altering gene dosage of multiple genes. Consistent with the Carter Effect, female probands carry more de novo CNVs than male probands (Sanders et al., 2015). In addition, the number of de novo CNVs is significantly associated with lower IQ, a finding that has been replicated in multiple studies (Leppa et al., 2016; Levy et al., 2011). Overall, de novo CNVs are thought to contribute a significant proportion of the risk in autism, with 5 – 15% of individuals with autism carrying a de novo CNV compared to only 1 – 2% in the general population (Geschwind & State, 2015). While several CNVs have been identified, it has been challenging to identify CNVs at genome-wide significance due to differences in the length and the number of genes affected by different CNVs at the same locus. By investigating de novo CNVs in multiple large cohorts, Sanders and colleagues identified 6 risk loci: 1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2 (Sanders et al., 2015).

In parallel, studies have also identified a prominent role for de novo putative loss-of-function mutations in autism. Though de novo loss-of-function mutations have large effect sizes in comparison to common variants, their rarity, together with the genetic heterogeneity of autism, makes it difficult to identify genes at a genome-wide significant threshold. Considerable advances have been made in identifying high confidence genes in autism using next-generation sequencing in largely simplex families (trios or quads with one affected proband), and there has been convergence on key findings (De Rubeis et al., 2014; Iossifov et al., 2014; Neale et al., 2012; O’Roak et al., 2012; Samocha et al., 2014; Sanders et al., 2015).

Similar to de novo CNVs, de novo loss-of-function mutations are enriched in probands compared to controls in simplex families. Female probands are likely to harbour more de novo loss-of-function mutations than male probands, which is in line with the Carter Effect. These mutations are also associated with lower IQ and more severe autistic phenotypes. It is also clear that the number of these mutations increases with paternal age, likely because spermatogonia undergo more active mitosis to produce sperm cells. Finally, these studies have also identified that de novo missense mutations and inherited loss of function mutations show smaller effects in comparison to de novo loss-of-function mutations.

Recent efforts have integrated data from multiple sources in order to identify genes that are frequently mutated. Sanders and colleagues (Sanders et al., 2015) identified 65 high confidence genes (false discovery rate < 0.1), though it is estimated that between 450 – 1,000 such genes may be involved in autism (Geschwind & State, 2015). Two recent studies have expanded and refined this list of genes by using different methods. Yuen and colleagues performed whole-genome sequencing on multiplex autism families, and identified 18 additional genes (Yuen et al. 2015; Yuen et al. 2017). In parallel, Kosmicki and colleagues utilized genetic data from a large-scale population resource – the Exome Aggregation Consortium (Lek et al., 2016) – to identify which de novo variants are not observed in the general population (Kosmicki et al., 2017).

Partly due to the families sequenced and partly due to the underlying genetic architecture, many of the genes and CNVs identified are not unique to autism. Indeed, several of these genes are also seen in conditions such as intellectual disability (ID) and schizophrenia (Geschwind & Flint, 2015), a feature that is also shared by CNVs identified in autism and common genetic variation. This shared pleiotropy between autism, intellectual disability and schizophrenia, among other conditions, suggests that a combination of different genetic and environmental effects shape the disease-specific mechanisms. Investigation of transcriptomic signatures of autism and intellectual disability has identified distinct biological networks that contribute to ID and autism (Parikshak et al., 2013).

The use of large-scale data has also allowed for convergence in identifying pathways, particularly synaptic function, chromatin remodelling and downstream targets of FMR1, MECP2, and CHD8 (Bourgeron, 2015; Pinto et al., 2014). Several of the genes identified are associated with post-synaptic density, and many of the mutations are thought to affect synapse formation and plasticity, both during child development and in adulthood (Bourgeron, 2015). Functional studies using post-mortem brain tissue, and animal models, have identified altered synaptic development and pruning (Tang et al., 2014; Zoghbi & Bear, 2012).

1.5 Transcriptional dysregulation in autism

As autism is a neurodevelopmental condition, studies have also sought to investigate alterations in gene expression directly in the developing brain. Parikshak and colleagues (Parikshak et al., 2013) and Willsey and colleagues (Willsey et al., 2013) both used transcriptomic data from developing cortical tissues, with different methods, to test whether genetic risk for autism shows spatio-temporal convergence. Willsey and colleagues constructed gene co-expression networks using high-confidence autism genes as seed genes, and investigated the enrichment for probable autism genes across multiple temporal and spatial windows. They identified enrichment for probable autism genes in the mid-fetal prefrontal and the primary motor-somatosensory cortex. Further, by investigating layer-specific gene expression, they were able to identify enrichment in the inner cortical plate in the mid-fetal prefrontal and primary motor-somatosensory cortices.

Parikshak and colleagues (Parikshak et al., 2013) used a different approach to investigate convergence of genetic risk in autism in the developing brain. Using whole-genome transcriptome data, they constructed weighted gene co-expression networks agnostic of relationship to candidate genes in autism and followed their expression trajectories across developmental time. They identified three gene co-expression modules that were enriched for autism candidate genes and for transcriptionally dysregulated genes in autism. Interestingly, rare de novo variants were enriched in two different co-expression modules, suggesting that transcriptional dysregulation and rare de novo variants represent distinct mechanisms of risk in autism. Layer-specific enrichment analyses identified significant enrichment in the inner cortical plate in the developing brain.

A few studies have also investigated gene dysregulation in adult cortical and subcortical tissues, by systematically identifying differentially expressed genes in the autism post-mortem brains compared to control post-mortem brains (Gupta et al., 2014; Parikshak et al., 2016; Voineagu et al., 2011). These studies have identified several important results. First, cortical gene expression can help to separate the transcriptomes of autistic individuals from population controls. Second, these studies have been able to identify differentially expressed genes (both upregulated and downregulated) in autism cortex compared to control cortex, though analyses of the autism cerebellum in comparison to the control cerebellum have not been forthcoming.

These differences are likely to extend to other cortical and subcortical regions. For example, there is evidence that typical differences in gene expression in the frontal and temporal cortices are altered in autism (Parikshak et al., 2016). Third, there is replicable evidence to suggest that differentially downregulated genes are associated with neuronal and synaptic pathways, whereas upregulated genes are associated with glial (microglia and astrocytes in particular) and immune-related pathways (Parikshak et al., 2016; Voineagu et al., 2011).

Integrative transcriptome analyses have also identified similarities and differences across multiple psychiatric conditions. A meta-analysis of gene-expression microarray data across cortical transcriptional datasets has revealed overlapping neuropathology between autism and psychiatric conditions such as schizophrenia, bipolar disorder and major depression (Gandal et al., 2016). The correlation in transcriptional dysregulation between these conditions parallels the genetic correlation identified using GWAS data. This shared pathophysiology was replicated using RNA sequencing data. Another study provided independent evidence for the shared pathophysiology between autism and schizophrenia, by identifying a significant correlation in transcriptional dysregulation between the two conditions (Ellis, Panitch, West, & Arking, 2016).

1.6 The role of common genetic variants in autism

There have been a few GWAS of autism, with limited success. The first study, by Wang and colleagues (Kai Wang et al., 2009), investigated 780 families initially and a second cohort of 1,453 autistic individuals and 7,070 controls. Meta-analysis of the two cohorts identified one locus at 5p14.1 that was significant (P < 5 × 10⁻⁸). This intergenic locus was located between two cadherin genes (CDH10 and CDH9) that are involved in diverse neural functions and contain a unique calcium binding domain. They replicated the locus in two smaller, independent cohorts. Interestingly, this region was also implicated in another genetic association study that used data from 438 autistic families (Ma et al., 2009). SNPs in 5p14.1 were nominally significant (P < 0.05), though they did not reach genome-wide significance.

Despite this early success, subsequent studies have not been able to replicate the association at 5p14.1 at a genome-wide significant level, leading to the conclusion that these early studies were statistically underpowered and that the effect size was inflated due to winner’s curse (a form of regression to the mean, where the effect sizes that cross the threshold of significance are likely to be inflated when statistical power is limited).
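The winner’s curse described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the true effect, standard error, number of simulated studies, and one-sided significance threshold are all hypothetical values chosen for illustration, and the function name is not from any cited work.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.10, se=0.08, n_studies=20000,
                           z_threshold=1.96, seed=1):
    """Simulate many underpowered studies of the same true effect.

    Each study yields an estimate drawn from Normal(true_effect, se).
    A study counts as 'significant' when estimate / se exceeds
    z_threshold (one-sided, in the direction of the true effect).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    significant = [b for b in estimates if b / se > z_threshold]
    mean_all = statistics.mean(estimates)
    mean_significant = statistics.mean(significant)
    power = len(significant) / n_studies
    return mean_all, mean_significant, power


mean_all, mean_significant, power = simulate_winners_curse()
print(f"mean estimate, all studies:         {mean_all:.3f}")
print(f"mean estimate, significant studies: {mean_significant:.3f}")
print(f"fraction significant (power):       {power:.2f}")
```

Averaged over all simulated studies the estimate is unbiased, but conditioning on significance selects only estimates above `z_threshold * se`, so the average among the "winners" necessarily exceeds the true effect when power is low.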

Four further studies have reported significant association results. In 2009, Weiss, Arking and colleagues (Weiss et al., 2009) used data from multiple different cohorts and meta-analysed results from transmission disequilibrium tests and association studies to identify one SNP that was significant at 5p15.2 between the genes SEMA5A and TAS2R1. However, the P-value threshold used for significance was not the traditional GWAS threshold of 5 × 10⁻⁸, but a more liberal threshold of 2.5 × 10⁻⁷, identified using permutation and after accounting for LD. Another study by Anney and colleagues (Anney et al., 2010) divided participants into four groups along two axes: one on ethnicity (primary European ancestry vs. all ancestry), and one on diagnosis (strict autism vs. inclusive spectrum). They identified an intronic SNP in MACROD2 that was associated with strict autism below a genome-wide threshold. A further study conducted a cross-ethnic meta-analysis (European ancestry and Chinese ancestry) to identify variants associated with autism. Meta-analysis identified common variants at the 1p13.2 locus associated with autism. Interestingly, 1p13.2 has previously been linked to autism in linkage studies (Xia et al., 2014). Recently, work from the Psychiatric Genomics Consortium using genetic data on more than 16,000 individuals identified an association at 10q24.32 (rs1409313, P = 1.05 × 10⁻⁸) (The Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium, 2017).
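For context on the two significance thresholds mentioned above, the conventional genome-wide threshold is commonly motivated as a Bonferroni correction of a 5% error rate for roughly one million independent common-variant tests. The sketch below shows only this arithmetic; the one-million figure is the usual convention rather than a number given in this chapter, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_independent_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_independent_tests


# Conventional assumption: about one million independent common variants.
genome_wide = bonferroni_threshold(0.05, 1_000_000)
print(f"genome-wide threshold: {genome_wide:.1e}")

# A threshold of 2.5e-7 corresponds, under the same arithmetic, to about
# 200,000 effective tests. This is an arithmetic equivalence only; the study
# cited above derived its threshold by permutation, not by this calculation.
liberal = bonferroni_threshold(0.05, 200_000)
print(f"more liberal threshold: {liberal:.1e}")
```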

Despite the general lack of replicably associated loci, there is considerable evidence for a role of common genetic variants, en masse, identified using different SNP heritability measures. Using data from simplex (only one individual with an autism diagnosis) and multiplex families (at least two individuals with an autism diagnosis in the family), Klei and colleagues (Klei et al., 2012) have identified a narrow-sense heritability of 40 – 65% for autism. The heritability was higher for multiplex families (65%) than for simplex families (~40%), consistent with the observation of a higher number of de novo loss-of-function mutations in simplex probands than in the general population (Iossifov et al., 2014; Kosmicki et al., 2017; Sanders et al., 2015).

This estimate of narrow-sense heritability was confirmed by another study that suggested that the majority of the genetic risk for autism is attributable to common variants (Gaugler et al., 2014). The same study identified that de novo loss-of-function mutations contributed to ~3% of the variance, but explained a significant proportion of individual liability. Summing up the contribution from different classes of genetic risk, genetic variants explained approximately 60% of the total variance (Gaugler et al., 2014). The limited success in identifying significant loci despite the considerable SNP heritability may be due to multiple reasons, including low per-SNP effect sizes coupled with the high polygenicity of the condition, and the considerable phenotypic heterogeneity.

Other studies have tried to investigate the genetics of specific subtypes of autism. Two studies have investigated the genetics of Asperger Syndrome, a subtype of autism where individuals have average or above average intellectual ability and preserved language. Neither of the studies, however, identified genome-wide significant associations (Salyakina et al., 2010; Warrier, Chakrabarti, et al., 2015). A third study conducted family-based association analysis based on IQ and symptom profiles, but did not identify significant differences in heritability nor significant associations, possibly owing to the reduced sample size and, consequently, statistical power (Chaste et al., 2015). As there are considerable phenotypic differences between males and females, another study conducted sex-stratified analyses, but did not find evidence for higher genetic risk for females with autism (Mitra et al., 2016).

A few studies have also investigated the genetics of traits related to autism, given the considerable twin heritability of autistic and related social traits. Two studies have investigated the genetics of social communication in children. Using data from a longitudinal cohort (ALSPAC), St. Pourcain and colleagues investigated the SNP heritability of social-communication difficulties using the Children’s Communication Checklist (CCC) and the Social and Communication Disorders Checklist (SCDC) (St Pourcain et al., 2013, 2014). Both phenotypes were modestly heritable, with a SNP heritability estimate of 0.18 for the CCC-derived phenotype and SNP heritability estimates ranging from 0.08 to 0.45 for the SCDC across different ages. Our work, reported in Chapters 3 - 7, has also identified significant SNP heritabilities for traits related to autism. For example, measures of empathy (cognitive and self-report) had significant SNP heritabilities of between 0.05 and 0.12.

Developments in statistical methods have also allowed for the interrogation of SNP co-heritabilities (or genetic correlations) between autism and different phenotypes. Work by Robinson, St. Pourcain and colleagues identified a significant and replicable genetic correlation between clinically diagnosed autism and the broader autism phenotype measured using the SCDC (rg ~ 0.30) (Robinson et al., 2016). However, this was dependent on the age at which the SCDC was completed by individuals, with the highest genetic correlation in childhood, which declined with age (St Pourcain et al., 2017). Further, the SCDC measures only social aspects of autism and hence, is not necessarily representative of autistic traits. Because of the limited sample size (n ~ 5000) of the SCDC GWAS, the correlation is only significant if the intercept in the LD score regression is constrained to 1, which reduces the standard errors of the correlation. Thus, this study needs to be interpreted cautiously. In Chapter 6, I report a significant, negative, and replicable genetic correlation between two measures of social satisfaction (family relationship satisfaction and friendship satisfaction) and autism, using two large-scale GWAS of autism.

In parallel, genetic correlation analyses have also investigated the shared genetics between autism and non-social traits. The most interesting of these is the consistent positive genetic correlation between autism and different measures of cognition including educational attainment, childhood and adult cognition, and number of college years (Bulik-Sullivan, Finucane, et al., 2015; Clarke et al., 2015; Sniekers et al., 2017). This is in contrast to other psychiatric conditions, notably schizophrenia and bipolar disorder, where the correlation with different measures of cognition including educational attainment is neither consistently significant nor in the same direction. In addition, in Chapter 7, I report a significant, positive and replicable genetic correlation between systemizing (an interest in rule-based systems) and autism that persists after conditioning on the genetic contribution to educational attainment.

The positive genetic correlation between measures of cognition and autism is in contrast with several studies that have identified significant co-morbidity between autism and intellectual disability (Lai et al., 2013). There is some epidemiological and genetic evidence of different genetic architecture of autism with vs. without intellectual disability (Robinson et al. 2014). Whilst common genetic variants that contribute to autism risk are positively genetically correlated with measures of cognition, there is an enrichment for de novo loss-of-function mutations in individuals with autism and intellectual disability. A recent study, however, did not identify a difference in polygenic transmission for autism or educational attainment genetic scores between individuals with autism with vs. without ID, providing support for a two-hit model (wherein a de novo loss-of-function mutation in combination with a background of genetic risk predisposes an individual to autism as opposed to a de novo loss-of-function mutation alone) (Weiner et al., 2017). Interestingly, a recent study has identified that common genetic variants associated with different signatures of positive evolutionary selection in humans are enriched in autism, and this may be linked to the underlying pleiotropy with different measures of cognition (Polimanti & Gelernter, 2017).

The current genetic correlation results are limited by the sample size, and hence the statistical power, of the autism GWAS analysis. Work from the Psychiatric Genomics Consortium (Grove et al., 2018) using a larger autism GWAS sample has identified several additional significant correlations, including with other psychiatric conditions.

It is clear that a considerable proportion of the risk for autism lies in common genetic variants. It is also clear that current genome-wide association studies for autism have had limited success. Some of this can be attributed to high polygenicity and very low effect sizes, which would indicate that, despite the high familial and twin heritability, large sample sizes would be needed to detect significant associations. A different strategy would be to think of autism as a condition existing on multiple different Gaussian liability models. Genetic risk for the condition would then emerge not from one underlying liability, but from multiple different liabilities. This approach would allow for investigation of dimensional traits as espoused by the Research Domain Criteria (RDoC), and may also help surmount the challenges posed by the immense underlying heterogeneity in the condition.

Chapters 3 – 7 of this thesis explore this strategy by investigating the genetics of multiple different traits that are related to autism. Here, we draw the distinction between traits related to autism and autistic traits. The former is a collection of traits that can be thought to be normally distributed in the general population. Autistic individuals, on average, lie several standard deviations away from the mean on these traits. The latter is simply the underlying latent risk model of autism and can, for all practical purposes, be thought of as a single Gaussian. There are merits to investigating both approaches. The current thesis deals with the former. However, we are in the process of investigating the latter in the UK Biobank and other cohorts.

My investigations centre around social and non-social traits related to autism. Chapters 3 - 6 are genome-wide association studies of social traits related to autism: self-reported empathy, cognitive empathy, theory of mind in adolescence, and social relationship satisfaction (measured using two different scales – family relationship satisfaction and friendship satisfaction). In Chapter 7, I investigate the genetic correlates of a non-social trait associated with autism: Systemizing, which is an interest in systems. In all these chapters, we investigate three central questions: 1. What is the contribution of common genetic variants to the phenotype investigated? 2. What is the genetic correlation between the phenotype investigated and autism, other psychiatric conditions, and related psychological traits? and 3. What are the tissues, pathways, and gene sets that are enriched for association with the phenotype being tested?

The sample sizes for these phenotypes are varied. The smallest study is conducted in a sample size of less than 5,000 13-year-olds (Chapter 5). The advantage of this study is that we can investigate age-specific contributions to theory of mind as measured using a fairly detailed phenotyping test. The largest study has an effective sample size of approximately 150,000 individuals from the UK Biobank (Chapter 6). These differences in sample sizes are reflected in the results. Overall, the main results for these studies are as follows. First, as expected, all the phenotypes are modestly heritable. Second, we identified significant genome-wide associations in most of the phenotypes tested. Third, barring theory of mind and cognitive empathy, all the phenotypes are genetically correlated with autism in the direction predicted by psychometric and epidemiological studies. However, these phenotypes are not uniquely correlated with autism – many of these phenotypes are also genetically correlated with other psychiatric conditions and psychological and cognitive phenotypes. For three of these phenotypes (self-reported empathy, cognitive empathy, and systemizing), we also investigated genetic sex differences given the modest sex differences observed in the phenotypes. The sex-specific architecture differed between phenotypes: whilst systemizing had the largest phenotypic sex difference, there was a very high genetic correlation between the sexes. In contrast, there was considerable evidence for a sex-specific architecture for cognitive empathy. Overall, our results advance the understanding of traits related to autism and I discuss how this can be utilized in future analyses.

2. Meta-analysis of candidate gene association studies in autism

2.1 Introduction

Twin studies of autism have identified a heritability of between 50 and 90% (Bourgeron, 2016; Colvert et al., 2015), making a strong case for the role of genetics in the aetiology of the condition. Common genetic variants contribute to a significant proportion of the risk (Gaugler et al., 2014; Klei et al., 2012). Several different study designs have been used to identify common variants implicated in autism, including genome-wide association studies (GWAS) and candidate gene association studies. Strategies to identify common variants through GWAS have thus far had limited success in identifying consistent, replicable loci across cohorts (Gratten et al., 2014). This may be attributed to many factors, including small sample sizes with limited power, the association model (case-pseudocontrol association is typically underpowered when compared to case-control association (Peyrot, Boomsma, Penninx, & Wray, 2016)), and the high genetic and phenotypic heterogeneity, which makes it challenging to identify subtype-specific variants (Lai et al., 2013).

Over the last two decades, several studies have investigated common genetic variants in candidate genes for autism, typically investigating variants in a small number of genes using a relatively small sample size. These studies have provided some evidence of the association of a few genes with autism, though they are not rigorous enough to definitively identify variants and results vary based on ethnicity, sample size, study methodology, and clinical ascertainment (O’Roak & State, 2008). One method to investigate the underlying effect using summary level data is meta-analysis. Though not without limitations, meta-analysis provides an excellent statistical framework to systematically analyse effect sizes. Further, the combined power of a meta-analysis greatly exceeds the power of the individual studies in a meta-analysis.

In the field of psychiatric genetics, studies have comprehensively investigated existing candidate gene studies and used meta-analysis to investigate genetic associations (Munafò et al., 2003; Taylor, 2013, 2016; Z. Wang et al., 2014). In the field of autism genetics, such an overarching study is lacking. To bridge this gap, we reviewed the existing literature for 552 genes implicated in autism. Using strict inclusion criteria, we identified common variants in 27 genes that were investigated in three or more independent cohorts. We performed meta-analyses, sensitivity analyses and subgroup analyses for these common variants and checked for publication bias in a subset of these common variants.

2.2 Methods

2.2.1 Literature search and inclusion criteria

A preliminary literature search of genes associated with autism was performed using SFARI Gene (https://gene.sfari.org/) and HuGE Navigator (http://hugenavigator.net/). Since neither of these databases completely documents the available literature, we additionally searched Pubmed, Scopus, and Google Scholar. The search terms used were: 'Gene name' or 'variant ID' and 'Autism' or 'Autistic Disorder' or 'Asperger Syndrome'.

Studies were included in the meta-analysis if: 1) they reported effect sizes, or statistics from which effect sizes and confidence intervals could be calculated; 2) the studies were either a case-control association study or a transmission disequilibrium study of autism; 3) the variants did not deviate from Hardy-Weinberg Equilibrium (HWE) in the control group, or the sample size was too small to effectively calculate HWE due to sampling effects. Though we checked for HWE in family-based studies, this was not a requirement for including these studies, as the study design overcomes the issue of population stratification; 4) cases had a diagnosis of autism (Autism, Pervasive Developmental Disorders Not Otherwise Specified, Asperger Syndrome) according to DSM-IV, DSM-5, or ICD-10 criteria; 5) the global minor allele frequency (MAF) of the variant investigated was greater than 0.01; 6) the studies were reported in English; and 7) the common variants were investigated in independent cohorts. Authors of the articles were contacted if insufficient information was available to use the data for meta-analysis. We did not include data from GWAS as there is an overlap between participants in the candidate gene association studies and the genome-wide association studies. Since we had access to only summary data, it was impossible to ascertain the degree of overlap and remove participants accordingly. The literature search and study inclusion were performed from March 2014 to September 2014.
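Inclusion criterion 3 can be illustrated with a short sketch. The text does not state which HWE test or threshold was applied, so the Pearson chi-square test with one degree of freedom below is an assumption, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts for a biallelic variant.
    Returns (chi_square, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(30, 50, 20)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

With small control groups the expected counts become small and the chi-square approximation is unreliable, which is one reading of the sample-size exception in criterion 3.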

2.2.2 Cohorts from the Autism Research Centre

In addition to the published studies, we used unpublished genotype data from two cohorts from our research group at the Autism Research Centre, University of Cambridge. These cohorts are labelled 'Chakrabarti 2009 (Cohort 1)' and 'Warrier 2014 (Cohort 2)' in the current study.

Cohort 1 consists of 349 controls (143 males and 206 females) without an autism diagnosis, who were recruited using an advertisement, and 174 cases (140 males and 34 females). Cases were diagnosed with autism by independent clinicians using DSM-IV (DSM-IV, 1994) or ICD-10 criteria (World Health Organization, 1992). The following SNPs included in the study were genotyped and analysed previously (Chakrabarti et al., 2009): rs3735653 and rs1861972 in EN2; rs6265 in BDNF; rs10951154 in HOXA1; rs237885 and rs2228485 in OXTR. Additionally, for this study, we genotyped the following SNPs: rs4717806 in STX1A; rs736707 in RELN; rs2056202 in SLC25A12; rs53576, rs2254298, rs2268493, rs2268490, rs237894, and rs2301261 in OXTR.

Cohort 2 consists of 118 cases (74 males and 44 females) and 412 controls. Cases were diagnosed with autism by independent clinicians based on DSM-IV or ICD-10 criteria. Select SNPs in four genes, namely OXTR, SLC25A12, GABRB3, and STX1A, were genotyped, analysed and reported previously (Di Napoli, Warrier, Baron-Cohen, & Chakrabarti, 2014; Durdiaková, Warrier, Banerjee-Basu, Baron-Cohen, & Chakrabarti, 2014; Durdiaková, Warrier, Baron-Cohen, & Chakrabarti, 2014; Warrier, Baron-Cohen, & Chakrabarti, 2013). SNPs from these genes analysed in this study have been referenced accordingly. In addition, we genotyped rs1861972 in EN2 and rs736707 in RELN. All participants reported Caucasian ancestry for at least three generations. DNA was extracted from buccal swabs. Genotyping was performed using TaqMan® SNP genotyping assays (Applied Biosystems Inc., CA). No SNP showed a significant deviation from HWE. Allelic association analysis was performed using Plink v1.07 (Purcell et al., 2007).

2.2.3 Statistical analyses

Meta-analysis was performed only if variants were investigated in three or more independent cohorts. Family-Based Association Test (FBAT) studies were not included as effect sizes are not calculated in FBAT. For variants investigated in five or more independent cohorts, we performed a complete meta-analysis. This included the calculation of effect size and, additionally, publication bias, sensitivity analysis, and subgroup analysis. For variants investigated in three or four independent cohorts, analysis was restricted to the calculation of mean effect size. We did not perform a meta-analysis for variants investigated in fewer than three cohorts as there was insufficient statistical power to meaningfully investigate the underlying effect.

All analyses were performed using Comprehensive Meta-Analysis version 2.0 (https://www.meta-analysis.com/). Meta-analysis was performed using the inverse-variance weighted method. Heterogeneity was measured using the I² statistic in conjunction with the Q-statistic. A fixed effect model was applied if the P-value for the Q-statistic was above 0.05 and I² was below 60%. The random effects model was used if either the P-value for the Q-statistic was below 0.05 or I² was above 60%, as an I² above 60% indicates that more than 60% of the total observed variation is due to true heterogeneity (Rothstein, Borenstein, Hedges, & Higgins, 2013).
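The model-selection rule above can be sketched in a few lines. This is a minimal illustration rather than the routine used by Comprehensive Meta-Analysis: standard errors are back-calculated from the reported 95% confidence limits using 1.96, the between-study variance uses the DerSimonian-Laird estimator (an assumption, as the text only names "the random effects model"), and all function names are hypothetical. The input is the per-study data for rs2268491 (OXTR) from Figure 5.

```python
import math

Z95 = 1.96  # normal quantile for a 95% confidence interval


def log_or_and_se(odds_ratio, lower, upper):
    """Convert an OR and its 95% CI to a log odds ratio and standard error."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z95)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (j + 0.5)
    return total


def pool(effects, ses, tau2=0.0):
    """Inverse-variance weighted mean of log ORs (tau2 = 0 is fixed effect)."""
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))


def meta_analyse(studies, alpha=0.05, i2_cutoff=60.0):
    """studies: list of (OR, lower limit, upper limit) tuples."""
    effects, ses = zip(*(log_or_and_se(*s) for s in studies))
    weights = [1.0 / se ** 2 for se in ses]
    fixed_mean, fixed_se = pool(effects, ses)
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    q_p = chi2_sf(q, df)
    if q_p < alpha or i2 > i2_cutoff:
        # DerSimonian-Laird between-study variance (assumed estimator).
        c = sum(weights) - sum(w * w for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        model = "random"
        mean, se = pool(effects, ses, tau2)
    else:
        model, mean, se = "fixed", fixed_mean, fixed_se
    z = mean / se
    return {
        "model": model,
        "or": math.exp(mean),
        "lower": math.exp(mean - Z95 * se),
        "upper": math.exp(mean + Z95 * se),
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided P-value
        "q": q,
        "q_p": q_p,
        "i2": i2,
    }


# rs2268491 (OXTR): OR and 95% CI per study, as listed in Figure 5.
OXTR_RS2268491 = [
    (1.405, 1.112, 1.775),
    (1.410, 0.860, 2.311),
    (1.020, 0.678, 1.535),
    (1.250, 0.588, 2.659),
]
result = meta_analyse(OXTR_RS2268491)
print(result["model"], round(result["or"], 3), round(result["i2"], 1))
```

Under these assumptions the sketch selects the fixed effect model for this variant and gives a pooled OR of about 1.31 with a 95% CI of roughly 1.09 to 1.57, in line with the values in Figure 5 and Table 1.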

Egger's regression in conjunction with a funnel plot was used to assess publication bias. Sensitivity analyses were performed by removing each study from the meta-analysis and calculating the mean effect size for the remaining studies. This analysis was used to assess the contribution of each study to the final weighted effect in the analysis. Additionally, for the variants with P-values < 0.05, we computed both Classic Fail-Safe N and Orwin’s Fail-safe N, to check the number of studies required to make the P-value nonsignificant and make the effect size trivial respectively. For Orwin’s Fail-safe N, the non-significant odds ratio (OR) was kept at 1.05 for the risk increasing allele. While this is certainly not a trivial effect size, it is difficult to identify variants with such small effects with precision given the sample sizes in the present meta-analysis. Subgroup analysis was performed after stratifying based on ethnicity or study methodology to check if either of these variables affected the final effect size. We conducted subgroup analysis only for variants investigated in five or more independent cohorts.
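The leave-one-out sensitivity analysis and the two fail-safe N statistics described above can be sketched as follows, using the per-study data for rs2056202 (SLC25A12) from Figure 7. Several details are assumptions not stated in the text: a fixed effect model throughout, standard errors back-calculated from the 95% confidence limits, a two-tailed alpha of 0.05 for the Classic Fail-Safe N, a mean log OR of zero in the hypothetical missing studies for Orwin's N, and rounding up to whole studies. Function names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = 1.96

# (study, OR, lower 95% limit, upper 95% limit) for rs2056202, from Figure 7.
STUDIES = [
    ("Ramoz 2004", 1.394, 1.119, 1.736),
    ("Segurado 2005", 1.840, 1.131, 2.994),
    ("Blasi 2006", 1.125, 0.832, 1.520),
    ("Correira 2006", 1.031, 0.532, 1.998),
    ("Chakrabarti 2009", 1.112, 0.739, 1.672),
    ("Chien 2010", 1.093, 0.824, 1.450),
    ("Palmieri 2010", 1.361, 0.584, 3.170),
    ("Durdiakova 2014", 0.760, 0.393, 1.470),
]


def to_log_scale(odds_ratio, lower, upper):
    """Return (log OR, SE) from an OR and its 95% confidence limits."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z95)


def fixed_effect(data):
    """data: list of (log OR, SE). Returns (pooled log OR, two-sided P)."""
    weights = [1.0 / se ** 2 for _, se in data]
    mean = sum(w * y for w, (y, _) in zip(weights, data)) / sum(weights)
    z = mean * math.sqrt(sum(weights))
    return mean, math.erfc(abs(z) / math.sqrt(2.0))


def leave_one_out(studies):
    """Pooled OR and P-value after removing each study in turn."""
    data = [to_log_scale(*s[1:]) for s in studies]
    results = {}
    for i, study in enumerate(studies):
        mean, p = fixed_effect(data[:i] + data[i + 1:])
        results[study[0]] = (math.exp(mean), p)
    return results


def classic_fail_safe_n(studies, alpha=0.05):
    """Null studies needed to make the combined Z non-significant."""
    data = [to_log_scale(*s[1:]) for s in studies]
    z_sum = sum(y / se for y, se in data)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return max(0, math.ceil((z_sum / z_crit) ** 2 - len(data)))


def orwin_fail_safe_n(studies, trivial_or=1.05):
    """Null studies needed to pull a risk-increasing pooled OR to trivial_or."""
    data = [to_log_scale(*s[1:]) for s in studies]
    mean, _ = fixed_effect(data)
    criterion = math.log(trivial_or)
    return max(0, math.ceil(len(data) * (mean - criterion) / criterion))


for name, (pooled_or, p) in leave_one_out(STUDIES).items():
    print(f"without {name}: OR = {pooled_or:.3f}, P = {p:.3f}")
print(classic_fail_safe_n(STUDIES), orwin_fail_safe_n(STUDIES))
```

Under these assumptions, removing Ramoz 2004 raises the P-value to roughly 0.09, and the two fail-safe values come out at 6 and 26, which agrees with the figures reported for rs2056202 in the Results and in Table 1.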

OR and 95% confidence intervals (CI) were used to calculate the mean effect size. For Transmission Disequilibrium Tests (TDT), odds ratios were calculated according to the methods laid out by Kazeem and Farrall (2005). Where possible, OR and CI were calculated using allele numbers for case-controls (CC) and transmitted and non-transmitted numbers for TDT. Where information on OR and CI was provided for the complement allele of the allele investigated in the study, the log odds ratio (LOR) and standard error (SE) were calculated and used in the meta-analysis.
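The conversions described above can be written out as a small sketch. It assumes the usual estimators: the cross-product ratio of allele counts for case-control data, and the ratio of transmitted to non-transmitted counts from heterozygous parents for TDT data, each with the standard large-sample standard error on the log scale. The function names and the counts in the example are hypothetical.

```python
import math


def case_control_log_or(case_a, case_b, control_a, control_b):
    """Allelic log OR and SE from allele counts (allele A versus allele B)."""
    log_or = math.log((case_a * control_b) / (case_b * control_a))
    se = math.sqrt(1 / case_a + 1 / case_b + 1 / control_a + 1 / control_b)
    return log_or, se


def tdt_log_or(transmitted, non_transmitted):
    """Log OR and SE from transmission counts of heterozygous parents."""
    log_or = math.log(transmitted / non_transmitted)
    se = math.sqrt(1 / transmitted + 1 / non_transmitted)
    return log_or, se


def complement(log_or, se):
    """Re-express an effect for the other allele of a biallelic variant."""
    return -log_or, se


def or_with_ci(log_or, se, z=1.96):
    """Back-transform a log OR and SE to an OR with a 95% CI."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical counts, for illustration only.
print(or_with_ci(*tdt_log_or(60, 40)))
print(or_with_ci(*complement(*tdt_log_or(60, 40))))
```

Working on the log scale means that switching to the complement allele only flips the sign of the effect, so studies reporting different reference alleles can be combined without re-deriving counts.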

Age was not regarded as a confounding variable as autism is a neurodevelopmental condition and genetic variations are largely invariant across the lifespan. However, autism has a male-female ratio of between 3:1 and 5:1 (Lai et al., 2013), and sex is a potential confounding variable as gene expression can vary based on sex. There was insufficient data to conduct a stratified analysis based on sex, so this is a limitation of the current study. Finally, due to the large number of studies carried out in multiple different cohorts, we adopted a more conservative statistical significance threshold of 0.01. This is similar to the threshold used in a comparable comprehensive meta-analysis of obsessive compulsive disorder (Taylor, 2013). We did not carry out a Bonferroni correction as each variant was investigated in different samples, and as a result, multiple tests were not carried out on the same sample.

2.2.4 Analysis of the PGC dataset

Although we chose not to include data from available GWAS due to potential overlap of participants, we compared the results using the publicly available GWAS dataset from the Psychiatric Genomics Consortium (PGC) (http://www.med.unc.edu/pgc/results-and-downloads). 4788 trio cases and 4788 trio pseudocontrols as well as 161 cases and 526 controls have been genotyped in the autism cohort of the PGC dataset (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). Genetic variants with P < 0.05 in the meta-analysis were investigated in the PGC dataset.

2.3 Results

2.3.1 Literature review

We identified 463 genes that have been tested for genetic association using HuGE Navigator (as of August 2014). SFARI Gene reports 616 genes to be associated with autism (as of August 2014). Only 185 of these genes have been examined in autism using genetic association studies. Of these, we identified 89 genes from the SFARI Gene list that were not included in the HuGE Navigator list, bringing the total list of potential genes to 552. We did not identify any additional genes from the AutismKB database. Thus, we reviewed 552 genes in total for the meta-analysis.

Scopus, Google Scholar and Pubmed were searched for publications relating to autism and any of the 552 genes. We searched for common variations in these genes that have been investigated for autism in at least three independent cohorts. Using the eligibility criteria outlined in the methods section, we identified 27 genes that could be taken forward for meta-analysis. In total, there were 58 common variants across these 27 genes that were investigated in this meta-analysis. Details of the studies included and excluded for the 27 genes are given in Appendix Tables 1 and 2.

We next searched the literature for existing meta-analyses for the 58 variants and 27 genes in autism, identifying existing meta-analyses for OXTR (LoParo & Waldman, 2015), RELN (Z. Wang et al., 2014), SLC6A4 (Huang & Santangelo, 2008), HOXA1 (R.-R. Song et al., 2011), HOXB1 (R.-R. Song et al., 2011) and MTHFR (Pu, Shen, & Wu, 2013). As we had additional data and different inclusion criteria, we performed meta-analyses for all the variants in these six genes except rs723387731 in HOXB1, STin2 VNTR in SLC6A4, and the GGC repeat in RELN. These three variants were excluded from the current meta-analyses as we could not identify additional data to add to the original meta-analyses. Detailed information about previous meta-analyses is provided later in the Results. For the sake of comprehensiveness, we have included the data for these three variants in Tables 1 and 2. Of the remaining 55 variants, we conducted a complete meta-analysis for 20 variants and a partial meta-analysis for 35 variants. A flow chart of the study protocol is given in Figure 1.

Figure 1: Schematic diagram of the study protocol

The literature review was followed by an identification of candidate SNPs. Where the SNPs were investigated in 5 or more independent cohorts, a full meta-analysis was conducted which included sensitivity analysis, publication bias investigation and stratified analysis. Where the SNPs were investigated in 3 or 4 independent cohorts, only a meta-analysis was conducted. Where the SNPs were investigated in fewer than 3 independent cohorts, the SNP was omitted from the current study.

2.3.2 Mean effect sizes

Effect sizes for 15 variants in 12 genes had P < 0.05. Nine of these variants had P < 0.01. The most significant association was rs167771 in DRD3 (OR = 1.822, P = 9.08 × 10⁻⁶). Seven other significant associations with P < 0.01 were in CNTNAP2 (rs7794745, OR = 0.887, P = 0.001), RELN (rs362691, OR = 0.832, P = 3.93 × 10⁻⁵), OXTR (rs2268491, OR = 1.31, P = 0.004), SLC25A12 (rs2292813, OR = 1.372, P = 0.001 and rs2056202, OR = 1.227, P = 0.002), EN2 (rs1861972, OR = 1.125, P = 0.006) and MTHFR (rs1801133, OR = 1.370, P = 0.010). As expected for common variants in autism, the odds ratios for the alleles tested were small and lay between 0.781 (0.446 - 1.368) for MAOA uVNTR and 1.822 (1.398 - 2.375) for DRD3 rs167771. Details of the variants analysed, model used and the P-values are provided in Tables 1 and 2. Forest plots for the nine most significant variants are in Figures 2 – 9.

Figure 2: Forest plot for rs7794745 (CNTNAP2)

| Study name | Subgroup within study | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|
| Li 2010 | TDT | 0.919 | 0.677 | 1.247 | -0.545 | 0.586 |
| Toma 2013 | CC | 0.756 | 0.567 | 1.006 | -1.917 | 0.055 |
| Sampath 2013 NIMH | TDT | 0.841 | 0.757 | 0.935 | -3.210 | 0.001 |
| Sampath 2013 SSC | TDT | 0.944 | 0.854 | 1.042 | -1.141 | 0.254 |
| Pooled | | 0.887 | 0.828 | 0.950 | -3.445 | 0.001 |

Figure 3: Forest plot for rs167771 (DRD3)

| Study name | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|
| Krom 2009 A | 1.884 | 1.348 | 2.633 | 3.708 | 0.000 |
| Krom 2009 B | 2.247 | 1.372 | 3.680 | 3.218 | 0.001 |
| Toma 2013 | 0.700 | 0.282 | 1.734 | -0.771 | 0.441 |
| Pooled | 1.822 | 1.398 | 2.375 | 4.438 | 0.000 |

Figure 4: Forest plot for rs362691 (RELN)

| Study name | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|
| Sharma 2013 (Black) | 1.060 | 0.372 | 3.023 | 0.109 | 0.913 |
| Sharma 2013 (White) | 0.590 | 0.220 | 1.581 | -1.049 | 0.294 |
| Sharma 2013 (Mixed) | 0.740 | 0.311 | 1.758 | -0.682 | 0.495 |
| He 2011 | 1.405 | 0.793 | 2.490 | 1.164 | 0.244 |
| Dutta 2008 | 0.810 | 0.432 | 1.518 | -0.658 | 0.511 |
| Li 2008 | 0.850 | 0.773 | 0.934 | -3.374 | 0.001 |
| Bonora 2003 | 0.800 | 0.413 | 1.550 | -0.661 | 0.509 |
| Serajee 2006 | 0.520 | 0.360 | 0.751 | -3.492 | 0.000 |
| Pooled | 0.832 | 0.763 | 0.908 | -4.112 | 0.000 |

Figure 5: Forest plot for rs2268491 (OXTR)

| Study name | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|
| Liu et al., 2010 | 1.405 | 1.112 | 1.775 | 2.850 | 0.004 |
| Tansey et al., 2010 Irish | 1.410 | 0.860 | 2.311 | 1.363 | 0.173 |
| Tansey et al., 2010 Portuguese | 1.020 | 0.678 | 1.535 | 0.095 | 0.924 |
| Tansey et al., 2010 UCL | 1.250 | 0.588 | 2.659 | 0.579 | 0.562 |
| Pooled | 1.310 | 1.092 | 1.572 | 2.906 | 0.004 |

Figure 6: Forest plot for rs2292813 (SLC25A12)

| Study name | Subgroup within study | Outcome | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|---|
| Ramoz 2004 | Egyptian | TDT | 1.459 | 1.107 | 1.922 | 2.682 | 0.007 |
| Segurado 2005 | Irish | TDT | 1.889 | 1.067 | 3.344 | 2.182 | 0.029 |
| Blasi 2006 | Caucasian | TDT | 1.280 | 0.884 | 1.853 | 1.308 | 0.191 |
| Chien 2010 | Chinese | Case-control | 1.293 | 0.950 | 1.758 | 1.636 | 0.102 |
| Palmieri 2010 | Italian | Case-control | 1.182 | 0.472 | 2.958 | 0.357 | 0.721 |
| Prandini 2012 | Italian | TDT | 0.710 | 0.213 | 2.364 | -0.558 | 0.577 |
| Pooled | | | 1.373 | 1.162 | 1.622 | 3.722 | 0.000 |

Figure 7: Forest plot for rs2056202 (SLC25A12)

| Study name | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|
| Ramoz 2004 | 1.394 | 1.119 | 1.736 | 2.968 | 0.003 |
| Segurado 2005 | 1.840 | 1.131 | 2.994 | 2.454 | 0.014 |
| Blasi 2006 | 1.125 | 0.832 | 1.520 | 0.767 | 0.443 |
| Correira 2006 | 1.031 | 0.532 | 1.998 | 0.090 | 0.928 |
| Chakrabarti 2009 | 1.112 | 0.739 | 1.672 | 0.510 | 0.610 |
| Chien 2010 | 1.093 | 0.824 | 1.450 | 0.619 | 0.536 |
| Palmieri 2010 | 1.361 | 0.584 | 3.170 | 0.713 | 0.476 |
| Durdiakova 2014 | 0.760 | 0.393 | 1.470 | -0.815 | 0.415 |
| Pooled | 1.227 | 1.079 | 1.396 | 3.123 | 0.002 |

Figure 8: Forest plot for rs1801133 (MTHFR)

| Study name | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|
| Boris 2004 | 2.252 | 1.811 | 2.799 | 7.307 | 0.000 |
| James 2006 | 1.243 | 0.962 | 1.606 | 1.666 | 0.096 |
| Mohammad 2009 | 2.790 | 1.579 | 4.928 | 3.535 | 0.000 |
| Pasca 2009 | 1.179 | 0.553 | 2.510 | 0.426 | 0.670 |
| dos Santo 2010 | 1.150 | 0.789 | 1.676 | 0.727 | 0.467 |
| Liu 2011 | 1.148 | 0.980 | 1.343 | 1.714 | 0.086 |
| Schmidt 2011 | 0.853 | 0.649 | 1.121 | -1.140 | 0.254 |
| Guo 2012 | 1.303 | 0.961 | 1.767 | 1.703 | 0.089 |
| Divyakolu 2013 | 3.632 | 1.543 | 8.547 | 2.953 | 0.003 |
| Park 2014 | 0.966 | 0.772 | 1.208 | -0.307 | 0.759 |
| Pooled | 1.370 | 1.079 | 1.739 | 2.589 | 0.010 |

Figure 9: Forest plot for rs1861972 (EN2)

| Study name | Subgroup within study | Point estimate with study removed | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|
| Yang 2008 | Case-control | 1.111 | 1.020 | 1.209 | 2.410 | 0.016 |
| Yang 2010 | Case-control | 1.114 | 1.023 | 1.214 | 2.488 | 0.013 |
| Benayed 2005 AGRE II | TDT | 1.169 | 1.039 | 1.315 | 2.596 | 0.009 |
| Benayed 2005 NIMH | TDT | 1.130 | 1.029 | 1.241 | 2.558 | 0.011 |
| Gharani 2004 | TDT | 1.099 | 1.008 | 1.197 | 2.146 | 0.032 |
| Chakrabarti 2009 | Case-control | 1.153 | 1.057 | 1.258 | 3.197 | 0.001 |
| Warrier 2014 | Case-control | 1.126 | 1.032 | 1.228 | 2.676 | 0.007 |
| Prandini 2012 | TDT | 1.125 | 1.032 | 1.226 | 2.673 | 0.008 |
| Pooled (all studies) | | 1.125 | 1.035 | 1.224 | 2.758 | 0.006 |

Table 1: Mean effect and 95% confidence intervals for all SNPs with P < 0.05

| Gene | Variant | Allele | MAF | N | OR (95% CI) | P | Model (I²) | N, cases | N, controls | Trios | PGC P | Effect direction (OR) | Classic N | Orwin's N |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DRD3 | rs167771 | G vs A | G=0.41 | 3 | 1.82 (1.39-2.37) | 9.08E-06 | F (60) | 580 | 754 | 0 | 0.59 | D (0.98) | 7 | 34 |
| RELN | rs362691 | C vs G | C=0.12 | 8 | 0.83 (0.76-0.90) | 3.93E-05 | F (33) | 765 | 765 | 303 | NA | NA | 12 | 21 |
| SLC25A12 | rs2292813 | C vs T | T=0.20 | 6 | 1.37 (1.16-1.62) | 1.97E-04 | F (0) | 465 | 450 | 1220 | 0.77 | C (1.01) | 5 | 25 |
| CNTNAP2 | rs7794745 | A vs T | A=0.49 | 4 | 0.88 (0.82-0.95) | 0.001 | F (21) | 322 | 524 | 2236 | 0.17 | C (0.95) | 9 | 6 |
| SLC25A12 | rs2056202 | T vs C | T=0.24 | 8 | 1.22 (1.07-1.39) | 0.002 | F (6) | 756 | 1211 | 1220 | 0.98 | D (0.99) | 6 | 26 |
| OXTR | rs2268491 | T vs C | T=0.21 | 4 | 1.31 (1.09-1.57) | 0.004 | F (0) | 282 | 440 | 458 | 0.54 | C (1.02) | 3 | 19 |
| EN2 | rs1861972 | A vs G | G=0.24 | 8 | 1.12 (1.03-1.22) | 0.006 | F (58) | 669 | 1704 | 953 | NA | NA | 16 | 12 |
| MTHFR | rs1801133 | T vs C | A=0.24 | 10 | 1.37 (1.07-1.73) | 0.01 | R (88) | 2280 | 7235 | 0 | 0.57 | C (1.01) | 80 | 40 |
| ASMT | rs4446909 | G vs A | | 5 | 1.19 (1.03-1.37) | 0.013 | F (0) | 1066 | 1074 | 0 | NA | NA | 3 | 14 |
| MET | rs38845 | A vs G | A=0.36 | 3 | 1.32 (1.01-1.72) | 0.016 | R (66) | 405 | 594 | 419 | 0.2 | C (1.04) | 13 | 15 |
| SLC6A4 | rs2020936 | T vs C | G=0.22 | 4 | 1.24 (1.03-1.49) | 0.019 | F (34) | 0 | 0 | 1068 | 0.78 | C (1.01) | 3 | 14 |
| STX1A | rs4717806 | A vs T | A=0.23 | 4 | 0.85 (0.74-0.97) | 0.023 | F (35) | 653 | 1007 | 375 | NA | NA | 0 | 9 |
| RELN | rs736707 | T vs C | G=0.36 | 9 | 1.26 (1.03-1.56) | 0.025 | R (77) | 975 | 1695 | 196 | 0.31 | C (1.03) | 126 | 48 |
| PON1 | rs662 | A vs G | T=0.45 | 3 | 0.79 (0.64-0.98) | 0.034 | F (18) | 334 | 641 | 0 | 0.06 | D (1.05) | 0 | 11 |
| OXTR | rs237887 | G vs A | G=0.39 | 4 | 1.16 (1.00-1.34) | 0.047 | F (0) | 282 | 440 | 458 | 0.93 | C (1.00) | 0 | 9 |

Table 1 provides the odds ratio (OR) and 95% confidence interval (CI) of the meta-analysis, the total number of independent studies analysing the SNP (N), the allele investigated, the global minor allele frequency of the SNPs investigated (MAF), the model used (F = Fixed effect, R = Random effects), the number of cases, controls and trios, the PGC P-value, the direction of effect between the meta-analysis and the PGC study (C = concordant, D = discordant), Classic Fail Safe N, and Orwin’s Fail Safe N.

Table 2: Mean effect and standard error for SNPs with P ≥ 0.05

| Gene | Variant | Allele | MAF | N | OR (95% CI) | P | Model (I² value) | N, cases | N, controls | Trios |
|---|---|---|---|---|---|---|---|---|---|---|
| STX1A | rs6951030 | G vs T | G=0.17 | 4 | 1.3 (0.9-1.9) | 0.05 | R (77) | 653 | 1007 | 375 |
| OXTR | rs2268493 | C vs T | C=0.20 | 3 | 0.8 (0.7-1.0) | 0.07 | F (54) | 574 | 1201 | 0 |
| ASMT | rs5989681 | G vs C | NA | 5 | 1.1 (0.9-1.3) | 0.08 | F (0) | 1066 | 1074 | 0 |
| HOXB1 | rs72338773* | INS vs nINS | NA | 8 | 1.3 (0.9-1.3) | 0.11 | F (NA) | 362 | 448 | 238 |
| RELN | rs2073559 | C vs T | C=0.47 | 3 | 0.95 (0.9-1.0) | 0.13 | F (64) | 437 | 493 | 473 |
| RELN | GGC repeat* | NA | NA | 7 | 1.1 (0.8-1.5) | 0.15 | F (0) | 878 | 1170 | 167 |
| GLO1 | rs2736654 | A vs C | G=0.28 | 4 | 1.3 (0.8-1.9) | 0.18 | R (68) | 857 | 680 | 0 |
| PON1 | rs854560 | A vs T | T=0.18 | 3 | 1.1 (0.9-1.3) | 0.2 | F (0) | 334 | 641 | 0 |
| TPH2 | rs11179000 | T vs A | T=0.39 | 3 | 1.1 (0.9-1.3) | 0.2 | F (0) | 224 | 260 | 352 |
| MET | rs1858830 | G vs C | G=0.45 | 8 | 0.9 (0.7-1.0) | 0.21 | R (67) | 1975 | 1589 | 798 |
| OXTR | rs2268490 | T vs C | T=0.25 | 5 | 1.1 (0.9-1.4) | 0.23 | F (0) | 292 | 761 | 458 |
| OXTR | rs2301261 | A vs G | T=0.12 | 4 | 1.1 (0.8-1.4) | 0.32 | F (39) | 650 | 1300 | 0 |
| HOXA1 | rs10951154 | G vs A | C=0.21 | 13 | 0.9 (0.7-1.0) | 0.32 | F (36) | 705 | 998 | 425 |
| BDNF | rs6265 | G vs A | T=0.20 | 3 | 0.9 (0.7-1.1) | 0.37 | F (0) | 303 | 469 | 140 |
| HTR2A | rs6311 | A vs G | T=0.44 | 6 | 0.8 (0.6-1.1) | 0.37 | R (75) | 179 | 313 | 396 |
| ITGB3 | rs5918 | C vs T | C=0.08 | 3 | 0.8 (0.6-1.1) | 0.37 | F (37) | 139 | 165 | 363 |
| MAOA | uVNTR | short vs long | NA | 3 | 0.7 (0.4-1.3) | 0.38 | R (72) | 436 | 469 | 0 |
| MACROD2 | rs4141463 | T vs C | C=0.38 | 7 | 0.9 (0.7-1.1) | 0.41 | R (87) | 1170 | 35307 | 1158 |
| OXTR | rs2254298 | A vs G | A=0.20 | 5 | 0.8 (0.4-1.3) | 0.42 | R (82) | 650 | 1306 | 57 |
| ASMT | rs6644635 | C vs T | NA | 4 | 1.0 (0.9-1.2) | 0.48 | F (29) | 788 | 819 | 0 |
| SLC6A4 | rs2020942 | A vs G | T=0.25 | 3 | 1.06 (0.8-1.2) | 0.52 | F (0) | 0 | 0 | 678 |
| OMG | rs11080149 | A vs G | T=0.04 | 4 | 0.8 (0.4-1.5) | 0.57 | F (43) | 65 | 131 | 431 |
| ADA | rs7359837 | G vs A | A=0.02 | 3 | 1.3 (0.4-4.7) | 0.61 | R (89) | 334 | 445 | 0 |
| OXTR | rs237894 | G vs C | C=0.16 | 5 | 0.9 (0.8-1.1) | 0.62 | F (4) | 292 | 761 | 458 |
| OXTR | rs53576 | A vs G | A=0.38 | 5 | 0.9 (0.8-1.1) | 0.63 | F (45) | 650 | 1300 | 57 |
| OXTR | rs2268494 | A vs T | A=0.06 | 4 | 1.0 (0.7-1.5) | 0.67 | F (0) | 76 | 99 | 458 |
| SLC6A4 | STin2 VNTR* | 12 vs 9/10 | NA | 8 | 1.1 (0.8-1.5) | 0.67 | R (68) | 0 | 0 | 814 |
| NF1 | GxAlu | 9 vs non-9 | NA | 4 | 1.1 (0.6-2.0) | 0.67 | R (86) | 262 | 312 | 0 |
| GRIK2 | rs2227281 | T vs C | T=0.27 | 4 | 0.9 (0.6-1.4) | 0.73 | R (77) | 0 | 0 | 508 |
| OXTR | rs2268495 | A vs G | A=0.24 | 4 | 1.0 (0.7-1.4) | 0.73 | F (60) | 282 | 446 | 458 |
| SHANK3 | rs9616915 | C vs T | C=0.34 | 3 | 0.9 (0.8-1.1) | 0.74 | F (60) | 340 | 863 | 308 |
| HTR2A | rs6314 | T vs G | A=0.07 | 4 | 0.9 (0.6-1.3) | 0.74 | F (18) | 103 | 214 | 370 |
| CNTNAP2 | rs2710102 | T vs C | A=0.41 | 3 | 0.9 (0.9-1.0) | 0.76 | F (17) | 322 | 524 | 2051 |
| OXTR | rs237885 | G vs T | G=0.48 | 6 | 0.9 (0.8-1.1) | 0.76 | F (0) | 574 | 1201 | 458 |
| COMT | rs4680 | Met vs Val | A=0.36 | 5 | 0.9 (0.8-1.1) | 0.8 | F (49) | 814 | 741 | 35 |
| MTHFR | rs1801131 | C vs A | G=0.24 | 6 | 0.9 (0.8-1.1) | 0.81 | R (56) | 1854 | 6819 | 0 |
| OXTR | rs1042778 | G vs A | T=0.41 | 4 | 1.0 (0.8-1.2) | 0.83 | F (0) | 282 | 440 | 458 |
| GRIK2 | rs2227283 | A vs G | A=0.32 | 4 | 0.96 (0.6-1.3) | 0.85 | R (66) | 0 | 0 | 508 |
| EN2 | rs3735653 | T vs C | T=0.40 | 4 | 1.0 (0.8-1.1) | 0.92 | F (0) | 174 | 349 | 499 |
| NF1 | GxAlu | 8 vs non-8 | NA | 4 | 0.9 (0.6-1.6) | 0.94 | R (79) | 262 | 312 | 0 |
| SLC6A4 | 5-HTTLPR | short vs long | NA | 17 | 0.9 (0.8-1.1) | 0.94 | R (64) | 0 | 0 | 2039 |
| HTR2A | rs6313 | T vs C | A=0.44 | 3 | 1.0 (0.8-1.2) | 0.94 | F (0) | 0 | 0 | 303 |
| EN2 | rs1861973 | T vs C | T=0.24 | 6 | 1.0 (0.7-1.3) | 0.97 | R (80) | 669 | 1704 | 681 |

Table 2 provides the odds ratio (OR) and 95% confidence intervals (CI) of the meta-analysis, the total number of independent studies analysing the SNP (N), the allele investigated, the global minor allele frequency of the SNPs investigated (MAF), the model used (F = Fixed effect, R = Random effects), and the number of cases, controls and trios. * represents SNPs investigated in previous meta-analyses.

2.3.3 Subgroup analyses

We performed subgroup analyses, stratifying by ethnicity and study methodology, for variants originally investigated in five or more independent cohorts. In the stratified analyses, six variants had P < 0.05. Of these, the three most significant variants (rs2292813 and rs2056202 in SLC25A12, and rs362691 in RELN) were also significant in the non-stratified analyses. Stratification did not increase the significance for these variants. A variant in EN2 (rs1861973) was significant after stratifying based on both ethnicity (Caucasian only) and study methodology (TDT). Another variant in EN2 (rs1861972) was significant after stratifying for study methodology (TDT). Finally, the STin2 variant in SLC6A4 was also nominally significant in the Caucasian-only subgroup. These results indicate that, at least for a few variants implicated in autism, ethnicity and study methodology can potentially influence the outcome. Results of the subgroup analyses are provided in Table 3. Forest plots for the significant and nominally significant subgroup analyses are provided in Figures 10 – 16.

Table 3: Results of the subgroup analyses

| Gene | SNP | Allele | N | Subgroup | OR (95% CI) | P | Model |
|---|---|---|---|---|---|---|---|
| ASMT | rs4446909 | G vs A | 3 | Caucasian | 1.1 (0.8-1.4) | 0.31 | F |
| ASMT | rs5989681 | G vs C | 3 | Caucasian | 1.0 (0.8-1.3) | 0.6 | F |
| COMT | rs4680 | A vs G | 4 | TDT | 0.9 (0.8-1.1) | 0.71 | F |
| EN2 | rs1861973 | T vs C | 4 | TDT | 0.8 (0.7-0.9) | **0.003** | F |
| EN2 | rs1861973 | T vs C | 3 | Caucasian | 0.8 (0.8-0.9) | **0.009** | F |
| EN2 | rs1861972 | A vs G | 4 | Case-control | 1.1 (0.8-1.6) | 0.26 | R |
| EN2 | rs1861972 | A vs G | 4 | TDT/Caucasian | 1.1 (1.0-1.2) | **0.01** | F |
| HOXA1 | rs10951154 | A vs G | 6 | Case-control | 0.8 (0.6-1.1) | 0.32 | R |
| HOXA1 | rs10951154 | A vs G | 6 | Caucasian | 0.8 (0.6-1.1) | 0.42 | R |
| HOXA1 | rs10951154 | A vs G | 7 | TDT | 0.9 (0.7-1.1) | 0.63 | R |
| HTR2A | rs6311 | A vs G | 4 | TDT | 0.8 (0.6-1.3) | 0.57 | R |
| HTR2A | rs6311 | A vs G | 3 | Caucasian | 0.9 (0.5-1.5) | 0.79 | R |
| MACROD2 | rs4141463 | T vs C | 5 | Case-control | 1.0 (0.9-1.1) | 0.47 | R |
| MET | rs1858830 | G vs C | 7 | Case-control | 0.8 (0.7-1.0) | 0.18 | R |
| MET | rs1858830 | G vs C | 3 | Italian | 0.9 (0.5-1.4) | 0.72 | R |
| MTHFR | rs1801133 | T vs C | 4 | Caucasian | 1.3 (1.2-1.5) | 0.06 | R |
| MTHFR | rs1801131 | C vs A | 3 | Caucasian | 0.9 (0.7-1.0) | 0.17 | F |
| OXTR | rs237885 | G vs T | 3 | Case-control | 0.9 (0.8-1.1) | 0.51 | F |
| OXTR | rs2268490 | T vs C | 3 | TDT | 1.2 (0.9-1.7) | 0.1 | F |
| OXTR | rs2254298 | A vs G | 4 | Caucasian | 0.6 (0.3-1.2) | 0.19 | F |
| OXTR | rs2268490 | T vs C | 4 | Caucasian | 1.1 (0.8-1.4) | 0.36 | F |
| OXTR | rs237885 | G vs T | 4 | Caucasian | 1.0 (0.8-1.2) | 0.64 | F |
| OXTR | rs237885 | G vs T | 3 | TDT | 1.0 (0.8-1.2) | 0.69 | F |
| OXTR | rs2254298 | A vs G | 4 | Case-control | 1.0 (0.6-1.5) | 0.86 | F |
| RELN | rs362691 | C vs G | 6 | Case-control | 0.8 (0.7-0.9) | **0.001** | F |
| RELN | rs736707 | T vs C | 8 | Case-control | 1.1 (0.9-1.4) | 0.12 | R |
| RELN | rs736707 | T vs C | 3 | Caucasian | 1.3 (0.8-2.0) | 0.23 | R |
| SLC25A12 | rs2292813 | C vs T | 4 | TDT | 1.4 (1.1-1.7) | **7.30E-04** | F |
| SLC25A12 | rs2056202 | T vs C | 5 | TDT | 1.2 (1.0-1.4) | **0.002** | F |
| SLC25A12 | rs2056202 | T vs C | 3 | Case-control | 1.1 (0.8-1.4) | 0.43 | F |
| SLC25A12 | rs2056202 | T vs C | 4 | Caucasian | 1.0 (0.8-1.3) | 0.45 | F |
| SLC6A4 | 5-HTTLPR | short vs long | 5 | Caucasian | 0.9 (0.6-1.4) | 0.83 | F |
| SLC6A4 | STin2 VNTR | 12 vs 9/10 | 4 | Caucasian | 1.4 (1.0-2.0) | **0.01** | F |

This table provides the results of the subgroup analyses. Results with P < 0.05 have been written in bold.

Figure 10: STin2 VNTR (SLC6A4), Caucasian only

| Study name | Subgroup within study | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|
| Cook 1997 | Caucasian | 1.650 | 0.857 | 3.177 | 1.498 | 0.134 |
| Klauk 1997 | Caucasian | 1.400 | 0.725 | 2.703 | 1.003 | 0.316 |
| Maestrini 1999 | Caucasian | 0.910 | 0.499 | 1.660 | -0.307 | 0.759 |
| Kim 2002 | Caucasian | 3.380 | 1.519 | 7.522 | 2.984 | 0.003 |
| Pooled | | 1.492 | 1.068 | 2.083 | 2.347 | 0.019 |

Figure 11: rs362691 (RELN), Case-control only

| Study name | Method | Ancestry | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|---|
| Sharma 2013 (Black) | Case control | South African | 1.060 | 0.372 | 3.023 | 0.109 | 0.913 |
| Sharma 2013 (White) | Case control | South African | 0.590 | 0.220 | 1.581 | -1.049 | 0.294 |
| Sharma 2013 (Mixed) | Case control | South African | 0.740 | 0.311 | 1.758 | -0.682 | 0.495 |
| He 2011 | Case control | Chinese (Hans) | 1.405 | 0.793 | 2.490 | 1.164 | 0.244 |
| Dutta 2008 | Case control | Indian | 0.810 | 0.432 | 1.518 | -0.658 | 0.511 |
| Li 2008 | Case control | Chinese (Hans) | 0.850 | 0.773 | 0.934 | -3.374 | 0.001 |
| Pooled | | | 0.857 | 0.783 | 0.939 | -3.318 | 0.001 |

Figure 12: rs2292813 (SLC25A12), TDT only

| Study name | Subgroup within study | Outcome | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|---|
| Ramoz 2004 | Egyptian | TDT | 1.459 | 1.107 | 1.922 | 2.682 | 0.007 |
| Segurado 2005 | Irish | TDT | 1.889 | 1.067 | 3.344 | 2.182 | 0.029 |
| Blasi 2006 | Caucasian | TDT | 1.280 | 0.884 | 1.853 | 1.308 | 0.191 |
| Prandini 2012 | Italian | TDT | 0.710 | 0.213 | 2.364 | -0.558 | 0.577 |
| Pooled | | TDT | 1.419 | 1.158 | 1.740 | 3.377 | 0.001 |

Figure 13: rs2056202 (SLC25A12), TDT only

| Study name | Ancestry | Method | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|---|
| Ramoz 2004 | Egyptian | TDT | 1.394 | 1.119 | 1.736 | 2.968 | 0.003 |
| Segurado 2005 | Irish | TDT | 1.840 | 1.131 | 2.994 | 2.454 | 0.014 |
| Blasi 2006 | Caucasian | TDT | 1.125 | 0.832 | 1.520 | 0.767 | 0.443 |
| Chakrabarti 2009 | Caucasian | TDT | 1.112 | 0.739 | 1.672 | 0.510 | 0.610 |
| Durdiakova 2014 | Caucasian | TDT | 0.760 | 0.393 | 1.470 | -0.815 | 0.415 |
| Pooled | | | 1.275 | 1.097 | 1.482 | 3.173 | 0.002 |

Figure 14: rs1861973 (EN2), TDT only

| Study name | Subgroup within study | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|
| Sen 2010 | TDT | 0.568 | 0.333 | 0.970 | -2.073 | 0.038 |
| Gharani 2004 | TDT | 0.715 | 0.550 | 0.929 | -2.507 | 0.012 |
| Benayed 2005 AGRE II | TDT | 0.916 | 0.811 | 1.034 | -1.419 | 0.156 |
| Benayed 2005 NIMH | TDT | 0.893 | 0.741 | 1.076 | -1.189 | 0.234 |
| Pooled | | 0.869 | 0.792 | 0.954 | -2.942 | 0.003 |

Figure 15: rs1861973 (EN2), Caucasian only

| Study name | Subgroup within study | Odds ratio | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|
| Gharani 2004 | Caucasian | 0.715 | 0.550 | 0.929 | -2.507 | 0.012 |
| Benayed 2005 AGRE II | Caucasian | 0.916 | 0.811 | 1.034 | -1.419 | 0.156 |
| Benayed 2005 NIMH | Caucasian | 0.893 | 0.741 | 1.076 | -1.189 | 0.234 |
| Pooled | | 0.881 | 0.801 | 0.969 | -2.620 | 0.009 |

Figure 16: rs1861972 (EN2), Caucasian only/TDT only

| Study name | Subgroup within study | Point estimate with study removed | Lower limit | Upper limit | Z-Value | p-Value |
|---|---|---|---|---|---|---|
| Benayed 2005 AGRE II | TDT | 1.203 | 1.032 | 1.403 | 2.361 | 0.018 |
| Benayed 2005 NIMH | TDT | 1.133 | 1.016 | 1.264 | 2.247 | 0.025 |
| Gharani 2004 | TDT | 1.092 | 0.991 | 1.204 | 1.779 | 0.075 |
| Prandini 2012 | TDT | 1.126 | 1.021 | 1.241 | 2.376 | 0.017 |
| Pooled (all studies) | | 1.126 | 1.025 | 1.238 | 2.472 | 0.013 |

2.3.4 Publication bias and sensitivity analyses

Publication bias was significant only for one variant, rs2254298 in OXTR (Egger's test (two-tailed) P = 0.03). However, the mean effect size for the variant was not significant (P = 0.425). Notably, the results for some variants were sensitive to the removal of individual studies. Of the nine variants with P < 0.01, we performed sensitivity analyses on the six variants with data from more than five independent cohorts (rs7794745, rs362691, rs2292813, rs2056202, rs1861972, and rs1801133). For rs1801133, most studies contributed approximately equally, with the exception of two studies (Park et al., 2014; Schmidt et al., 2011), both of which lowered the OR. A re-analysis of the data after removing either of the two studies lowered the P-value for the OR (original P = 0.010; P after removing Park et al., 2014 = 0.006; P after removing Schmidt et al., 2011 = 0.003). For rs2056202, the removal of data from one study (Ramoz et al., 2004) increased the P-value from P = 0.002 to P = 0.088. Sensitivity was not an issue for the remaining four variants that were significant. However, of the nominally significant variants, sensitivity was an issue for rs4446909, rs736707, and rs1861972. Forest plots of the sensitivity analyses for these five variants are provided in Figures 17 – 21.
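For readers unfamiliar with Egger's test, the regression behind it can be sketched as follows. This is a generic illustration, not the Comprehensive Meta-Analysis implementation: the standardized effect (log OR divided by its SE) is regressed on precision (1/SE) by ordinary least squares, and an intercept far from zero suggests funnel-plot asymmetry. The function name and the toy data are hypothetical, and the P-value (from a t-distribution with k - 2 degrees of freedom) is left to a statistics library.

```python
import math


def egger_regression(log_ors, ses):
    """Regress standardized effect (log OR / SE) on precision (1 / SE).

    Returns (intercept, slope, t statistic of the intercept).
    Requires at least three studies.
    """
    x = [1.0 / se for se in ses]
    y = [lo / se for lo, se in zip(log_ors, ses)]
    k = len(x)
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_variance = rss / (k - 2)
    se_intercept = math.sqrt(
        residual_variance * sum(xi ** 2 for xi in x) / (k * sxx)
    )
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t


# Toy log ORs and standard errors, for illustration only.
intercept, slope, t = egger_regression(
    [2.0, 1.0, 4.0 / 3.0, 1.0], [1.0, 0.5, 1.0 / 3.0, 0.25]
)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, t = {t:.3f}")
```

The slope estimates the underlying effect once small-study effects are accounted for, while the intercept carries the asymmetry signal that the two-tailed test evaluates.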


Figure 17: Sensitivity analysis for rs4446909 (ASMT)

Statistics with study removed (odds ratio and 95% CI):

Study name          Ancestry    Point   Lower limit  Upper limit  Z-Value  p-Value
Melke 2008          Mixed       1.120   0.955        1.314        1.391    0.164
Toma 2007 Finnish   Caucasian   1.194   1.033        1.381        2.394    0.017
Toma 2007 Italian   Caucasian   1.201   1.037        1.390        2.450    0.014
Toma 2007 IMGSAC    Caucasian   1.213   1.038        1.418        2.430    0.015
Wang 2013           Chinese     1.272   1.051        1.539        2.468    0.014
Overall                         1.195   1.038        1.375        2.479    0.013

Figure 18: Sensitivity analysis for rs736707 (RELN)

Statistics with study removed (odds ratio and 95% CI):

Study name            Ancestry         Method        Point   Lower limit  Upper limit  Z-Value  p-Value
Sharma 2013 (Black)   South African    Case control  1.297   1.036        1.623        2.268    0.023
Sharma 2013 (White)   South African    Case control  1.266   1.014        1.580        2.084    0.037
Sharma 2013 (Mixed)   South African    Case control  1.204   0.978        1.483        1.749    0.080
He 2011               Chinese (Hans)   Case control  1.363   1.140        1.629        3.402    0.001
Dutta 2008            Indian           Case control  1.262   1.002        1.589        1.979    0.048
Li 2008               Chinese (Hans)   Case control  1.267   0.956        1.679        1.644    0.100
Serajee 2006          Caucasian        TDT           1.187   0.953        1.479        1.527    0.127
Chakrabarti 2009      Caucasian        Case control  1.310   1.038        1.652        2.277    0.023
Warrier 2014          Caucasian        Case control  1.302   1.034        1.640        2.248    0.025
Overall                                              1.269   1.030        1.563        2.235    0.025

Figure 19: Sensitivity analysis for rs1801133 (MTHFR)

Statistics with study removed (odds ratio and 95% CI):

Study name        Point   Lower limit  Upper limit  Z-Value  p-Value
Boris 2004        1.229   1.019        1.483        2.153    0.031
James 2006        1.397   1.064        1.835        2.408    0.016
Mohammad 2009     1.288   1.017        1.632        2.098    0.036
Pasca 2009        1.385   1.079        1.777        2.558    0.011
dos Santo 2010    1.402   1.081        1.820        2.543    0.011
Liu 2011          1.422   1.064        1.900        2.382    0.017
Schmidt 2011      1.455   1.135        1.865        2.958    0.003
Guo 2012          1.386   1.062        1.809        2.399    0.016
Divyakolu 2013    1.301   1.028        1.646        2.190    0.029
Park 2014         1.441   1.110        1.872        2.741    0.006
Overall           1.370   1.079        1.739        2.589    0.010

Figure 20: Sensitivity analysis for rs2056202 (SLC25A12)

Statistics with study removed (odds ratio and 95% CI):

Study name         Point   Lower limit  Upper limit  Z-Value  p-Value
Ramoz 2004         1.148   0.980        1.346        1.708    0.088
Segurado 2005      1.191   1.042        1.360        2.566    0.010
Blasi 2006         1.251   1.085        1.442        3.091    0.002
Correira 2006      1.236   1.084        1.409        3.165    0.002
Chakrabarti 2009   1.241   1.084        1.421        3.121    0.002
Chien 2010         1.265   1.095        1.461        3.191    0.001
Palmieri 2010      1.224   1.075        1.394        3.050    0.002
Durdiakova 2014    1.251   1.097        1.426        3.345    0.001
Overall            1.227   1.079        1.396        3.123    0.002

Figure 21: Sensitivity analysis for rs1861972 (EN2)

Statistics with study removed (odds ratio and 95% CI):

Study name             Point   Lower limit  Upper limit  Z-Value  p-Value
Yang 2008              1.147   0.978        1.344        1.690    0.091
Yang 2010              1.163   0.982        1.377        1.750    0.080
Benayed 2005 AGRE II   1.242   0.995        1.550        1.915    0.055
Benayed 2005 NIMH      1.228   0.993        1.518        1.898    0.058
Gharani 2004           1.112   0.975        1.268        1.579    0.114
Chakrabarti 2009       1.244   1.060        1.460        2.674    0.007
Warrier 2014           1.207   1.002        1.455        1.980    0.048
Overall                1.187   1.011        1.393        2.092    0.036

2.3.5 Analysis of the PGC dataset

Eleven of the 15 nominally significant variants in the current meta-analyses were genotyped in the PGC GWAS cohort; none were found to be significant. Effect direction was concordant between the two datasets for eight of the 11 variants. As expected given the larger sample size, effect sizes were smaller in the PGC dataset for all 11 variants, with odds ratios closer to 1. Total sample size was not a significant predictor of concordance of effect direction between the two datasets. However, inspection of the datasets indicates that, with the exception of rs2056202 in SLC25A12, the other three variants discordant for effect direction were analysed in small samples in the meta-analysis.

Due to the lack of significance for 11 of the 15 variants in the PGC dataset, we re-evaluated the significance of the remaining four variants. For two variants, the Classic Fail-safe N is very small (3 for rs4446909 in ASMT, and 0 for rs4717806 in STX1A). The latter variant was analysed using a fixed effect model and becomes non-significant when analysed using a random effect model. For the remaining two variants (rs1861972 in EN2 and rs362691 in RELN), the Classic Fail-safe N is above 10. The sample sizes, however, are modest. These analyses indicate that the first two variants are likely to be false positives. Additional research is required to confirm the significance of the latter two variants.
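For readers unfamiliar with the Classic Fail-safe N, the sketch below shows Rosenthal's formulation, in which the number of unpublished null studies needed to render the combined result non-significant is (ΣZ)² / z_c² − k for k studies. The one-tailed critical value of 1.645 is an assumption here, the meta-analysis software used in this chapter may apply a different critical value, and the Z scores are hypothetical rather than taken from this study.

```python
def classic_failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N, floored at zero.

    z_scores: per-study Z values (signed, same direction convention).
    z_crit:   critical Z for significance (assumed one-tailed alpha = 0.05).
    """
    k = len(z_scores)
    n_fs = sum(z_scores) ** 2 / z_crit ** 2 - k
    return max(0.0, n_fs)


# Hypothetical per-study Z values, not taken from this meta-analysis.
strong = classic_failsafe_n([2.0, 1.5, 2.5])
weak = classic_failsafe_n([0.5, 0.5])
```

A value at or near zero, as for `weak`, means that the combined result is already non-significant or would become so with very few additional null studies.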

2.3.6 Previous meta-analyses

Five genes investigated in this study have been previously investigated in other meta-analyses: OXTR, RELN, HOXA1, MTHFR, and SLC6A4. HOXB1 was previously analysed using meta-analysis and was not re-investigated in this study as there was no new data to include.

This study differs from that of LoParo and Waldman (LoParo & Waldman, 2015), who carried out a meta-analysis of OXTR and autism, as they included FBAT studies in their analyses. We excluded studies that used FBAT as FBAT does not report effect sizes. However, we included three additional cohorts: unpublished genotype data from two cohorts from our lab, and a third cohort studied by Nyffeler and colleagues (Nyffeler, Walitza, Bobrowski, Gundelfinger, & Grünblatt, 2014). Of the three variants significant in the previous study, rs237887 and rs2268491 were significant in the current study. We did not have enough data to test the third significant variant (rs7632287).

A previous autism and RELN meta-analysis (Z. Wang et al., 2014) investigated three variants (rs736707, rs362691, and the GGC repeat), with only rs362691 giving a statistically significant P-value. In this study, we re-investigated the first two variants using data from additional cohorts. rs736707 was nominally significant and rs362691 was significant in this study. We did not identify any additional data for the GGC repeat and hence did not investigate it in our study. Additionally, we identified a fourth variant in RELN, rs2073559, which was not investigated by the previous meta-analysis. This variant was not significant in this study.

We analysed both the variants investigated in a previous meta-analysis of MTHFR and autism (Pu et al., 2013), including data from two additional studies for rs1801133 and one additional study for rs1801131. The results were similar to those previously obtained: rs1801133 was significant whereas rs1801131 was not. While the previous study stratified based on folate fortification, we did not conduct these analyses due to insufficient data on folate fortification.

We re-investigated rs10951154 in HOXA1, which was investigated in an earlier meta-analysis (R.-R. Song et al., 2011). We included data from the Chakrabarti et al. (2009) cohort, which was not included in the earlier study. While the previous study carried out analyses stratified by ethnicity, it did not stratify the data based on study methodology, unlike this study. We did not identify any additional data for the HOXB1 variant, rs72338773, investigated in the previous study (R.-R. Song et al., 2011) and hence did not re-investigate that variant.

Finally, SLC6A4 has been investigated for autism using meta-analysis in an earlier study (Huang & Santangelo, 2008). We extend their work for 5-HTTLPR by using additional data, and investigate two additional variants (rs2020936 and rs2020942) in this study. We did not identify any additional data for the STin2 VNTR and hence did not re-investigate it.

2.4 Discussion

This is the first study to comprehensively investigate candidate gene association studies of common variants in autism. Using two databases, we identified 552 genes that are reported to be implicated in autism through genetic association studies. We scanned the literature for these 552 genes and, using a strict inclusion criterion, identified 27 genes that had sufficient data to perform a meta-analysis. Eight variants across seven genes were significant for combined effect sizes with P < 0.01. Data for 11 variants was present in the PGC GWAS dataset. None of the 11 variants were significant in the PGC dataset, though the majority of the variants were concordant for effect direction in both datasets. Overall, the lack of replication within the larger PGC cohort suggests that the results from candidate gene association studies are likely to be false positives.

Effect sizes for most common variants are modest for autism and these results are consistent with this observation. However, there was no clear correlation between effect sizes in this dataset and the PGC dataset. Effect sizes were smaller in the PGC dataset. This is most likely due to the phenomenon known as ‘winner’s curse’, where, in genetic studies with low statistical power, variants with significant P-values are likely to have inflated effect sizes. Though most commonly described for GWAS, this is applicable to candidate gene association studies as well. While most of the effects lay between 0.8 and 1.2, which is expected from GWAS data, the effect was larger for some variants. The most significant variant (rs167771) had data from only three studies and had a relatively high OR of 1.82 (1.40-2.38). The small sample size for this variant inflated the OR, making it significant. The effect direction for this variant was discordant in the PGC dataset, where it was not significant.

It is clear that all candidate gene association studies are underpowered to replicably identify significant variants. Additionally, the different study methodologies and ethnicities contributed to heterogeneity in the sample, which potentially confounded the analyses. It is also clear from this study that significant heterogeneity exists for a large fraction of the variants tested. In fact, heterogeneity is significantly and positively correlated with the number of independent datasets included per variant in the analyses, indicating that the current study may not have uncovered all the heterogeneity. We were able to account for some of the heterogeneity after stratifying for ethnicity and study methodology, but heterogeneity influenced the results for some of the variants even after this. This indicates that additional factors contribute to variance in the effect. One potential source of heterogeneity is finer population stratification. Fine-scale population stratification cannot be addressed in candidate gene association studies as these test only a few SNPs. We were unable to stratify based on sex or clinical ascertainment, two factors known to contribute to heterogeneity in autism. It is unclear how clinical heterogeneity maps onto genetic heterogeneity in autism. Existing genetic studies that stratify based on IQ or other clinical phenotypes and subphenotypes have had limited success (Chaste et al., 2015; Warrier, Chakrabarti, et al., 2015). The inability to completely identify sources of heterogeneity forced us to choose between two models (fixed effect vs random effects), when most variants are likely to have varying levels of heterogeneity.

Another cause for concern is the small number of genes with enough data to meta-analyse. Of 552 genes, we had data for only 27, less than 5%. None of the 27 genes analysed were autism risk genes as predicted by DAWN (Liu et al., 2014). Further, with the exception of SHANK3 (Sanders et al., 2015), none of these genes have sufficient evidence to categorize them as risk genes using sequencing or copy number variation studies, though emerging evidence suggests that rare and common variants may contribute to different risk pathways in autism. For example, while rare variant mutation burden is correlated with reduction in cognitive ability (Robinson et al., 2014; Samocha et al., 2014), common risk variants for autism are associated with higher educational attainment (Bulik-Sullivan, Finucane, et al., 2015) and cognitive aptitude (Sniekers et al., 2017). A few genes in the list of 552 genes but absent from the final list of 27 genes are predicted to be autism risk genes. These include GABRB3, GRIN2B and NRXN1 (Sanders et al., 2015). However, there was insufficient evidence to evaluate the role of common variants in autism for these genes through the current meta-analysis.

Candidate gene association studies typically have small samples, which leads to overestimated effect sizes. The lack of replication does not indicate that these loci do not contribute to the aetiology of autism, but, rather, that there is insufficient evidence to implicate them in autism. Autism is highly polygenic and more than 49% of its variance can be attributed to common variants (Gaugler et al., 2014). As the effect size for each individual common variant is likely to be very modest and not likely to exceed an OR of 1.3, this indicates that there are several common variants that contribute to the condition. Disentangling this would require very large sample sizes, much larger than those in the current PGC Autism GWAS. It is evident, from the current study, that candidate gene association studies in autism have been underpowered to reliably detect causative variants with precision. For this reason, in Chapters 3 – 7, we conduct genome-wide association studies of traits related to autism.


3. Genome-wide association study of self-reported empathy

3.1 Introduction

Empathy is the ability to identify other people’s thoughts, intentions, desires, and feelings, and to respond to others’ mental states with an appropriate emotion (Baron-Cohen & Wheelwright, 2004). It plays an important role in social interaction by facilitating both making sense of other people’s behaviour and responding appropriately to it. For these reasons, it is considered a key component of prosocial behaviour, social cooperation, and social cognition (Decety, Bartal, Uzefovsky, & Knafo-Noam, 2015). Aspects of empathy are observed in humans and other animals, and empathy is thought to have evolved to support a range of prosocial and cooperative behaviours (Decety et al., 2015).

Differences in various fractions of empathy have been observed in several psychiatric conditions. Two major fractions of empathy include affective empathy (the drive to respond to another’s mental state with an appropriate emotion) and cognitive empathy (the ability to recognize another’s mental state). Differences in empathy have been reported in autism (Baron-Cohen & Wheelwright, 2004), bipolar disorder (Derntl, Seidel, Schneider, & Habel, 2012), schizophrenia (Bora, Gökçen, & Veznedaroglu, 2008; Lehmann et al., 2014; Michaels et al., 2014), antisocial personality disorder (American Psychiatric Association, 2013), borderline personality disorder (American Psychiatric Association, 2013), anorexia (Morris, Bramham, Smith, & Tchanturia, 2014), and major depressive disorder (Derntl et al., 2012; Thoma, Schmidt, Juckel, Norra, & Suchan, 2015; Weightman, Air, & Baune, 2014). These differences vary between psychiatric conditions: for example, individuals with schizophrenia are more likely to report higher personal distress and emotional contagion (Lehmann et al., 2014), whereas individuals with autism are likely to show difficulties with cognitive (but not affective) empathy (Baron-Cohen, 2009; Baron-Cohen & Wheelwright, 2004). In contrast, those with antisocial personality disorder are likely to show difficulties with affective (but not cognitive) empathy (Baron-Cohen, 2011). These may reflect causal risk mechanisms where alterations in fractions of empathy contribute to higher risk for developing a specific psychiatric condition. Equally, differences in empathy may also be a consequence of the presence of a psychiatric condition, which may not allow individuals to understand and respond to another person’s mental state effectively. For example, those with depression may have intact empathy prior to their low mood, but show reduced empathy as a consequence of mood change.


Whilst empathy is clearly shaped by early experience, parenting, and other social factors, different lines of evidence suggest that empathy is partly biological. Empathy is modestly heritable (approximately a third of the variance is heritable) (Davis, Luce, & Kraus, 1994; Emde et al., 1992; Hatemi, Smith, Alford, Martin, & Hibbing, 2015), and a few candidate gene association studies have investigated the role of various genes in empathy (Chakrabarti et al., 2009; Uzefovsky et al., 2014; Warrier et al., 2013). In addition, several studies have identified a role for the oxytocinergic and the fetal testosterone systems in modulating empathy (Auyeung et al., 2006; Chapman et al., 2006; Decety, 2010; Decety et al., 2015). Neuroimaging studies have identified distinct brain regions implicated in different aspects of empathy, including the amygdala and the ventromedial prefrontal cortex (Morelli, Rameson, & Lieberman, 2014; Siegal & Varley, 2002). Empathy also shows a marked sex difference: females, on average, score higher on different measures of empathy (Baron-Cohen & Wheelwright, 2004; Holgado Tello, Delgado Egido, Carrasco Ortiz, & Del Barrio Gandara, 2013). A longitudinal study suggests that this female advantage grows larger with age (Mestre, Samper, Frías, & Tur, 2009). Sex differences in the mind arise from a combination of innate biological, cultural, and environmental factors. Studies in human infants have identified sex differences in the developmental precursors to empathy, such as neonatal preference for faces over objects (Connellan, Baron-Cohen, Wheelwright, Batki, & Ahluwalia, 2000), when environmental and cultural influences are minimal, lending support to the idea that sex differences in empathy are at least partly biological (Christov-Moore et al., 2014).

Because empathy difficulties are found in a range of psychiatric conditions, empathy is an important phenotype for investigation. Understanding the biological networks that partly determine empathy may help us understand how it contributes to psychiatric phenotypes, an approach that has been used for other traits such as neuroticism (de Moor et al., 2015), creativity (Power et al., 2015), and cognitive ability (Clarke et al., 2015). We investigate the genetic correlates of empathy using the Empathy Quotient (EQ) (Baron-Cohen & Wheelwright, 2004). The EQ is listed in the Research Domain Criteria (RDoC) (Insel et al., 2010) as a self-report measure under the domain of ‘Understanding Mental States’ (https://www.nimh.nih.gov/research-priorities/rdoc/units/self-reports/151133.shtml). The EQ is widely used, has excellent test-retest reliability (r = 0.83, P = 0.0001) (Lawrence, Shaw, Baker, Baron-Cohen, & David, 2004), high internal consistency (Cronbach’s alpha ~ 0.9) (Bos et al., 2016; Melchers, Montag, Markett, & Reuter, 2015), and is significantly correlated with factors in the Interpersonal Reactivity Index (IRI) and the Toronto Empathy Questionnaire, two other measures of empathy, suggesting good concurrent validity (Lawrence et al., 2004; Spreng, McKinnon, Mar, & Levine, 2009). Psychometric analysis of the EQ in 3,334 individuals from the general population suggests that the EQ is a good measure of empathy which can be measured across a single dimension (Allison, Baron-Cohen, Wheelwright, Stone, & Muncer, 2011). By focusing on a self-report measure of empathy, we were able to obtain phenotypic and genetic data from a large number of participants, increasing the statistical power of the study. In Chapter 4, we investigate the genetic correlates of cognitive empathy using a specific performance measure – the ‘Reading the Mind in the Eyes’ Test (the Eyes Test) (Warrier et al., 2017). The Eyes Test has a low correlation with the EQ (r ~ 0.10) (Baron-Cohen et al., 2015; Melchers et al., 2015) and measures only one facet of empathy, namely cognitive empathy. Cognitive empathy is also referred to as employing a ‘theory of mind’, or ‘mentalizing’. The EQ includes some items that measure cognitive empathy and others that measure affective empathy. Yet other items on the EQ involve both cognitive and affective empathy.

In this study, we aim to answer three questions: 1. What are the genetic correlates of empathy? 2. Is empathy genetically correlated to various psychiatric conditions, psychological traits, and education? and 3. Is there a genetic contribution to sex differences in empathy? We performed sex-stratified and non-stratified genome-wide association analyses of empathy in research participants from 23andMe, a personalized genetics company. We calculated the narrow sense heritability explained by all the SNPs tested, and investigated sex differences. Finally, we conducted genetic correlation analyses with six psychiatric conditions (anorexia, ADHD, autism, bipolar disorder, major depressive disorder, and schizophrenia), six psychological traits, and educational attainment.

3.2 Methods

3.2.1 Participants

Research participants were drawn from the customer base of 23andMe, Inc., a personal genetics company, and are described in detail elsewhere (Do et al., 2011; Tung et al., 2011). There were 46,861 participants (24,543 females and 22,318 males). All participants included in the analyses provided informed consent and answered surveys online according to a human subjects research protocol, which was reviewed and approved by Ethical & Independent Review Services, an AAHRPP-accredited private institutional review board (http://www.eandireview.com). All participants completed the online version of the questionnaire accessible via the research tab of their password-protected 23andMe personal online account. Only participants who were primarily of European ancestry (97% European ancestry) were selected for the analysis using existing methods (Eriksson et al., 2012). Unrelated individuals were selected using a segmental identity-by-descent algorithm (Henn et al., 2012).

3.2.2 Measures

The Empathy Quotient (EQ) (Baron-Cohen & Wheelwright, 2004) is a self-report measure of empathy, and includes items relevant to both cognitive and affective empathy. It comprises 60 questions and has good test-retest reliability (Lawrence et al., 2004). Twenty questions are filler questions; on each of the remaining 40 questions, participants can score a maximum of 2 points and a minimum of 0 points. Therefore, in this study, participants scored a maximum of 80 and a minimum of 0.

3.2.3 Genotyping, imputation and quality control

DNA extraction and genotyping were performed on saliva samples by the National Genetic Institute, USA. Participants were genotyped on one of four different platforms – V1, V2, V3 and V4. The V1 and V2 platforms have a total of 560,000 SNPs largely based on the Illumina HumanHap550+ BeadChip. The V3 platform has 950,000 SNPs based on the Illumina OmniExpress+ BeadChip and has custom content to improve the overlap with the V2 platform. The V4 platform is a fully customized array and has about 570,000 SNPs. All samples had a call rate greater than 98.5%. A total of 1,030,430 SNPs (including insertions/deletions, or InDels) were genotyped across all platforms. Imputation was performed using the March 2012 (v3) release of the 1000 Genomes Phase 1 reference haplotypes. First, we used Beagle (version 3.3.1) (Browning & Browning, 2007) to phase batches of 8000-9000 individuals across chromosomal segments of no more than 10,000 genotyped SNPs, with overlaps of 200 SNPs. SNPs were excluded if they were not in Hardy-Weinberg equilibrium (P < 10⁻²⁰), had a genotype call rate less than 95%, or had discrepancies in allele frequency compared to the reference European 1000 Genomes data (chi-squared P < 10⁻¹⁵). We then imputed each phased segment against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 (Fuchsberger, Abecasis, & Hinds, 2015), using 5 rounds and 200 states for parameter estimation. We restricted the analyses to SNPs that had a minor allele frequency (MAF) of at least 1%. For genotyped SNPs, those present only on platform V1, or on chromosome Y and the mitochondrial chromosome, were excluded due to small sample sizes and unreliable genotype calling respectively. Next, using trio data available from research participants in the 23andMe dataset, SNPs that failed a parent-offspring transmission test were excluded.
For imputed SNPs, we excluded SNPs with average r² < 0.5 or minimum r² < 0.3 in any imputation batch, as well as SNPs that had strong evidence of an imputation batch effect. The batch effect test is an F test from an ANOVA of the SNP dosages against a factor representing imputation batch; we excluded results with P < 10⁻⁵⁰. After quality control, 9,955,952 SNPs were analysed. Genotyping, imputation, and preliminary quality control were performed by 23andMe.
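The per-SNP thresholds quoted in this section can be summarised as a simple filter. The Python sketch below is illustrative only: the record fields and function name are hypothetical, it collapses filters that were applied at different stages (before phasing and after imputation) into one function, and it does not model the platform, chromosome, or trio transmission exclusions. It is not the 23andMe pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary; all field names are hypothetical."""
    imputed: bool
    maf: float               # minor allele frequency
    hwe_p: float = 1.0       # Hardy-Weinberg test P-value (genotyped SNPs)
    call_rate: float = 1.0   # genotype call rate (genotyped SNPs)
    freq_p: float = 1.0      # chi-squared P vs. European 1000 Genomes frequencies
    avg_r2: float = 1.0      # average imputation r2 across batches
    min_r2: float = 1.0      # minimum imputation r2 in any batch
    batch_p: float = 1.0     # ANOVA F-test P-value for an imputation batch effect


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives the thresholds quoted in the text."""
    if snp.maf < 0.01:  # analyses restricted to MAF of at least 1%
        return False
    if snp.imputed:
        return not (snp.avg_r2 < 0.5 or snp.min_r2 < 0.3 or snp.batch_p < 1e-50)
    return not (snp.hwe_p < 1e-20 or snp.call_rate < 0.95 or snp.freq_p < 1e-15)


kept = passes_qc(SnpQc(imputed=True, maf=0.2, avg_r2=0.9, min_r2=0.8))
dropped = passes_qc(SnpQc(imputed=True, maf=0.2, avg_r2=0.45))
```

Here `kept` is True and `dropped` is False, because the second record falls below the average r² threshold of 0.5.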

3.2.4 Genetic association

We performed a linear regression assuming an additive model of genetic effects. Age and sex along with the first five ancestry principal components were included as covariates. Additionally, we performed a male-only and a female-only linear regression analysis to identify sex-specific loci. Lead SNPs in each locus were identified after pruning for LD (r² > 0.8) using Plink version 1.9. We calculated the variance explained by the top SNPs using a previously used formula (Hibar et al., 2015):

$$\frac{R^2_{g|c}}{1 - R^2_c} = \frac{t^2}{n - k - 1} \times 100$$

where $R^2_{g|c} / (1 - R^2_c)$ is the proportion of variance explained by the SNP after accounting for the effects of the covariates (four ancestry principal components, age, and, additionally, sex for the non-stratified analyses), $t$ is the t-statistic of the regression coefficient, $k$ is the number of covariates, and $n$ is the sample size. Winner’s curse correction was conducted using FDR Inverse Quantile Transformation (Bigdeli et al., 2016).
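As a worked illustration of the variance-explained formula, the Python sketch below evaluates it as written in the text, alongside the exact partial R² for a single regressor (which has t² + n − k − 1 in the denominator) purely for comparison. The t-statistic is hypothetical, and k = 6 assumes the six covariates listed for the non-stratified analysis.

```python
def variance_explained_pct(t: float, n: int, k: int) -> float:
    """Percentage of variance explained, following the formula as written above."""
    return t ** 2 / (n - k - 1) * 100


def partial_r2_pct(t: float, n: int, k: int) -> float:
    """Exact partial R^2 (as a percentage) for one regressor, for comparison only."""
    dof = n - k - 1
    return t ** 2 / (t ** 2 + dof) * 100


# Hypothetical t-statistic; n is the full sample size reported in the text and
# k = 6 assumes four ancestry principal components, age, and sex.
n, k, t = 46861, 6, 5.5
approx = variance_explained_pct(t, n, k)
exact = partial_r2_pct(t, n, k)
```

With a sample of this size the two expressions differ only in the fifth decimal place of the percentage, so either form gives the same practical answer: a SNP with a t-statistic of this magnitude explains well under 0.1% of the variance.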

3.2.5 Genomic inflation factor, heritability, and functional enrichment

We used the Linkage Disequilibrium Score regression (LDSR) coefficient to calculate genomic inflation due to population stratification (Bulik-Sullivan, Loh, et al., 2015) (https://github.com/bulik/ldsc). Heritability and genetic correlation analyses were performed using extended methods in LDSR (Bulik-Sullivan, Finucane, et al., 2015). The difference in heritability between males and females was quantified using (Ge et al., 2017):

$$Z = \frac{h^2_{\text{males}} - h^2_{\text{females}}}{\sqrt{SE^2_{\text{males}} + SE^2_{\text{females}}}}$$

where $Z$ is the Z score for the difference in heritability for a trait, $(h^2_{\text{males}} - h^2_{\text{females}})$ is the difference in the SNP heritability estimates between males and females, and $SE$ is the standard error of each heritability estimate. Two-tailed P-values were calculated, and reported as significant if P < 0.05.

We identified enrichment in genomic functional elements for the traits by partitioning heritability in LDSR (Finucane et al., 2015). In addition to the baseline partitions, we conducted four additional enrichment analyses. Enrichment for CNS-specific histone marks was conducted using cell-type-specific partitioned heritability analysis. For genes that are intolerant to loss-of-function mutations, we identified gene boundaries of genes with probability of loss-of-function intolerance scores > 0.9 from the Exome Aggregation Consortium (Lek et al., 2016), and conducted partitioned heritability analysis for all common SNPs within the gene boundaries identified. Similarly, for sex-differentially enriched genes, we identified gene boundaries of genes with sex-differential expression in cerebral cortex and associated structures (Brain Other) (Chen et al., 2016) (Appendix: Table 3). We divided this into two separate lists, genes with higher expression in males and genes with higher expression in females, with an FDR-corrected P-value < 0.05. Partitioned heritability analyses were conducted to identify enrichment.
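The Z test for the male-female difference in SNP heritability described earlier in this section is straightforward to compute. The Python sketch below is a minimal illustration; the function name is arbitrary and the input estimates are hypothetical, not values from this study.

```python
import math


def heritability_difference(h2_males, se_males, h2_females, se_females):
    """Z score and two-tailed P-value for a difference in heritability estimates."""
    z = (h2_males - h2_females) / math.sqrt(se_males ** 2 + se_females ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed P from the standard normal
    return z, p


# Hypothetical estimates and standard errors, not values from this study.
z, p = heritability_difference(0.10, 0.02, 0.12, 0.02)
```

With these illustrative inputs the difference is well within sampling error (Z of about −0.71, two-tailed P of about 0.48), so it would not be reported as significant at P < 0.05.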

3.2.6 Genetic correlations

LDSR was also used to calculate genetic correlations. We restricted our analyses to the non-stratified GWAS dataset due to the unavailability of sex-stratified GWAS data for the phenotypes investigated. We calculated initial genetic correlations using LD Hub (Zheng et al., 2016) for schizophrenia (Ripke et al., 2014), bipolar disorder, major depressive disorder, depressive symptoms, educational attainment (years of schooling), NEO-Openness to experience, NEO-Conscientiousness, subjective wellbeing, and neuroticism. For anorexia nervosa (Duncan et al., 2017), autism (The Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium, 2017), and ADHD (Demontis et al., 2017), we used the data available from the PGC webpage (https://www.med.unc.edu/pgc/results-and-downloads) to conduct genetic correlation analyses as these are from larger samples and, consequently, have greater statistical power than the datasets available on LD Hub. In addition, for autism, we also used an independent dataset from a larger autism cohort from Denmark – the autism_iPSYCH dataset (Pedersen et al., 2017). We also conducted genetic correlation analysis for extraversion (van den Berg et al., 2016) separately as the data was unavailable on LD Hub. For the anorexia nervosa and the extraversion analyses, the North West European LD scores were used and the intercepts were not constrained as the extent of participant overlap was unknown. We report correlations as significant if the Bonferroni-corrected P-value < 0.05, which we acknowledge is conservative. For anorexia nervosa and autism, we also conducted genetic correlation analyses using the sex-stratified EQ dataset due to the significant sex differences observed in these conditions. We corrected for these using Bonferroni correction, and report significant correlations with a P-value < 0.05.

3.2.7 Gene-based analysis

Gene-based analyses for the non-stratified GWAS were performed using MAGMA (de Leeuw, Mooij, Heskes, & Posthuma, 2015), which integrates LD information between SNPs to prioritize genes. Genes were considered significant if they had a Bonferroni-corrected P-value < 0.05. In addition, we investigated enrichment in terms using MAGMA.

3.2.8 Genome-wide colocalization

Pairwise genome-wide colocalization analyses were conducted using GWAS-PW (Pickrell et al., 2016) by dividing the genome into segments containing approximately 5,000 SNPs each. We considered the posterior probability of model 3, i.e. the model wherein SNPs in the same locus influence both traits. We used a rigorous threshold of posterior probability > 0.9 to identify significant loci that influenced both traits. We conducted pairwise colocalization for empathy (non-stratified) and schizophrenia (Ripke et al., 2014), and for empathy (non-stratified) and anorexia nervosa (Duncan et al., 2017).

3.2.9 Data Availability

Summary statistics for the EQ can be requested directly from 23andMe, and will be made available to qualified researchers subject to the terms of a data transfer agreement with 23andMe that protects the privacy of the 23andMe research participants. Please contact David Hinds for more information. Top SNPs can be visualized here: https://ghfc.pasteur.fr/eq/.

3.3 Results

3.3.1 Phenotype description

To understand the genetic correlates of empathy, we collaborated with 23andMe to conduct a Genome Wide Association Study (GWAS) of empathy (n = 46,861) using the EQ, which was normally distributed. A flow chart of the study protocol is shown in Figure 1.


Figure 1: Schematic diagram of the study protocol

Phenotyping was conducted in research participants from 23andMe, Inc. using the 60-question Empathy Quotient (Panel A). Three GWAS analyses were conducted: non-stratified GWAS (Panel B) and sex-stratified GWAS (Panel D) in unrelated individuals of primarily European ancestry. Summary non-stratified GWAS data was used to conduct SNP heritability, gene, pathway, and functional enrichment, genetic correlations with psychiatric conditions, psychological traits and education, and Bayesian gene colocalization (Panel C). Summary sex-stratified GWAS data was used to conduct sex-specific SNP heritability, genetic correlation between the male and female datasets, and enrichment in sex-differentially expressed genes (Panel E).

The mean score for all participants was 46.4 (s.d. = 13.7) out of a total of 80 on the EQ, which is similar to the mean score reported in 90 typical participants in the first study describing the EQ (42.1, s.d. = 10.6) (Baron-Cohen & Wheelwright, 2004). Females scored higher than males on the EQ (41.9, s.d. = 13.5 in males; 50.4, s.d. = 12.6 in females) (Figure 2), as previously observed (Baron-Cohen & Wheelwright, 2004). There was a significant age effect, with scores increasing with age (Beta = 0.08±0.003; P = 3.3x10-104), and a significant sex effect, with females scoring higher than males (Beta = 8.4±0.11; P ~ 0).
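The sex difference in mean scores can be expressed as a standardized effect size. The sketch below is an illustration under a stated assumption: because the per-sex sample sizes are not given in this paragraph, it uses the unweighted pooled standard deviation, which may differ slightly from the formula used for the value reported with Figure 2. The function name `cohens_d` is introduced here for illustration.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the unweighted pooled standard deviation (an assumption)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and standard deviations reported above (females first, then males).
d = cohens_d(50.4, 12.6, 41.9, 13.5)
print(round(d, 2))  # 0.65
```

Under this assumption the result agrees with the Cohen's d of 0.65 given in the caption of Figure 2.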

Figure 2: Mean scores and heritability estimates for the EQ

Mean scores and standard deviations for scores on the EQ (a). The effect size of the difference between males and females for the EQ scores was Cohen’s d= 0.65. The bar on top and the number represents the statistical significance of the male-female difference in mean scores for the EQ. Mean estimates and standard errors for heritability for scores on the EQ (b) for the females-only GWAS, males-only GWAS and the non-stratified GWAS. Numbers on top of the graphs represent P-values for each heritability estimate. Note, the y-axis does not start at 0.

3.3.2 Genome-wide association analyses We conducted three GWAS analyses: a male-only analysis, a female-only analysis, and a non-stratified analysis, using a linear regression model with age and the first four ancestry principal components as covariates (Methods). The LD score regression intercept suggested non-significant genomic inflation due to population stratification (Figures 3 - 6). We identified one genome-wide significant SNP in the non-stratified GWAS (rs4882760; P = 4.29x10-8) (Figures 3 - 6 and Table 1). This SNP lies in the second intron of TMEM132C. Due to the high recombination rates in this region, we were unable to investigate other SNPs in high LD with the lead SNP. Therefore, a cautious interpretation of this result is warranted. TMEM132C encodes a transmembrane protein, but is otherwise poorly characterized. We identified 11 loci with P < 1 x 10-6 in the three GWAS.

Figure 3: Manhattan Plot (A) and QQ Plot (B) of the non-stratified GWAS analysis

λGC = 1.092, LDSR intercept = 1.0007±0.0066

Figure 4: Manhattan Plot (A) and QQ Plot (B) of the females-only GWAS analysis

λGC = 1.049, LDSR intercept = 1.0015±0.0064

Figure 5: Manhattan Plot (A) and QQ Plot (B) of the males-only GWAS analysis

λGC = 1.051, LDSR intercept = 0.99±0.0055

Figure 6: Regional association plot for rs4882760 (Non-stratified GWAS)

Table 1: Independent SNPs with P < 1x10-6 from the GWAS studies

SNP | Chr | Freq EA | Freq OA | BP | EA | OA | P | Effect | SE | Study | Imputation r2 | Nearest Gene (GENCODE) | Genic Location
rs4882760 | 12 | 0.49 | 0.51 | 128913894 | T | A | 4.29E-08 | -0.51 | 0.09 | M+F | 0.86 | TMEM132C | intron
rs189163756 | 2 | 0.04 | 0.96 | 143370827 | C | A | 1.21E-07 | -1.33 | 0.25 | M+F | 0.69 | KYNU | intergenic
rs76891664 | 2 | 0.09 | 0.91 | 143354362 | G | A | 1.97E-07 | -1.16 | 0.22 | M | 0.98 | AC016706.1 | intergenic
rs146838217 | 10 | 0.26 | 0.74 | 120542718 | G | C | 3.06E-07 | 0.54 | 0.11 | M+F | 0.82 | U3 | intergenic
rs11264567 | 1 | 0.30 | 0.70 | 153875689 | G | A | 3.14E-07 | 0.72 | 0.14 | M | 0.97 | GATAD2B | intron
rs1141090 | 11 | 0.57 | 0.43 | 13033155 | C | A | 3.64E-07 | 0.43 | 0.09 | M+F | 0.98 | RASSF10 | 3-prime-UTR
rs10265275 | 7 | 0.47 | 0.53 | 135775848 | T | C | 6.37E-07 | -0.74 | 0.15 | M | 0.86 | AC009784.3 | intergenic
rs75171949 | 5 | 0.08 | 0.92 | 101110228 | G | C | 6.5E-07 | -1.23 | 0.25 | M | 0.89 | Metazoa_SRP | intergenic
rs201219357 | 15 | 0.04 | 0.96 | 69399212 | I | D | 6.81E-07 | -1.87 | 0.38 | F | 0.57 | RP11-809H16.5 |
rs2089401 | 16 | 0.90 | 0.10 | 67549653 | G | C | 8.17E-07 | -1.04 | 0.21 | M | 0.96 | CTD-2012K14.1 | intergenic
rs201645977 | 16 | 0.85 | 0.15 | 67797575 | I | D | 8.88E-07 | -0.88 | 0.18 | M | 0.95 | RANBP10 | intergenic

This table provides the details of all independent SNPs of suggestive significance (P < 1x10-6). Chr = Chromosome, BP = base pair (Hg19), P = Regression P-value, Effect = Regression beta (not standardized), SE = Standard error, EA = effect allele, OA = other allele

To investigate if the top SNPs from the EQ also contribute to cognitive empathy as measured by the Eyes Test (Chapter 4), we conducted a SNP lookup of all 11 suggestive loci in the Eyes Test GWAS. None of the 11 loci were significant in the Eyes Test GWAS, and only 7 out of the 11 SNPs had concordant effect directions in the two traits (P = 0.54; two-sided binomial sign test).
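The sign test above can be reproduced with a short exact calculation. This is a sketch, assuming the null hypothesis is that each SNP is equally likely to be concordant or discordant (probability 0.5); the function name `two_sided_sign_test` is introduced here for illustration.

```python
from math import comb


def two_sided_sign_test(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    The P-value is the total probability of all outcomes that are no more
    likely than the observed one.
    """
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-12))
    return min(1.0, p_value)


p = two_sided_sign_test(7, 11)
print(f"{p:.3f}")  # 0.549
```

The exact value is 1124/2048, or about 0.549, which matches the reported P = 0.54 up to rounding.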

The most significant SNP in each GWAS analysis explained 0.06 – 0.13% of the total variance (Table 2). However, this reduced to 0.0006 – 0.016% after correcting for winner’s curse (Table 2).

Table 2: Variance explained by the top SNPs

Study | SNP | Chr | EA | OA | P | Beta | SE | R2 | R2 (WC)
EQ_All | rs4882760 | 12 | T | A | 4.29E-08 | -0.51 | 0.09 | 0.06 | 0.0021
EQ_M | rs76891664 | 2 | G | A | 1.97E-07 | -1.16 | 0.22 | 0.12 | 0.0165
EQ_F | rs201219357 | 15 | I | D | 6.81E-07 | -1.87 | 0.38 | 0.10 | 0.0006

This table provides details of the most significant SNPs in each GWAS and the variance explained. EA = effect allele, OA = other allele, Beta = regression beta (not standardized), SE = standard error of regression, R2 = variance explained by the SNP, R2 (WC) = variance explained after winner's curse correction (see Methods for winner’s curse correction)

3.3.3 Gene-based association, heritability, and enrichment in functional categories Gene-based analysis identified two significant genes for the EQ: SEMA6D (P = 9.14x10-7) and FBN2 (P = 1.68x10-6) (Appendix: Table 4). Analysis for enrichment in Gene Ontology (GO) terms did not identify any significant enrichment (Appendix: Table 5). The most significant GO process was negative regulation of neurotransmitter secretion.

We used LDSR (Bulik-Sullivan, Loh, et al., 2015) to calculate the heritability explained by all the SNPs tested (Methods) and identified a heritability of 0.11±0.014 for the EQ (P = 1.7x10-14) (Figure 2, Table 3). Partitioning heritability by functional categories did not identify any significant enrichment after correcting for multiple testing (Table 4). We next investigated if there was an enrichment in heritability for histone marks in cells in the CNS (Finucane et al., 2015), but did not find a significant enrichment (enrichment = 3.67±1.45; P = 0.077).

Table 3: Additive SNP heritability for the three GWAS

Study | h2 (SE) | Z score | P
EQ_All | 0.1074 (0.014) | 7.67 | 1.70E-14
EQ_M | 0.1307 (0.0257) | 5.09 | 3.66E-07
EQ_F | 0.1074 (0.0208) | 5.16 | 2.42E-07

This table provides the results of the additive heritability analyses. h2 = Additive SNP heritability, SE = Standard error, P = P-value.
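The Z scores and P-values in Table 3 are consistent with a standard Wald test of each heritability estimate against zero. The sketch below assumes a two-sided test based on the normal distribution; this is an illustrative reconstruction and not necessarily the exact procedure used by the LDSR software. The function name `wald_z_and_p` is introduced here for illustration.

```python
import math


def wald_z_and_p(estimate, se):
    """Return the Wald Z score and the two-sided normal P-value for estimate / se."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Heritability estimates and standard errors from Table 3.
table3 = [("EQ_All", 0.1074, 0.014), ("EQ_M", 0.1307, 0.0257), ("EQ_F", 0.1074, 0.0208)]
results = {study: wald_z_and_p(h2, se) for study, h2, se in table3}
for study, (z, p) in results.items():
    print(f"{study}: Z = {z:.2f}, P = {p:.2e}")
```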

Table 4: Results of the partitioned heritability analyses for the EQ

Category | Prop SNP | Prop h2 | Prop h2 SE | Enrich | Enrich SE | Enrich P | FDR < 5%
base_0 | 1 | 1 | 0 | 1 | 0 | NA | NA
Conserved_LindbladToh_0 | 0.03 | 0.46 | 0.14 | 17.52 | 5.40 | 2.07E-03 | No
H3K9ac_Trynka_0 | 0.13 | 0.60 | 0.18 | 4.78 | 1.43 | 9.17E-03 | No
Intron_UCSC.extend.500_0 | 0.40 | 0.58 | 0.07 | 1.46 | 0.19 | 1.08E-02 | No
PromoterFlanking_Hoffman.extend.500_0 | 0.03 | 0.28 | 0.10 | 8.24 | 3.10 | 1.59E-02 | No
UTR_5_UCSC.extend.500_0 | 0.03 | 0.22 | 0.09 | 8.08 | 3.27 | 3.24E-02 | No
FetalDHS_Trynka.extend.500_0 | 0.29 | 0.75 | 0.24 | 2.64 | 0.85 | 5.69E-02 | No
H3K27ac_PGC2.extend.500_0 | 0.34 | 0.63 | 0.17 | 1.88 | 0.50 | 7.89E-02 | No
H3K4me3_Trynka_0 | 0.13 | 0.40 | 0.16 | 2.99 | 1.20 | 9.68E-02 | No
DHS_peaks_Trynka_0 | 0.11 | -0.40 | 0.33 | -3.57 | 2.96 | 1.03E-01 | No
H3K4me3_peaks_Trynka_0 | 0.04 | 0.33 | 0.19 | 7.97 | 4.44 | 1.18E-01 | No
TSS_Hoffman.extend.500_0 | 0.03 | 0.19 | 0.10 | 5.33 | 2.89 | 1.38E-01 | No
PromoterFlanking_Hoffman_0 | 0.01 | 0.14 | 0.09 | 16.49 | 10.88 | 1.42E-01 | No
WeakEnhancer_Hoffman.extend.500_0 | 0.09 | 0.27 | 0.13 | 3.00 | 1.44 | 1.57E-01 | No
Enhancer_Hoffman_0 | 0.06 | -0.19 | 0.18 | -2.93 | 2.91 | 1.65E-01 | No
H3K9ac_peaks_Trynka_0 | 0.04 | 0.29 | 0.18 | 7.45 | 4.77 | 1.83E-01 | No
SuperEnhancer_Hnisz.extend.500_0 | 0.17 | 0.27 | 0.07 | 1.55 | 0.41 | 1.85E-01 | No
TFBS_ENCODE.extend.500_0 | 0.34 | 0.71 | 0.28 | 2.06 | 0.83 | 1.95E-01 | No
Intron_UCSC_0 | 0.39 | 0.50 | 0.09 | 1.29 | 0.24 | 2.07E-01 | No
Enhancer_Hoffman.extend.500_0 | 0.15 | 0.35 | 0.16 | 2.26 | 1.03 | 2.18E-01 | No
DGF_ENCODE_0 | 0.14 | -0.25 | 0.34 | -1.78 | 2.45 | 2.42E-01 | No
H3K9ac_Trynka.extend.500_0 | 0.23 | 0.42 | 0.17 | 1.83 | 0.72 | 2.48E-01 | No
TFBS_ENCODE_0 | 0.13 | -0.14 | 0.27 | -1.04 | 2.07 | 3.14E-01 | No
CTCF_Hoffman.extend.500_0 | 0.07 | 0.24 | 0.16 | 3.31 | 2.31 | 3.16E-01 | No
Transcribed_Hoffman.extend.500_0 | 0.76 | 0.63 | 0.15 | 0.82 | 0.19 | 3.60E-01 | No
Enhancer_Andersson_0 | 0.00 | -0.05 | 0.07 | -12.54 | 15.45 | 3.77E-01 | No
H3K4me3_Trynka.extend.500_0 | 0.26 | 0.41 | 0.18 | 1.60 | 0.72 | 3.96E-01 | No
SuperEnhancer_Hnisz_0 | 0.17 | 0.23 | 0.08 | 1.36 | 0.45 | 4.47E-01 | No
Coding_UCSC_0 | 0.01 | 0.08 | 0.08 | 5.36 | 5.72 | 4.53E-01 | No
Coding_UCSC.extend.500_0 | 0.06 | 0.13 | 0.09 | 2.02 | 1.42 | 4.73E-01 | No
Conserved_LindbladToh.extend.500_0 | 0.33 | 0.46 | 0.18 | 1.38 | 0.54 | 4.78E-01 | No
DGF_ENCODE.extend.500_0 | 0.54 | 0.68 | 0.23 | 1.26 | 0.42 | 5.39E-01 | No
Transcribed_Hoffman_0 | 0.35 | 0.47 | 0.22 | 1.37 | 0.64 | 5.55E-01 | No
WeakEnhancer_Hoffman_0 | 0.02 | -0.05 | 0.13 | -2.28 | 6.12 | 5.87E-01 | No
TSS_Hoffman_0 | 0.02 | -0.03 | 0.09 | -1.74 | 5.20 | 5.94E-01 | No
H3K27ac_Hnisz.extend.500_0 | 0.42 | 0.49 | 0.14 | 1.16 | 0.32 | 6.32E-01 | No
Promoter_UCSC_0 | 0.03 | 0.08 | 0.11 | 2.49 | 3.60 | 6.80E-01 | No
DHS_Trynka.extend.500_0 | 0.50 | 0.58 | 0.23 | 1.17 | 0.46 | 7.13E-01 | No
H3K4me1_Trynka.extend.500_0 | 0.61 | 0.66 | 0.14 | 1.08 | 0.22 | 7.26E-01 | No
DHS_Trynka_0 | 0.17 | 0.06 | 0.34 | 0.36 | 2.03 | 7.50E-01 | No
FetalDHS_Trynka_0 | 0.08 | 0.15 | 0.27 | 1.76 | 3.17 | 8.11E-01 | No
H3K4me1_peaks_Trynka_0 | 0.17 | 0.11 | 0.27 | 0.63 | 1.56 | 8.13E-01 | No
H3K27ac_PGC2_0 | 0.27 | 0.31 | 0.20 | 1.17 | 0.73 | 8.20E-01 | No
Enhancer_Andersson.extend.500_0 | 0.02 | 0.00 | 0.09 | 0.20 | 4.51 | 8.60E-01 | No
CTCF_Hoffman_0 | 0.02 | 0.00 | 0.15 | -0.03 | 6.21 | 8.67E-01 | No
H3K27ac_Hnisz_0 | 0.39 | 0.37 | 0.12 | 0.95 | 0.32 | 8.70E-01 | No
Repressed_Hoffman.extend.500_0 | 0.72 | 0.71 | 0.10 | 0.98 | 0.13 | 8.92E-01 | No
Repressed_Hoffman_0 | 0.46 | 0.49 | 0.26 | 1.07 | 0.57 | 8.98E-01 | No
UTR_3_UCSC.extend.500_0 | 0.03 | 0.02 | 0.08 | 0.62 | 3.07 | 9.01E-01 | No
Promoter_UCSC.extend.500_0 | 0.04 | 0.03 | 0.08 | 0.74 | 2.19 | 9.07E-01 | No
UTR_5_UCSC_0 | 0.01 | 0.01 | 0.05 | 2.02 | 9.38 | 9.13E-01 | No
H3K4me1_Trynka_0 | 0.43 | 0.40 | 0.26 | 0.94 | 0.61 | 9.19E-01 | No
UTR_3_UCSC_0 | 0.01 | 0.01 | 0.07 | 0.55 | 6.38 | 9.44E-01 | No

This table provides the results of the partitioned heritability analyses. h2 = Additive SNP heritability, SE = Standard error, P = P-value, Enrich = enrichment.

Recent studies have identified an enrichment of associations in or near genes that are extremely intolerant to loss-of-function variation in schizophrenia (Singh et al., 2016), autism (Kosmicki et al., 2017; Samocha et al., 2014), and developmental disorders (McRae et al., 2017), conditions that are often accompanied by difficulties in social behaviour and empathy. We investigated if there was a significant enrichment in GWAS signal for the EQ in these genes that are extremely intolerant to loss-of-function variation. We identified a nominal, near two-fold enrichment. These genes explain 19% of the SNP heritability (h2SNP), but only 9% of the total SNPs lie in these genes (enrichment = 1.83±0.42; P = 0.044). The enrichment did not survive correction for multiple testing.

3.3.4 Sex differences Sex differences in empathy (Baron-Cohen, 2010) may reflect genetic as well as non-genetic factors (such as prenatal steroid hormones, and postnatal learning) (Auyeung, Lombardo, & Baron-Cohen, 2013). In our dataset, there was a significant female advantage on the EQ (P < 2x10-16; Cohen’s d = 0.65) (Figure 2). To investigate the biological basis for the observed sex difference, we tested the heritability of the sex-stratified GWAS analyses for the EQ. Our analyses revealed no significant difference between the heritability in the males-only and the females-only datasets (P = 0.48 for male-female difference in EQ) (Figure 2 and Table 3). Additionally, there was a high genetic correlation between the males-only and females-only GWAS (rg = 0.82±0.16, P = 2.34x10-7). This was not significantly different from 1 (P = 0.13, one-sided Wald Test), indicating a high degree of similarity in the genetic architecture of the phenotypes in males and females. We investigated the heterogeneity in the 11 SNPs of suggestive significance in both the sexes using Cochran’s Q-Test, and did not identify significant heterogeneity (Table 5).
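Both comparisons in this paragraph can be approximated with simple Wald-type tests. The sketch below rests on two assumptions that are not confirmed by the text: that the male and female heritability estimates are treated as independent, and that the genetic correlation is compared with 1 using a normal approximation. The function names are introduced here for illustration.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def difference_p(est_a, se_a, est_b, se_b):
    """Two-sided P-value for the difference of two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return 2 * normal_upper_tail(abs(z))


def below_one_p(rg, se):
    """One-sided P-value for the hypothesis that rg is less than 1."""
    return normal_upper_tail((1 - rg) / se)


# Male versus female SNP heritability (values from Table 3).
p_h2_difference = difference_p(0.1307, 0.0257, 0.1074, 0.0208)
# Male-female genetic correlation compared with 1 (values from the text).
p_rg_vs_one = below_one_p(0.82, 0.16)
print(f"{p_h2_difference:.2f} {p_rg_vs_one:.2f}")
```

Under these assumptions the two P-values come out at approximately 0.48 and 0.13, in line with the values reported above.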

Sex differences may also arise by differential expression of specific genes in different neural tissues at different developmental stages (Chen et al., 2016; Shi, Zhang, Su, Thompson, & Thiel, 2016). This could be due to multiple factors, including sex-specific transcription factors and sex-specific DNA methylation. We investigated this by performing enrichment analysis of the non-stratified GWAS in genes with higher expression in either males or females in cortical tissue samples. We did not identify a significant enrichment for either genes with higher expression in males (enrichment = 1.95±0.70, P = 0.17) or females (enrichment = 0.28±0.84, P = 0.39) (Appendix: Table 3).

Table 5: Male-female heterogeneity in effect sizes and direction

SNP | Chr | Position | Effect allele | Other allele | Beta (females-only GWAS) | SE (females-only GWAS) | P-value (females-only GWAS) | Beta (males-only GWAS) | SE (males-only GWAS) | P-value (males-only GWAS) | Heterogeneity (males-females) Q value | Heterogeneity (males-females) P
rs10265275 | 7 | 135775848 | T | C | 0.02 | 0.13 | 9.03E-01 | -0.74 | 0.15 | 6.37E-07 | 2 | 0.15
rs11264567 | 1 | 153875689 | G | A | 0.07 | 0.12 | 5.93E-01 | 0.72 | 0.14 | 3.14E-07 | 1.6 | 0.20
rs1141090 | 11 | 13033155 | C | A | 0.40 | 0.11 | 4.52E-04 | 0.47 | 0.13 | 2.74E-04 | 0.01 | 0.89
rs146838217 | 10 | 120542718 | G | C | 0.38 | 0.14 | 6.36E-03 | 0.71 | 0.16 | 8.64E-06 | 0.35 | 0.55
rs189163756 | 2 | 143370827 | C | A | -0.95 | 0.34 | 4.57E-03 | -1.74 | 0.38 | 3.87E-06 | 0.86 | 0.35
rs201219357 | 15 | 69399212 | I | D | -1.87 | 0.38 | 6.81E-07 | -0.02 | 0.44 | 9.70E-01 | 4.22 | 0.04
rs201645977 | 16 | 67797575 | I | D | 0.05 | 0.16 | 7.78E-01 | -0.88 | 0.18 | 8.88E-07 | 2.51 | 0.11
rs2089401 | 16 | 67549653 | G | C | 0.18 | 0.19 | 3.36E-01 | -1.04 | 0.21 | 8.17E-07 | 3.74 | 0.05
rs4882760 | 12 | 128913894 | T | A | -0.43 | 0.12 | 4.41E-04 | -0.59 | 0.14 | 2.75E-05 | 0.09 | 0.75
rs75171949 | 5 | 101110228 | G | C | -0.19 | 0.22 | 3.95E-01 | -1.23 | 0.25 | 6.50E-07 | 2.35 | 0.12
rs76891664 | 2 | 143354362 | G | A | -0.24 | 0.20 | 2.34E-01 | -1.16 | 0.22 | 1.97E-07 | 2.02 | 0.15

3.3.5 Genetic correlations To investigate how the EQ correlates with psychiatric conditions, psychological traits and educational attainment, we performed genetic correlation (Methods) with six psychiatric conditions (autism, ADHD, anorexia nervosa, bipolar disorder, depression (major depressive disorder and the larger depressive symptoms dataset) and schizophrenia), six psychological traits (NEO-extraversion, NEO-openness to experience, NEO-conscientiousness, neuroticism, and subjective wellbeing), and educational attainment (a proxy measure of IQ, measured using years of schooling) (Table 6 and Figure 7).

With psychiatric conditions, three genetic correlations were significant following Bonferroni correction: EQ-autism_iPSYCH (rg = -0.27±0.07, P = 2x10-4), EQ-schizophrenia (rg = 0.19±0.04; P = 1.36x10-5) and EQ-anorexia nervosa (rg = 0.32±0.09; P = 6x10-4). In contrast, we did not identify a significant negative correlation with the autism_PGC dataset (rg = -0.08±0.06, P = 0.19), though the effect direction was concordant for both the autism datasets.

We note that there is a high genetic correlation between the two autism GWAS datasets (rg = 0.94±0.05, P < 2.2x10-16).

As anorexia nervosa is primarily diagnosed in women, and autism is primarily diagnosed in men, we further tested sex-specific correlations. After Bonferroni correction, we identified significant genetic correlations between the EQ-F and anorexia (rg = 0.48±0.12; P = 8.46x10-5) and the EQ-M and autism_iPSYCH (rg = -0.3±0.08, P = 3x10-4).

With psychological traits, we identified one significant correlation after Bonferroni correction: the EQ with extraversion (rg = 0.45±0.08; P = 5.76x10-8). Additionally, we identified two nominally significant correlations: the EQ with subjective wellbeing (rg = 0.19±0.07; P = 7.8x10-3), and NEO-conscientiousness (rg = 0.39±0.14; P = 8.8x10-3). All three correlations were in the predicted direction as studies have identified a positive phenotypic correlation between all three phenotypes and the EQ (Bos et al., 2016; Melchers et al., 2016).

Figure 7: Genetic correlations between the EQ and other conditions

Mean and 95% confidence intervals shown for genetic correlations between empathy and other conditions. P-values are provided for nominally significant correlations. *indicates that the correlation was significant after Bonferroni correction.

Table 6A: Genetic correlation for the non-stratified GWAS

Phenotype | rg | SE | P-value | N | PMID
Anorexia Nervosa* | 0.33 | 0.09 | 6.00E-04 | 14477 | 28494655
ADHD | -0.08 | 0.07 | 2.07E-01 | 55374 | NA
Autism_PGC | -0.08 | 0.07 | 1.96E-01 | 16350 | 28540026
Autism_iPSYCH* | -0.27 | 0.07 | 2.00E-04 | 19142 | NA
Bipolar disorder | 0.16 | 0.07 | 2.16E-02 | 16731 | 21926972
Depressive symptoms | 0.03 | 0.07 | 7.01E-01 | 161460 | 27089181
Extraversion* | 0.46 | 0.08 | 5.76E-08 | 63030 | 26362575
Major depressive disorder | 0.09 | 0.09 | 3.29E-01 | 18759 | 22472876
Neo-conscientiousness | 0.39 | 0.15 | 8.80E-03 | 17375 | 21173776
Neo-openness to experience | 0.15 | 0.13 | 2.73E-01 | 17375 | 21173776
Neuroticism | -0.02 | 0.05 | 7.72E-01 | 170911 | 27089181
Schizophrenia* | 0.19 | 0.04 | 1.36E-05 | 77096 | 25056061
Subjective well being | 0.20 | 0.07 | 7.80E-03 | 298420 | 27089181
Years of schooling 2016 | -0.06 | 0.03 | 7.98E-02 | 293723 | 27225129

This table provides the results of the genetic correlation with the non-stratified GWAS. * represents significant correlations after Bonferroni correction. rg = genetic correlation, N = sample size, PMID = Pubmed ID, SE = standard error

Table 6B: Sex stratified genetic correlations

Phenotype 1 | Phenotype 2 | rg | SE | P-value | PMID
EQ-F | Anorexia | 0.48 | 0.12 | 8.56E-05 | 28494655
EQ-M | Anorexia | 0.16 | 0.12 | 1.70E-01 | 28494655
EQ-F | Autism_PGC | -0.05 | 0.09 | 5.75E-01 | 28540026
EQ-M | Autism_PGC | -0.10 | 0.08 | 2.29E-01 | 28540026
EQ-F | Autism_iPSYCH | -0.20 | 0.09 | 3.20E-02 | NA
EQ-M | Autism_iPSYCH | -0.30 | 0.08 | 3.00E-04 | NA

This table provides the results of the genetic correlation with the stratified GWAS. * represents significant correlations after Bonferroni correction. rg = genetic correlation, PMID = Pubmed ID, SE = standard error

3.3.6 Bayesian genomic colocalization As there was a significant positive correlation between anorexia nervosa and the EQ, and schizophrenia and the EQ, we investigated if there are genomic regions that influence both phenotypes (colocalization) by estimating the Bayesian posterior probability. We did not identify any regions associated with empathy and either of the two conditions. The most significant region identified in this analysis was in Chr11p12 (posterior probability = 0.78). The most significant SNPs in this region for both anorexia and empathy are intronic SNPs in the gene LRRC4C, which is implicated in excitatory synapse development (Seiradake et al., 2011; Y. S. Song, Lee, Prosselkov, Itohara, & Kim, 2013). Further, this gene is highly intolerant to loss-of-function mutations (probability of Loss-of-function Intolerance = 0.95). We did not identify any eQTLs in this region in neural tissues. These results are preliminary, and a cautious interpretation is warranted as the probability is influenced by the modest power of both the GWAS (Pickrell et al., 2016).

3.4 Discussion This is the first GWAS to investigate the genetic correlates of self-reported empathy. We identified four significant genetic correlations with the EQ and psychiatric conditions and psychological phenotypes (autism, anorexia nervosa, schizophrenia, and extraversion), providing insights into the shared genetic correlates. We identify one significant SNP associated with the EQ in the non-stratified GWAS, though this result must be interpreted with caution. We further identified eleven SNPs of suggestive significance (P < 1x10-6). Males and females perform differently on the tests, but there was limited evidence of a sex-specific genetic architecture.

We identified a significant negative genetic correlation between the EQ and autism using the iPSYCH dataset. Several studies have identified lower self-reported empathy in individuals with autism, and our results mirror these studies (Baron-Cohen et al., 2014; Baron-Cohen & Wheelwright, 2004). Whilst the genetic correlation between the autism_PGC dataset and empathy was negative, this was not significant. The iPSYCH dataset has a few advantages over the autism_PGC dataset. First, given that all the participants are from Denmark, the iPSYCH dataset is drawn from a genetically more homogeneous cohort in comparison to the autism_PGC dataset. Second, the autism_PGC dataset may have a loss in statistical power due to cohort heterogeneity (study design, ancestry differences, differences in sex ratio etc.) in the meta-analysis. Third, the autism_iPSYCH dataset is larger than the autism_PGC dataset and has more statistical power (mean LDSR χ2 = 1.1 and 1.2 for the autism_PGC dataset and the autism_iPSYCH dataset respectively, limited inflation due to uncorrected population stratification for both datasets). Fourth, 83% (13580 out of 16350) of the GWAS sample consists of case-pseudocontrols in the autism_PGC dataset, which reduces the statistical power further, and does not necessarily take into account subthreshold polygenic risk in parents (Peyrot et al., 2016). Given all of this, we think that the iPSYCH dataset is a better representation of the autism risk in the general population and is statistically better powered.

We also identified significant genetic correlations for the EQ with schizophrenia and anorexia nervosa. The empirical literature in general reports deficits in cognitive empathy (Bora et al., 2008; Horan et al., 2015), but preserved or stronger affective empathy (Horan et al., 2015; Lehmann et al., 2014) and emotional contagion/personal distress (Lehmann et al., 2014) in individuals with schizophrenia compared to controls. Studies with anorexia nervosa, on the other hand, have yielded mixed results. Some studies suggest preserved empathy (Hambrook, Tchanturia, Schmidt, Russell, & Treasure, 2008), some identify reduced cognitive empathy (Brewer, Cook, Cardi, Treasure, & Bird, 2015; Harrison, Sullivan, Tchanturia, & Treasure; Morris et al., 2014), and others identify greater emotional contagion/personal distress (Beadle, Paradiso, Salerno, & McCormick, 2013) in individuals with anorexia nervosa compared to controls. These studies are typically conducted in small samples, which may explain the different results in these heterogeneous conditions. Our results identify shared genetics between empathy and risk for schizophrenia and anorexia nervosa; the latter remained significant even after using the females-only EQ dataset. In Chapter 4, we report a significant genetic correlation between cognitive empathy (measured using the Eyes Test (Baron-Cohen, Wheelwright, Hill, et al., 2001)) and anorexia nervosa, underscoring the importance of empathy as a genetic risk factor in anorexia nervosa. However, both cognitive empathy and anorexia nervosa are positively correlated with educational attainment (Duncan et al., 2017) (Chapter 4), and it is possible that the correlation between cognitive empathy and anorexia nervosa may be mediated by educational attainment. Here, self-reported empathy is not correlated with educational attainment. Further, while cognitive empathy was not correlated with schizophrenia, self-reported empathy was positively and significantly correlated with schizophrenia, suggesting distinct roles for the two phenotypes in pathology. Schizophrenia and anorexia share a significant positive genetic correlation (rg = 0.23±0.06) (Duncan et al., 2017), and it is possible that the pleiotropy between these two conditions may, in part, be mediated by genetic variants that contribute to empathy. This needs to be tested. The modest power of the remaining psychiatric GWAS precludes the identification of significant genetic correlations with self-reported empathy. Together with the GWAS on cognitive empathy (Chapter 4), this study provides evidence for the distinct roles of different social processes in various psychiatric conditions.

Investigating genetic correlation results with psychological phenotypes and measures of cognition further helped elucidate the genetic correlates of self-reported empathy. The EQ was significantly correlated with extraversion and nominally correlated with subjective wellbeing and conscientiousness. Both extraversion and conscientiousness correlate with empathy (Melchers et al., 2016) which, in turn, contributes to subjective wellbeing (Bos et al., 2016). Of the five personality factors, extraversion, conscientiousness and agreeableness have modest correlations with self-reported empathy (Melchers et al., 2016). We did not test for genetic correlation with agreeableness due to the low heritability of the trait. The direction of our genetic correlation results mirrors observed phenotypic correlations and provides additional evidence for the positive role of self-reported empathy in subjective wellbeing.

This is also the first study to provide estimates of additive heritability explained by all the SNPs tested for self-reported empathy, and approximately 11% of variance was explained by SNPs. One study, investigating the heritability of the reduced EQ (18 items) in 250 twin pairs, identified a heritability of 0.32 (Hatemi et al., 2015). The literature on the heritability of empathy and prosociality is inconsistent, with heritability estimates ranging from 0.20 (Davis et al., 1994) to 0.69 (Knafo-Noam, Uzefovsky, Israel, Davidov, & Zahn-Waxler, 2015), although a meta-analysis of different studies identified a heritability estimate of 0.35 (95% CI: 0.21 – 0.41) (Knafo-Noam & Uzefovsky, 2013). Our analysis therefore suggests that approximately a third of the heritability can be attributed to common genetic variants. Like IQ (Bouchard, 2013), the heritability of empathy and prosocial behaviour changes with age (Davis et al., 1994). We did not investigate the effect of age on heritability in our study.
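The "a third" figure follows from dividing the SNP heritability by the twin-based estimates quoted in this paragraph. The short check below uses only those reported values; treating the ratio of the two estimates as the share of heritability captured by common variants is the simplification made in the text itself.

```python
snp_h2 = 0.11          # SNP heritability of the EQ reported in this chapter
twin_h2_meta = 0.35    # meta-analytic estimate (Knafo-Noam & Uzefovsky, 2013)
twin_h2_eq = 0.32      # reduced EQ in twins (Hatemi et al., 2015)

share_meta = snp_h2 / twin_h2_meta
share_eq = snp_h2 / twin_h2_eq
print(f"{share_meta:.2f} {share_eq:.2f}")  # 0.31 0.34
```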

We did not find any significant differences in heritability between males and females. Further, the male-female genetic correlations for both the phenotypes were high. Despite the high genetic correlation, sex-specific correlations with anorexia nervosa were significant only for the females-only dataset. This suggests that the sex-specific genetic component of empathy can contribute differentially to psychiatric conditions. Several other factors may explain the observed phenotypic sex difference. For example, genetic variants for empathy may be enriched in sex-specific gene expression pathways. We conducted preliminary analysis by investigating if there is an enrichment in sex-differentially expressed genes in cortical tissue samples, but did not find significant enrichment. However, sex-specific gene expression is a dynamic process with both spatial and developmental differences (Chen et al., 2016; Shi et al., 2016). Investigating across different tissues and developmental time points in well powered gene expression datasets will help better understand sex differences in empathy.

There are a few limitations that need to be taken into consideration in interpreting our results. The EQ is a self-report measure, and while it has excellent psychometric properties and construct validity, it is unclear how much of the intrinsic biological variation in this phenotype is captured by it. Further, while this is the largest GWAS to date of self-reported empathy, it still is only modestly powered, reflected in our inability to identify genome-wide significant loci. This modest statistical power influences subsequent analysis, and we highlight several nominally significant results for further investigation in larger datasets.

In conclusion, the current study provides the first narrow sense heritability for empathy. While there is a highly significant difference on the EQ between males and females, heritability is similar, with a high genetic correlation between the sexes. We also identified significant genetic correlations between empathy and some psychiatric conditions and psychological phenotypes, including autism. This global view of the genomic architecture of empathy will allow us to better understand psychiatric conditions, and improve our knowledge of the biological bases of neurodiversity in humans. In Chapter 4, we turn to a GWAS of a performance test of cognitive empathy, to see how the genetic correlates of this autism-related phenotype differ from those underpinning self-reported empathy.

4. Genome-wide association meta-analysis of Cognitive Empathy

4.1 Introduction Cognitive empathy, defined as the ability to recognize what another person is thinking or feeling, and to predict their behaviour based on their mental states, is vital for interpersonal relationships, which in turn is a key contributor of wellbeing. Cognitive empathy is distinct from affective empathy, the latter of which is defined as the drive to respond to another’s mental states with an appropriate emotion (Baron-Cohen, Wheelwright, Hill, et al., 2001). Difficulties in cognitive empathy have been found in different psychiatric conditions, particularly autism (Decety & Moriguchi, 2007). The dissociation between cognitive and affective empathy (the latter is often intact in autism, for example, whilst it is invariably impaired in antisocial personality disorder) suggests these have independent biological mechanisms.

Differences in cognitive empathy have been identified in individuals with psychiatric conditions such as autism (Baron-Cohen et al., 2015), schizophrenia (Bora et al., 2008; Michaels et al., 2014), and anorexia nervosa (Calderoni et al., 2013). This includes either elevated or reduced cognitive empathy in comparison to neurotypical controls, either of which can contribute to difficulties in social interactions and wellbeing (Tone & Tully, 2014). However, although such alterations in cognitive empathy in psychiatric conditions are well established, little is known about the genetic correlates of cognitive empathy. For example, it is unclear to what extent differences in cognitive empathy are a genetic risk factor for developing various psychiatric conditions. Furthermore, as previous studies have often used self-report or performance tests, results from these studies may be influenced by the characteristics of the test and/or factors associated with the psychiatric conditions themselves. There is a need to use more objective, performance tests. In sum, from previous studies, it is difficult to tease apart the genetic and non-genetic contributions to performance in cognitive empathy, and how these relate to various psychiatric conditions.

To address this gap, here we investigate the genetics of this aspect of social cognition using a well-validated test, the ‘Reading the Mind in the Eyes’ Test (Eyes Test). The Eyes Test is a brief online test where participants are shown photographs of the eye regions and have to identify the appropriate emotion or mental state they express (Baron-Cohen, Wheelwright, Hill, et al., 2001). It has been widely used to investigate differences in cognitive empathy in a range of neuropsychiatric conditions including autism (Baron-Cohen et al., 2015), schizophrenia (Lam, Raine, & Lee, 2014), bipolar disorder (Cusi, Macqueen, & McKinnon, 2012), anorexia nervosa (Tapajóz Pereira de Sampaio, Soneira, Aulicino, & Allegri, 2013), and major depressive disorder (Berlim, McGirr, Beaulieu, & Turecki, 2012). The NIMH Research Domain Criteria (RDoC) lists the Eyes Test as one of several important tests for characterizing variation in social processes, under the category of Perception and Understanding of Others (http://1.usa.gov/1Qs6MdI) (Lombardo et al., 2016). We conducted a genome-wide association meta-analysis of cognitive empathy in more than 89,000 individuals of European ancestry, and investigated both SNP-based and twin-based heritabilities. We further conducted bivariate genetic correlation analyses for psychiatric conditions, psychological phenotypes, and brain volumes. Finally, we conducted gene-based enrichment analysis and investigated potential genetic sources of sex differences.

4.2 Methods 4.2.1 Participants 23andMe: Research participants were customers of 23andMe, Inc., and have been described in detail elsewhere (Do et al., 2011; Tung et al., 2011). All participants completed an online version of the ‘Reading the Mind in the Eyes’ test (Eyes Test) (Baron-Cohen, Wheelwright, Hill, et al., 2001) online on the 23andMe research participant website (36 items). In total, 88,056 participants (44,574 females and 43,482 males) of European ancestry completed the Eyes Test and were genotyped. All participants provided informed consent and answered questions online according to 23andMe’s human subjects protocol, which was reviewed and approved by Ethical & Independent Review Services, an AAHRPP-accredited private institutional review board (http://www.eandireview.com). Only participants who were primarily of European ancestry (97% European Ancestry) were selected for the analysis using existing methods (Eriksson et al., 2012). Unrelated individuals were selected using a segmental identity-by-descent algorithm (Henn et al., 2012).

Brisbane Longitudinal Twin Study (BLTS): In addition, 1,497 participants (891 females and 606 males) of Caucasian ancestry with genotype data from the BLTS completed the short version (14 questions) of the Eyes Test online as part of a study on genetic and environmental foundations of political and economic behaviors (Hatemi et al., 2015). Participant ages ranged from 18 to 73 (Mean = 37, s.d. = 14). All participants provided online consent and the study was approved by the QIMR Berghofer Human Research Ethics Committee. Twin heritability was estimated from 749 twin individuals (including 122 complete monozygotic pairs and 176 complete dizygotic pairs).

4.2.2 Measures The ‘Reading the Mind in the Eyes’ Test (Eyes Test) is a brief questionnaire of cognitive empathy. Participants are shown scaled, black and white photographs of eye regions of actors and they have to choose the cognitive state portrayed from the four options provided. The Eyes Test has good test-retest reliability (Baron-Cohen et al., 2015; Fernández-Abascal, Cabello, Fernández-Berrocal, & Baron-Cohen, 2013; Vellante et al., 2013), and scores are unimodally and near-normally distributed in the general population. In the BLTS dataset, there was a modest test-retest correlation of 0.47 in 259 participants who retook the test after a gap of nearly two years. For each correct answer on the Eyes Test, participants score 1 point, so the scores ranged from 0 – 36 on the full version of the Eyes Test and 0 – 14 on the short version of the Eyes Test.

4.2.3 Genotyping, imputation and quality control 23andMe cohort: Complete details of genotyping of the 23andMe cohort are provided in Section 3.2.3.

BLTS cohort: The BLTS participants were genotyped on Illumina Human610-Quad v1 or HumanCoreExome-12 v1 chips. These samples were genotyped in the context of a larger genome-wide association project. Genotype data were screened for genotyping quality (GenCall < 0.7 from the Human610-Quad v.1 chip), individual and SNP call rates (< 0.95 and < 0.99 for exome markers on the HumanCoreExome-12 v1.0 chip), Hardy-Weinberg Equilibrium (P < 10⁻⁶), and MAF (< 0.01). The data were checked for non-European ancestry, pedigree, sex, and Mendelian errors. Data from the two different chips were separately phased using SHAPEIT2 and imputed to the 1000 Genomes reference panel (Phase 1 v3) using Minimac3. After imputation, SNPs with a MAF < 0.05% were excluded, leaving 11,133,794 SNPs for analyses. We further excluded SNPs with imputation r² < 0.6 for meta-analysis.

4.2.4 Association analyses Linear regression for the 23andMe cohort was performed for the Eyes Test scores using age, sex, and the first four ancestry principal components as covariates. For the sex-stratified analyses, sex was excluded as a covariate. The same regression model was used for the BLTS after accounting for relatedness using RAREMETALWORKER. Inverse variance weighted meta-analysis was performed using Metal (Willer, Li, & Abecasis, 2010). Post meta-analysis, we excluded SNPs that were only genotyped in the BLTS cohort due to the small sample size, but included SNPs that were only genotyped in the 23andMe cohort. LD pruning was performed using Plink (Purcell et al., 2007) with an r² of 0.1. We calculated the variance explained by each individual SNP (Hibar et al., 2015) using the following formula:

\[
\frac{R^2_{g|c}}{1 - R^2_c} = \frac{t^2}{n - k - 1} \times 100
\]

where $R^2_{g|c} / (1 - R^2_c)$ is the proportion of variance explained by the SNP after accounting for the effects of the covariates, t is the t-statistic of the regression coefficient, k is the number of covariates, and n is the sample size. We corrected for winner’s curse using an FDR based approach (Bigdeli et al., 2016).
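As a minimal sketch of the per-SNP variance formula above, the Python function below evaluates t² / (n − k − 1) × 100. The function name and the example inputs are illustrative assumptions and are not taken from the study; k = 6 simply mirrors the covariates named for the non-stratified regression (age, sex, and four principal components).

```python
def variance_explained_percent(t_stat: float, n: int, k: int) -> float:
    """Return t^2 / (n - k - 1) * 100, the percentage of variance explained by one SNP."""
    residual_df = n - k - 1
    if residual_df <= 0:
        raise ValueError("sample size must exceed the number of covariates plus one")
    return t_stat ** 2 / residual_df * 100.0


# Hypothetical inputs (not study data): t = 5.0 and n = 50,000 participants.
# k = 6 assumes age, sex, and four ancestry principal components as covariates.
example_percent = variance_explained_percent(t_stat=5.0, n=50_000, k=6)
print(f"{example_percent:.4f}% of variance explained")
```

For a sex-stratified analysis, where sex is dropped as a covariate, k would be 5 under the same assumption.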

4.2.5 Heritability and genetic correlation We used the intercept from Linkage Disequilibrium Score regression (LDSR) to calculate genomic inflation in the meta-analysis due to population stratification (Bulik-Sullivan, Loh, et al., 2015) (https://github.com/bulik/ldsc). SNP heritability and genetic correlation were calculated using LDSR. Difference in heritability between males and females was tested using:

\[
Z = \frac{h^2_{males} - h^2_{females}}{\sqrt{SE^2_{males} + SE^2_{females}}}
\]

where Z is the Z score for the difference in heritability for a phenotype, $(h^2_{males} - h^2_{females})$ is the difference in the SNP heritability estimates of males and females, and SE is the standard error of each heritability estimate. We calculated two-tailed P-values in R. We performed genetic correlation using summary GWAS data using LDSR. For all genetic correlation analyses, we used LD data from the North West European population as implemented in LDSR. Intercepts were not constrained in the analyses. We used Bonferroni correction to correct for multiple testing in the genetic correlation. We note that Bonferroni correction is likely to be conservative due to the reasonably high degree of phenotypic and genetic correlations between the phenotypes tested.
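The Z test for a sex difference in heritability can be sketched as follows. The text states that the two-tailed P-values were calculated in R; the Python version below is an equivalent illustration only, and the heritability estimates and standard errors passed to it are hypothetical placeholders rather than study results.

```python
import math


def heritability_difference_test(h2_males, se_males, h2_females, se_females):
    """Return (Z, two-tailed P) for the difference between two heritability estimates."""
    z = (h2_males - h2_females) / math.sqrt(se_males ** 2 + se_females ** 2)
    # Two-tailed P-value under the standard normal: P = erfc(|Z| / sqrt(2)).
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_tailed


# Hypothetical estimates: h2 = 0.07 (SE 0.01) in males, 0.05 (SE 0.01) in females.
z_score, p_value = heritability_difference_test(0.07, 0.01, 0.05, 0.01)
print(f"Z = {z_score:.3f}, P = {p_value:.3f}")
```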


For genetic correlations, we used summary GWAS data for schizophrenia, bipolar disorder (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013), autism (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013), anorexia (Duncan et al., 2017), anxiety (Otowa et al., 2016), and depression (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013), that were downloaded from the Psychiatric Genomics Consortium website (http://www.med.unc.edu/pgc/downloads). Summary GWAS data for educational attainment measured through number of college years (Rietveld et al., 2013), educational attainment (Okbay, Beauchamp, et al., 2016), and cognitive aptitude (Rietveld et al., 2014) were downloaded from the Social Science Genetic Association Consortium website (http://ssgac.org/Data.php). Cognitive aptitude is measured independent of knowledge of facts and words. Though the mental-state words provided in the Eyes Test are fairly common, we cannot completely discount the fact that word-knowledge may facilitate better performance on the test. Summary GWAS data for personality traits (de Moor et al., 2012, 2015; van den Berg et al., 2016) were downloaded from the Genetics of Personality Consortium website: http://www.tweelingenregister.org/GPC/. Data for subcortical brain volumes (Hibar et al., 2015) were downloaded from the ENIGMA consortium website (http://enigma.ini.usc.edu/download-enigma-gwas-results/). We did not include data for amygdala volume in the analysis due to non-significant heritability estimates using LDSC. In addition, we used data for empathy measured using the Empathy Quotient, and data for Systemizing measured using the Systemizing Quotient from 23andMe, Inc. Data for the Borderline Personality Features GWAS were obtained from the authors of the paper (Lubke et al., 2014).

4.2.6 Twin Heritability Twin heritability was estimated from 749 twin individuals (including 122 complete monozygotic pairs and 176 complete dizygotic pairs) in the BLTS using full information maximum likelihood in OpenMx (Boker et al., 2011) in R, which makes use of all available data. All twins completed the short version of the Eyes Test, and for those who completed the test twice only their first attempt was included in analyses. ADE, ACE, AE, CE, and E models were fit to the data and fit indices compared to determine the best-fitting model. Standardised variance components are reported from the best-fitting model, the AE model and, for completeness, from the ADE and ACE models.


4.2.7 Gene-based analyses and sex difference analyses We used MetaXcan (Barbeira et al., 2016) with tissue weights from the GTEx to perform gene-based analysis (https://github.com/hakyimlab/MetaXcan). MetaXcan uses summary statistics to perform gene-based association analyses. It incorporates eQTL data from the GTEx consortium to infer gene-level expression based on the summary GWAS statistics provided. This can be used to identify tissue-specific gene expression for the phenotype of interest. Here, we performed gene-based analysis for the non-stratified GWAS meta-analysis for ten neural tissues: anterior cingulate cortex (BA24), caudate basal ganglia, cerebellar hemisphere, cerebellum, cortex, frontal cortex (BA9), hippocampus, hypothalamus, nucleus accumbens basal ganglia, and putamen basal ganglia, using gene-expression regression coefficients for these tissues from the GTEx project. This is based on tissues from 73 – 103 individuals. We chose neural tissues as cognitive empathy can be assumed to be a neural phenotype. As MetaXcan predicts expression level from SNP information, we filtered out genes whose correlation with predicted models of expression was < 0.01, as incorporated in MetaXcan. This helps guard against false positives by removing genes whose expression is poorly predicted by the model. We used an FDR based correction to correct for all the tests run across all the tissues.

For sex-difference analysis, we ran MetaXcan on the sex-stratified analyses only for the cortical tissues. We focussed on the cortical tissue as it was relevant for the phenotype investigated and we had access to the list of sex-differentially expressed genes only from the cortex (Werling, Parikshak, & Geschwind, 2016). To check for overlap, we ran hypergeometric tests.

Overlap between sexes: First, to identify overlap between the sexes for the phenotype, we identified nominally significant genes (P < 0.05) in the two sexes separately and checked for overlap among these lists after pruning the background gene-lists to a common set of genes for both the sexes. We used an online program to calculate both the overlap and the P-value of the overlap: http://nemates.org/MA/progs/overlap_stats.html. The test performed is a normal approximation of an exact hypergeometric test.

Hypergeometric tests are usually performed using four different lists. Let a = list of genes in set a; b = list of genes in set b; x = list of overlapping genes in sets a and b, i.e. the intersection of a and b; and n = total list of background genes (note that this is different from, and usually larger than, the union of a and b).

Sets a, b, and x must be subsets of set n.

To identify overlap between sexes, a was the set of nominally significant genes in the males-only Eyes Test GWAMA identified using MetaXcan, b was the set of nominally significant genes in the females-only Eyes Test GWAMA identified using MetaXcan, x was the set of overlapping genes between the two, and n was the set of genes common to the gene-based analyses of the males-only GWAMA and the females-only GWAMA.
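A minimal sketch of the overlap test described above, written in terms of the sizes of sets a, b, x, and n. Note that this computes the exact upper-tail hypergeometric probability, whereas the online program used in the study reports a normal approximation of it; the function name and the toy numbers are assumptions for illustration only.

```python
from math import comb


def hypergeom_overlap_pvalue(n_background: int, size_a: int, size_b: int, overlap: int) -> float:
    """Exact P(X >= overlap) when size_b genes are drawn without replacement
    from n_background genes, of which size_a belong to set a."""
    if not (0 <= overlap <= min(size_a, size_b) <= n_background):
        raise ValueError("sets a, b, and x must be subsets of the background set n")
    total_draws = comb(n_background, size_b)
    upper_tail = sum(
        comb(size_a, i) * comb(n_background - size_a, size_b - i)
        for i in range(overlap, min(size_a, size_b) + 1)
    )
    # Integer arithmetic keeps the sum exact; the final division returns a float.
    return upper_tail / total_draws


# Toy example (hypothetical): 10 background genes, |a| = 4, |b| = 3, |x| = 2.
toy_p = hypergeom_overlap_pvalue(n_background=10, size_a=4, size_b=3, overlap=2)
print(f"P(overlap >= 2) = {toy_p:.4f}")
```

With several thousand background genes the integers become large, but Python's arbitrary-precision integers handle this without overflow.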

Sex-differentially expressed enrichment analyses: We performed hypergeometric tests to investigate whether genes that are nominally significant in the sex-stratified GWAMA of the Eyes Test are significantly enriched for genes that have sex-differential expression in the cortex.

To conduct this analysis, we first used MetaXcan to conduct gene-based association for the two sex-stratified GWAS using tissue weights from the cortex tissue in the GTEx project. This generated a list of 5951 genes with P-values for the males-only GWAS and 6071 genes with P-values for the females-only GWAS. We also used a list of sex-differentially expressed genes identified in the cortex from Werling et al., 2016. We included only autosomal genes with a fold-difference > 1, regardless of the P-value. We identified the genes that were both tested by MetaXcan and investigated in Werling et al., 2016, and this common set of genes was used as the background gene list n. From this list n, we defined sets a, b, and x.

For set a, we used genes with P < 0.05 in the gene-based association using MetaXcan.

For set b, using a list of sex-differentially expressed genes identified in the Cortex from Werling et al., 2016, we identified all genes with a fold-difference of greater than 1. x was the intersection between sets a and b.

We performed four different enrichment analyses (Eyes Test-male: male-expressed; Eyes Test-male: female-expressed; Eyes Test-female: female-expressed; and Eyes Test-female: male-expressed), and used a P-value threshold of 0.025 to identify any significant enrichment.


4.2.8 Data Availability Summary level data may be requested from 23andMe, Inc. and received subject to 23andMe's standard data transfer agreement.

4.3 Results 4.3.1 Study overview In collaboration with 23andMe, Inc. and the Brisbane Longitudinal Twin Study (BLTS) cohort, we conducted three separate genome-wide association study meta-analyses (GWAMAs) of the Eyes Test: a males-only GWAS (n = 43,482), a females-only GWAS (n = 44,574), and a non-stratified GWAS (n = 88,056). The study protocol is provided in Figure 1. All participants from the 23andMe cohort completed the full version of the Eyes Test online, comprising 36 questions (mean score = 27.47, s.d. = 3.67), while participants from the BLTS cohort completed the short version of the Eyes Test (14 questions, mean = 8.85, s.d. = 2.34). Scores on the Eyes Test were significantly associated with age and sex in the 23andMe cohort (age: -0.026 ± 0.0007, P < 2.2×10⁻¹⁶; sex (females): 0.77 ± 0.02, P < 2.2×10⁻¹⁶).


Figure 1: Schematic diagram of the study protocol

88,056 Caucasian participants from 23andMe, Inc. completed the full version of the Eyes Test and were genotyped. An additional 1,497 Caucasian participants from the Brisbane Longitudinal Twin Study completed the short version (14 questions) of the Eyes Test and were genotyped. Genome-wide association meta-analysis was performed on the combined cohort of 89,553 participants. Three separate meta-analyses were performed: males-only, females-only, and non-stratified. Subsequently, functional enrichment and gene-based analyses were performed for the non-stratified meta-analysis GWAS using the 23andMe dataset. SNP heritability and genetic correlation analyses using LDSC were performed for the 23andMe GWAS dataset. Sex differences were also investigated using the same dataset. In parallel, twin heritability was calculated from 749 twin individuals from the Brisbane Longitudinal Twin Study who had completed the short version of the Eyes Test.


4.3.2 Phenotypic properties of the short version of the Eyes Test All participants from the BLTS completed one of two versions of the short-version of the Eyes Test. 580 participants completed V1 (July 2008 to December 2009) of the short version of the Eyes Test, and 1141 participants completed V2 (July 2010 to November 2011) of the short version of the Eyes Test, totalling 1716 participants. Of these, 127 participants were not included in final analysis as they were not genotyped, 47 were not included as they were of non-Caucasian ancestry, and 45 were not included as they were missing data on the age covariate. 259 participants completed both V1 and V2 of the short version of the test. For these participants, we used scores from V1 of the test for the analysis to avoid a learning bias.

Fourteen questions were common to both versions, and so the final short version of the Eyes Test had only these 14 questions (Table 1). In V1 of the test, participants had to choose the right answer from four different options describing various mental states. In V2, an additional ‘don’t know’ option was provided as the fifth option. Both the images and the four options describing various mental states were the same across all three tests (complete Eyes Test, short Eyes Test V1, short Eyes Test V2). Scores on the short version of the Eyes Test were unimodally and near-normally distributed. We visually inspected both the frequency histogram and the quantile-quantile plot (Figure 2) to assess the normality of the distribution. In addition, the measures of skewness (-0.44) and excess kurtosis (0.068) were within the acceptable range of ± 1 for a normal distribution.

Figure 2: Frequency histogram (left) and quantile-quantile plot (right) of the scores on the short version (V2) of the Eyes Test.


Table 1: Questions used in the three different versions of the Eyes Test Sex of % Question the Full Short Short correct1 Number Option 1 Option 2 Option 3 Option 4 actor version V1 V2# example jealous panicked arrogant hateful M example X 1 playful comforting irritated bored M X 2 terrified upset arrogant annoyed M X X X 71 3 joking flustered desire convinced F X 4 joking insisting amused relaxed M X X X 69 5 irritated sarcastic worried friendly M X 6 aghast fantasizing impatient alarmed F X X 7 apologetic friendly uneasy dispirited M X 8 despondent relieved shy excited M X 9 annoyed hostile horrified preoccupied F X 10 cautious insisting bored aghast M X 11 terrified amused regretful flirtatious M X 12 indifferent embarrassed sceptical dispirited M X 13 decisive anticipating threatening shy M X 14 irritated disappointed depressed accusing M X X X 55 15 contemplative flustered encouraging amused F X 16 irritated thoughtful encouraging sympathetic M X X X 83 17 doubtful affectionate playful aghast F X 18 decisive amused aghast bored F X X X 61 19 arrogant grateful sarcastic tentative F X 20 dominant friendly guilty horrified M X 21 embarrassed fantasizing confused panicked F X X X 78 22 preoccupied grateful insisting imploring F X 23 contented apologetic defiant curious M X X X 39 24 pensive irritated excited hostile M X 25 panicked incredulous despondent interested F X 26 alarmed shy hostile anxious M X X X 58 27 joking cautious arrogant reassuring F X X 28* interested joking affectionate contented F X X X 57 29 impatient aghast irritated reflective F X X X 68 30 grateful flirtatious hostile disappointed F X 31 ashamed confident joking dispirited F X 32 serious ashamed bewildered alarmed M X X X 72 33 embarrassed guilty fantasizing concerned M X 34 aghast baffled distrustful terrified F X X X 61 35 puzzled nervous insisting contemplative F X X X 60 36 ashamed nervous suspicious indecisive M X X X 75 1 Number of participants who chose the correct option in the BLTS 
Cohort. Data unavailable for the 23andMe cohort. * The second option changed to ‘insisting’ in V1 and V2 of the short version of the test. #In addition, all questions in the V2 of the short version of the test had an additional ‘Don’t Know’ option.


We next tested whether the valence of the items differed significantly between the two versions of the test (full version and short version). We divided all the items into three different valences: positive, negative, and neutral (Harkness, Sabbagh, Jacobson, Chowdrey, & Chen, 2005). In the full version of the test, there were 8 positive items, 12 negative items, and 16 neutral items. In the short version of the test, there were 3 positive items, 7 negative items, and 4 neutral items, indicating an excess of negative items and a deficit of neutral items. A chi-square test did not indicate a significant difference in the valence of the two versions of the test. However, we cannot completely rule out that the difference in valences between the two phenotypes may affect the genetic architecture of the two different GWAS datasets.
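The valence comparison can be illustrated with a chi-square test of independence on the item counts given above. The text does not state which form of chi-square test was used, so the 2 × 3 contingency layout below is an assumption; it also ignores the fact that the short-version items are a subset of the full version and that some expected counts are small.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Item counts from the text: columns are positive, negative, neutral.
valence_counts = [
    [8, 12, 16],  # full version (36 items)
    [3, 7, 4],    # short version (14 items)
]
chi2_stat, chi2_dof = chi_square_independence(valence_counts)

# With exactly 2 degrees of freedom the chi-square survival function is exp(-x / 2).
assert chi2_dof == 2
chi2_p = math.exp(-chi2_stat / 2.0)
print(f"chi-square = {chi2_stat:.3f}, df = {chi2_dof}, P = {chi2_p:.3f}")
```

Under these assumptions the P-value is far above 0.05, in line with the non-significant result reported above.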

We next investigated the correlation between the short Eyes Test and the full Eyes Test (adult version). To do this, we used data from control participants from the Cambridge Autism Research Database (CARD). We identified individuals who had completed the full version of the Eyes Test, did not indicate that they had a psychiatric diagnosis, did not have anyone in the immediate family (parents, siblings, and children) with an autism diagnosis, and were above fifteen years of age. We excluded participants who had more than 3 missing answers (i.e. > 10% missing). In total, we had 855 participants who met our criteria (276 males and 579 females). Participant ages ranged from 16 – 81 years. For each participant, using data from the same test, we calculated two sets of scores: a score using all 36 questions (full Eyes) and a score for the 14 questions found in the short Eyes Test (short Eyes). There was a highly significant correlation between the two scores (r = 0.77; P < 2.2×10⁻¹⁶). Ideally, participants would complete the two versions of the Eyes Test separately, and the correlation would be calculated between the two sets of responses. However, we did not have access to such data. Both sets of scores were unimodally and near-normally distributed, as assessed by visual inspection of frequency histograms and quantile-quantile plots.

4.3.3 Genome-wide association meta-analyses GWAMA of the non-stratified and the males-only datasets did not identify any significant loci. In the females-only analysis, we identified one locus at 3p26.2 that was significant at a threshold of P < 5×10⁻⁸. This locus contains 21 significant SNPs in high LD with the leading SNP rs7641347 (Figure 3), with concordant effect direction for 19 SNPs in the 23andMe and BLTS datasets. The leading SNP rs7641347 (Pmeta = 1.58×10⁻⁸) explained 0.067% of the total variance, or 0.013% of the total variance after correcting for winner’s curse (Bigdeli et al., 2016). Of the two SNPs with discordant effects in the two datasets, rs114076548 was the most significant SNP in the 23andMe dataset and had P = 6.49×10⁻⁹. We did not identify any inflation in the P-values of the GWAMA due to population stratification using LDSR (intercept = 1.01 ± 0.007). The intercept for the non-stratified GWAMA was 1.01 (0.006), for the males-only GWAMA was 1.006 (0.006), and for the females-only GWAMA was 1.005 (0.006).

Figure 3: Manhattan plot and regional association plot for the Eyes Test (females) meta-analysis GWAS

A. Manhattan plot of the Eyes Test meta-analysis (female). X axis is the chromosomal position of the SNP, and Y axis is the negative logarithm of the P-value. The red line indicates the genome-wide significance threshold of 5×10⁻⁸. The lead SNP for all loci with P < 1×10⁻⁶ is provided. n = 44,574, and λgc = 1.05. LDSR intercept = 1.005. B. Regional association plot of the significant locus for the Eyes Test (females) meta-analysis.

The leading SNP (rs7641347) is located in an intron of SUMF1 and was nominally significant in the non-stratified analysis (Pmeta = 1.1×10⁻⁵), but non-significant in the males-only analysis (Pmeta = 0.49). In addition, SNPs in high LD (r² > 0.8) were also not nominally significant in the males-only analysis. Together, all 21 SNPs span a region of approximately 77 kb of 3p26.2 (Table 2). At this locus, in addition to SUMF1, two other genes are present: Leucine Rich Neuronal 1 (LRRN1) and SET Domain and Mariner Transposase Fusion Gene (SETMAR). LRRN1 is highly expressed in brain tissues (Lonsdale et al., 2013), with median expression the highest in the putamen, nucleus accumbens and the caudate nucleus, all three of which are part of the striatum. Deletion of 3p26.1 and 3p26.2 can cause developmental delay, hypotonia and epileptic seizures and has been implicated in autism (Pinto et al., 2014).


Table 2: All SNPs with P < 1×10⁻⁶ from the females-only meta-analysis

| SNP | EA | OA | EA freq | Chr | BP | Meta-analysis Effect | Meta-analysis SE | Meta-analysis P | Discovery Effect | Discovery SE | Discovery P | Replication Effect | Replication SE | Replication P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs1488445 | a | g | 0.97 | 3 | 3920844 | 0.41 | 0.08 | 4.01E-08 | 0.41 | 0.08 | 5.73E-08 | 0.34 | 0.40 | 0.39 |
| rs1843296 | c | g | 0.97 | 3 | 3924207 | 0.41 | 0.08 | 3.44E-08 | 0.41 | 0.08 | 5.46E-08 | 0.40 | 0.43 | 0.34 |
| rs1485203 | c | g | 0.03 | 3 | 3924580 | -0.41 | 0.08 | 3.41E-08 | -0.42 | 0.08 | 5.40E-08 | -0.40 | 0.42 | 0.34 |
| rs4685699 | a | g | 0.03 | 3 | 3925167 | -0.41 | 0.08 | 3.34E-08 | -0.42 | 0.08 | 5.25E-08 | -0.40 | 0.42 | 0.34 |
| rs1488448 | a | g | 0.03 | 3 | 3926918 | -0.41 | 0.07 | 3.16E-08 | -0.41 | 0.08 | 5.29E-08 | -0.42 | 0.42 | 0.31 |
| rs116426915* | t | c | 0.02 | 3 | 3928282 | 0.48 | 0.09 | 3.42E-08 | 0.50 | 0.09 | 1.40E-08 | -0.12 | 0.46 | 0.78 |
| rs4685704 | a | g | 0.03 | 3 | 3930561 | -0.41 | 0.07 | 3.10E-08 | -0.41 | 0.08 | 5.13E-08 | -0.41 | 0.41 | 0.32 |
| rs2218422 | a | c | 0.97 | 3 | 3931044 | 0.41 | 0.07 | 1.65E-08 | 0.41 | 0.07 | 2.56E-08 | 0.38 | 0.42 | 0.35 |
| rs1844120 | a | g | 0.97 | 3 | 3933270 | 0.43 | 0.08 | 3.22E-08 | 0.43 | 0.08 | 4.99E-08 | 0.38 | 0.41 | 0.35 |
| rs981190 | a | t | 0.97 | 3 | 3934506 | 0.43 | 0.08 | 1.91E-08 | 0.43 | 0.08 | 2.55E-08 | 0.33 | 0.42 | 0.43 |
| rs73114886 | t | c | 0.97 | 3 | 3935964 | -0.42 | 0.08 | 4.03E-08 | -0.43 | 0.08 | 3.66E-08 | -0.17 | 0.42 | 0.68 |
| rs7641347 | t | c | 0.03 | 3 | 3936632 | 0.42 | 0.07 | 1.58E-08 | 0.42 | 0.08 | 2.55E-08 | 0.39 | 0.41 | 0.33 |
| rs9799040 | c | g | 0.03 | 3 | 3938639 | -0.42 | 0.08 | 3.89E-08 | -0.43 | 0.08 | 3.55E-08 | -0.17 | 0.42 | 0.68 |
| rs73114890 | t | c | 0.97 | 3 | 3941990 | -0.42 | 0.08 | 4.03E-08 | -0.43 | 0.08 | 3.61E-08 | -0.17 | 0.42 | 0.69 |
| rs17038229 | t | g | 0.97 | 3 | 3943462 | -0.42 | 0.08 | 4.45E-08 | -0.43 | 0.08 | 3.98E-08 | -0.16 | 0.42 | 0.69 |
| rs901041 | c | g | 0.97 | 3 | 3950030 | 0.42 | 0.07 | 2.29E-08 | 0.43 | 0.08 | 2.11E-08 | 0.18 | 0.42 | 0.67 |
| rs146190050 | c | g | 0.03 | 3 | 3952266 | -0.42 | 0.08 | 3.09E-08 | -0.42 | 0.08 | 2.84E-08 | -0.18 | 0.42 | 0.67 |
| rs181756740 | a | g | 0.97 | 3 | 3955867 | 0.43 | 0.08 | 3.31E-08 | 0.44 | 0.08 | 2.80E-08 | 0.15 | 0.42 | 0.72 |
| rs61253298 | c | g | 0.03 | 3 | 3956801 | -0.43 | 0.08 | 3.07E-08 | -0.44 | 0.08 | 2.59E-08 | -0.15 | 0.42 | 0.72 |
| rs17038393 | t | c | 0.03 | 3 | 3961721 | 0.42 | 0.08 | 3.40E-08 | 0.43 | 0.08 | 2.69E-08 | 0.12 | 0.42 | 0.77 |
| rs114076548* | c | g | 0.99 | 3 | 4017677 | 0.58 | 0.10 | 2.45E-08 | 0.62 | 0.11 | 6.49E-09 | -0.14 | 0.45 | 0.75 |

EA: Effect allele, OA: Other allele, Chr = chromosome, BP = position, * represents SNPs with discordant effect direction between the discovery and the replication cohorts.
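To illustrate how the discovery and replication columns of Table 2 relate to the meta-analysis columns, the sketch below applies a generic fixed-effect inverse variance weighted combination to the rounded estimates for rs7641347. This is an assumption-laden illustration and not Metal's implementation; because the tabulated effects and standard errors are rounded to two decimals, the output only approximates the published meta-analysis values, and the P-value in particular is sensitive to that rounding.

```python
import math


def inverse_variance_meta(effects, standard_errors):
    """Fixed-effect inverse variance weighted meta-analysis for a single SNP.

    Returns (pooled effect, pooled standard error, two-tailed P-value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    pooled_effect = sum(w * b for w, b in zip(weights, effects)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_effect / pooled_se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_effect, pooled_se, p_two_tailed


# Rounded Table 2 values for rs7641347: discovery (0.42, SE 0.08), replication (0.39, SE 0.41).
meta_effect, meta_se, meta_p = inverse_variance_meta([0.42, 0.39], [0.08, 0.41])
print(f"effect = {meta_effect:.3f}, SE = {meta_se:.3f}, P = {meta_p:.2e}")
```

The much larger standard error of the replication cohort gives it a small weight, so the pooled effect stays close to the discovery estimate.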


The most significant SNP in the males-only GWAS meta-analysis (rs4300633 in 16p12.3, P = 9.11×10⁻⁸) explained 0.062% of the variance, and the most significant SNP in the non-stratified GWAS meta-analysis (rs149662397 in 17q21.32, P = 1.58×10⁻⁷) explained only 0.029% of the variance. All LD-pruned SNPs in the three GWAMA analyses with P < 1×10⁻⁶ are provided in Table 3. The QQ-plots for all the GWAS are provided in Figure 4. Manhattan plots for the non-stratified and the males-only GWAS are provided in Figure 5. Locus zoom plots for the most significant SNPs in the non-stratified and the males-only GWAS are provided in Figure 6.

We also investigated the similarity between the two GWAS datasets (BLTS and 23andMe) by comparing the direction of effects for all independent nominally significant (P < 0.05) SNPs. We calculated the proportion of SNPs with concordant effect direction in the two datasets in the stratified and the non-stratified GWAS datasets, and quantified the significance using a one-sided binomial sign test. For the non-stratified analyses, 65% of the SNPs had a concordant effect direction, compared with 66% for the males-only analyses and 70% for the females-only analyses. All sign tests were significant (P < 2.2×10⁻¹⁶ for all three binomial sign tests).
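The one-sided binomial sign test described above can be sketched as follows. The number of independent SNPs entering each test is not given in the text, so the counts used here are hypothetical and serve only to show the calculation under the null hypothesis that concordant and discordant directions are equally likely.

```python
from math import comb


def one_sided_sign_test(n_concordant: int, n_total: int) -> float:
    """Exact P(X >= n_concordant) for X ~ Binomial(n_total, 0.5)."""
    if not 0 <= n_concordant <= n_total:
        raise ValueError("n_concordant must lie between 0 and n_total")
    upper_tail = sum(comb(n_total, i) for i in range(n_concordant, n_total + 1))
    # Under p = 0.5 every outcome has probability 2^-n, so the tail is a ratio of integers.
    return upper_tail / 2 ** n_total


# Hypothetical counts: 65 of 100 independent SNPs share the same effect direction.
sign_p = one_sided_sign_test(n_concordant=65, n_total=100)
print(f"one-sided sign test P = {sign_p:.4g}")
```

With the same concordance proportion, the P-value shrinks rapidly as the number of SNPs grows, which is how proportions of 65% to 70% can yield very small P-values in large SNP sets.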

Gene-based analyses using MetaXcan (Barbeira et al., 2016) for ten neural tissues (Methods) (Appendix: Table 6) and partitioned heritability analyses for the non-stratified GWAS did not identify any significant results (Table 4).


Table 3: Independent SNPs with P < 1×10⁻⁶ from the GWAS studies

| SNP | Chr | BP | EA | OA | Non-stratified Effect | Non-stratified SE | Non-stratified P | Females-only Effect | Females-only SE | Females-only P | Males-only Effect | Males-only SE | Males-only P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs10486743 | 7 | 42979942 | a | g | 0.14 | 0.04 | 2.23E-03 | -0.04 | 0.06 | 4.84E-01 | 0.32 | 0.07 | 6.52E-07 |
| rs111841153 | 6 | 151417895 | a | g | -0.26 | 0.08 | 1.32E-03 | 0.06 | 0.11 | 5.81E-01 | -0.57 | 0.12 | 7.58E-07 |
| rs149662397 | 17 | 45058068 | t | g | 0.18 | 0.04 | 1.58E-07 | 0.22 | 0.05 | 4.11E-06 | 0.15 | 0.05 | 4.38E-03 |
| rs2213309 | 6 | 33077478 | t | c | 0.05 | 0.02 | 3.61E-03 | 0.12 | 0.02 | 8.44E-07 | -0.02 | 0.03 | 5.50E-01 |
| rs34127688 | 10 | 28812111 | t | c | 0.17 | 0.05 | 1.37E-03 | 0.36 | 0.07 | 9.24E-07 | -0.02 | 0.08 | 7.55E-01 |
| rs40836 | 16 | 28510537 | a | g | 0.07 | 0.02 | 5.50E-05 | 0.12 | 0.02 | 4.10E-07 | 0.02 | 0.03 | 4.80E-01 |
| rs4300633 | 16 | 17914135 | t | c | -0.09 | 0.02 | 4.63E-07 | -0.04 | 0.02 | 8.17E-02 | -0.14 | 0.03 | 9.11E-08 |
| rs6866984 | 5 | 161297795 | a | g | -0.12 | 0.02 | 3.15E-07 | -0.10 | 0.03 | 1.84E-03 | -0.13 | 0.03 | 6.06E-05 |
| rs7641347 | 3 | 3936632 | t | c | 0.24 | 0.05 | 1.10E-05 | 0.42 | 0.07 | 1.58E-08 | 0.05 | 0.08 | 4.95E-01 |
| rs9302534 | 16 | 18048710 | t | c | 0.09 | 0.02 | 1.96E-07 | 0.05 | 0.02 | 3.02E-02 | 0.13 | 0.03 | 2.62E-07 |

EA: Effect allele, OA: Other allele, Chr = Chromosome, BP = position


Figure 4: QQ-plots for all the GWAMAs

Quantile-quantile plots for the non-stratified GWAMA (A), the females-only GWAMA (B), and the males-only GWAMA (C). n = 89,553, λgc = 1.089, LDSR intercept = 1.01 for the non-stratified GWAMA. n = 44,574, λgc = 1.05, LDSR intercept = 1.005 for the females-only GWAMA. n = 44,088, λgc = 1.06, LDSR intercept = 1.006 for the males-only GWAMA.

Figure 5: Manhattan plot of Eyes Test meta-analysis

Manhattan plots for the non-stratified GWAMA (A), and the males-only GWAMA (B). n = 89,553 and λgc = 1.089 for the non-stratified GWAS. n = 44,088 and λgc = 1.06 for the males-only GWAS.


Figure 6: Locus zoom plots for the most significant loci in the non-stratified and the males-only GWAS

Locus zoom plots for the most significant SNP in the males-only GWAS (A) (rs4300633, P = 9.11×10⁻⁸), and the non-stratified GWAS (B) (rs149662397, P = 1.58×10⁻⁷).


Table 4: Partitioned heritability results for the Eyes Test GWAS

| Category | Prop SNP | Prop h2 | Prop h2 SE | Enrich | Enrich SE | Enrich P | FDR P |
|---|---|---|---|---|---|---|---|
| H3K9ac_Trynka.extend.500_0 | 0.23 | 0.65 | 0.17 | 2.83 | 0.76 | 1.39E-02 | 2.95E-01 |
| DGF_ENCODE.extend.500_0 | 0.54 | 0.96 | 0.19 | 1.77 | 0.35 | 2.68E-02 | 2.95E-01 |
| Conserved_LindbladToh.extend.500_0 | 0.33 | 0.67 | 0.15 | 2.01 | 0.46 | 2.69E-02 | 2.95E-01 |
| TSS_Hoffman_0 | 0.02 | 0.21 | 0.09 | 11.43 | 5.01 | 2.97E-02 | 2.95E-01 |
| DHS_Trynka.extend.500_0 | 0.50 | 1.00 | 0.24 | 2.00 | 0.48 | 3.72E-02 | 2.95E-01 |
| H3K4me1_Trynka.extend.500_0 | 0.61 | 0.88 | 0.13 | 1.45 | 0.21 | 4.08E-02 | 2.95E-01 |
| H3K4me1_peaks_Trynka_0 | 0.17 | 0.74 | 0.30 | 4.30 | 1.78 | 5.55E-02 | 2.95E-01 |
| FetalDHS_Trynka.extend.500_0 | 0.29 | 0.72 | 0.23 | 2.53 | 0.82 | 5.76E-02 | 2.95E-01 |
| Enhancer_Hoffman.extend.500_0 | 0.15 | 0.42 | 0.14 | 2.73 | 0.94 | 6.36E-02 | 2.95E-01 |
| UTR_3_UCSC.extend.500_0 | 0.03 | 0.15 | 0.07 | 5.45 | 2.57 | 7.16E-02 | 2.95E-01 |
| Conserved_LindbladToh_0 | 0.03 | 0.26 | 0.13 | 10.01 | 5.14 | 7.17E-02 | 2.95E-01 |
| H3K9ac_peaks_Trynka_0 | 0.04 | 0.34 | 0.17 | 8.78 | 4.43 | 7.20E-02 | 2.95E-01 |
| Repressed_Hoffman.extend.500_0 | 0.72 | 0.57 | 0.08 | 0.79 | 0.12 | 7.77E-02 | 2.95E-01 |
| H3K4me3_Trynka.extend.500_0 | 0.26 | 0.50 | 0.15 | 1.96 | 0.57 | 8.24E-02 | 2.95E-01 |
| H3K9ac_Trynka_0 | 0.13 | 0.41 | 0.17 | 3.27 | 1.36 | 8.79E-02 | 2.95E-01 |
| Transcribed_Hoffman.extend.500_0 | 0.76 | 0.51 | 0.15 | 0.67 | 0.19 | 8.90E-02 | 2.95E-01 |
| H3K4me1_Trynka_0 | 0.43 | 0.71 | 0.22 | 1.67 | 0.52 | 1.91E-01 | 5.96E-01 |
| WeakEnhancer_Hoffman.extend.500_0 | 0.09 | 0.24 | 0.13 | 2.74 | 1.48 | 2.44E-01 | 7.18E-01 |
| CTCF_Hoffman.extend.500_0 | 0.07 | -0.06 | 0.13 | -0.81 | 1.84 | 3.11E-01 | 8.67E-01 |
| H3K4me3_Trynka_0 | 0.13 | 0.28 | 0.18 | 2.09 | 1.32 | 4.06E-01 | 9.10E-01 |
| UTR_5_UCSC_0 | 0.01 | -0.03 | 0.04 | -5.45 | 7.79 | 4.07E-01 | 9.10E-01 |
| FetalDHS_Trynka_0 | 0.08 | 0.30 | 0.26 | 3.50 | 3.06 | 4.14E-01 | 9.10E-01 |
| UTR_3_UCSC_0 | 0.01 | 0.06 | 0.06 | 5.28 | 5.47 | 4.37E-01 | 9.10E-01 |
| CTCF_Hoffman_0 | 0.02 | 0.13 | 0.14 | 5.38 | 5.78 | 4.50E-01 | 9.10E-01 |
| H3K27ac_Hnisz.extend.500_0 | 0.42 | 0.34 | 0.12 | 0.80 | 0.28 | 4.59E-01 | 9.10E-01 |
| H3K27ac_Hnisz_0 | 0.39 | 0.30 | 0.12 | 0.78 | 0.32 | 4.69E-01 | 9.10E-01 |
| Intron_UCSC_0 | 0.39 | 0.44 | 0.07 | 1.13 | 0.19 | 4.78E-01 | 9.10E-01 |
| H3K27ac_PGC2.extend.500_0 | 0.34 | 0.44 | 0.14 | 1.30 | 0.42 | 4.81E-01 | 9.10E-01 |
| DHS_peaks_Trynka_0 | 0.11 | 0.31 | 0.31 | 2.73 | 2.75 | 5.27E-01 | 9.14E-01 |
| Promoter_UCSC.extend.500_0 | 0.04 | 0.08 | 0.07 | 2.06 | 1.77 | 5.52E-01 | 9.14E-01 |
| TFBS_ENCODE_0 | 0.13 | 0.25 | 0.22 | 1.92 | 1.64 | 5.73E-01 | 9.14E-01 |
| Transcribed_Hoffman_0 | 0.35 | 0.23 | 0.21 | 0.66 | 0.61 | 5.75E-01 | 9.14E-01 |
| Coding_UCSC.extend.500_0 | 0.06 | 0.02 | 0.09 | 0.29 | 1.37 | 6.06E-01 | 9.14E-01 |
| H3K27ac_PGC2_0 | 0.27 | 0.37 | 0.20 | 1.37 | 0.72 | 6.07E-01 | 9.14E-01 |
| TSS_Hoffman.extend.500_0 | 0.03 | 0.08 | 0.09 | 2.28 | 2.57 | 6.19E-01 | 9.14E-01 |
| WeakEnhancer_Hoffman_0 | 0.02 | -0.03 | 0.12 | -1.64 | 5.80 | 6.47E-01 | 9.14E-01 |
| Enhancer_Andersson.extend.500_0 | 0.02 | -0.01 | 0.08 | -0.71 | 3.93 | 6.64E-01 | 9.14E-01 |
| SuperEnhancer_Hnisz_0 | 0.17 | 0.19 | 0.06 | 1.16 | 0.36 | 6.67E-01 | 9.14E-01 |
| UTR_5_UCSC.extend.500_0 | 0.03 | 0.00 | 0.06 | 0.10 | 2.11 | 6.73E-01 | 9.14E-01 |
| Promoter_UCSC_0 | 0.03 | 0.06 | 0.10 | 2.02 | 3.27 | 7.55E-01 | 9.72E-01 |
| DGF_ENCODE_0 | 0.14 | 0.04 | 0.31 | 0.33 | 2.27 | 7.66E-01 | 9.72E-01 |
| PromoterFlanking_Hoffman.extend.500_0 | 0.03 | 0.01 | 0.10 | 0.15 | 2.92 | 7.70E-01 | 9.72E-01 |
| TFBS_ENCODE.extend.500_0 | 0.34 | 0.29 | 0.19 | 0.86 | 0.54 | 7.91E-01 | 9.75E-01 |
| DHS_Trynka_0 | 0.17 | 0.23 | 0.35 | 1.38 | 2.07 | 8.53E-01 | 1.00E+00 |
| Coding_UCSC_0 | 0.01 | 0.03 | 0.08 | 1.98 | 5.74 | 8.65E-01 | 1.00E+00 |
| SuperEnhancer_Hnisz.extend.500_0 | 0.17 | 0.18 | 0.06 | 1.05 | 0.35 | 8.83E-01 | 1.00E+00 |
| Repressed_Hoffman_0 | 0.46 | 0.43 | 0.24 | 0.94 | 0.51 | 9.06E-01 | 1.00E+00 |
| Enhancer_Andersson_0 | 0.00 | 0.01 | 0.06 | 1.92 | 13.41 | 9.45E-01 | 1.00E+00 |
| Enhancer_Hoffman_0 | 0.06 | 0.07 | 0.16 | 1.13 | 2.54 | 9.60E-01 | 1.00E+00 |
| H3K4me3_peaks_Trynka_0 | 0.04 | 0.03 | 0.18 | 0.80 | 4.25 | 9.62E-01 | 1.00E+00 |
| PromoterFlanking_Hoffman_0 | 0.01 | 0.01 | 0.07 | 0.71 | 8.20 | 9.71E-01 | 1.00E+00 |
| Intron_UCSC.extend.500_0 | 0.40 | 0.40 | 0.06 | 1.00 | 0.16 | 9.85E-01 | 1.00E+00 |
| base_0 | 1.00 | 1.00 | 0.00 | 1.00 | 0.00 | NA | NA |

This table provides the results of the partitioned heritability analyses. h2 = Additive SNP heritability, SE = Standard error, P = P-value, Enrich = enrichment.

4.3.4 Heritability analyses

We used LDSR to calculate the heritability explained by all the SNPs in HapMap3 with minor allele frequency > 5%. We identified a significant narrow-sense heritability of 5.8% (95% CI: 4.5% – 7.2%; P = 1.00 × 10^-17) in the non-stratified GWAS (Figure 7).

We calculated twin heritability using twin pairs from the BLTS cohort. For this subsample of the BLTS, twin ages at the time of testing ranged from 18 to 31 years (M = 25.3, SD = 3.0). As described earlier, some twins completed the Eyes Test twice; for these participants, only their first attempt was included in the analyses. The distribution of the Eyes Test data was normal; 3 univariate outliers (< -3 SD) were excluded, and there were no bivariate outliers within each zygosity group. In total, data were available for 749 twin individuals, including 122 complete monozygotic (MZ) twin pairs (74 female, 48 male) and 176 complete dizygotic (DZ) twin pairs (60 female, 33 male, and 83 opposite-sex pairs), plus 149 unpaired individuals whose responses nevertheless strengthen estimates of means and variances. The MZ correlation (r = .31) was more than twice the DZ correlation (r = .09), which suggests an ADE model would fit the data better than an ACE model. Structural equation models were fit to the raw data using full information maximum likelihood estimation in OpenMx. A series of nested models indicated that means and variances could be equated across females and males, and across MZ and DZ twins. Although age and sex could be dropped as covariates on the means without a significant loss of model fit, they were retained for consistency with the GWAS analyses and to reduce possible bias in parameter estimates. As ACE and ADE models are not nested, they were compared using Akaike’s Information Criterion (AIC), and nested submodels (CE, AE, and E) were tested using the likelihood-ratio test (LRT). The AE model was the best fitting model, although the CE model for familial aggregation could not be formally rejected (p = 0.07); model fitting statistics are reported in Table 5A, and standardised parameter estimates for this model, and for the ACE and ADE models, are shown in Table 5B along with their 95% confidence intervals.
In this small sample, although there was ample power to detect genetic effects in a reduced model, there was low power in the full model to estimate C or D in the presence of A (or vice versa). Thus, while the total genetic variance is around 30% in all three models, the upper 95% confidence limit for C from the ACE model indicates that C could account for as much as 22% of variance.
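The step from twin correlations to the choice between ACE and ADE can be illustrated with the classical moment-based expectations for twin correlations. The sketch below is not the OpenMx maximum-likelihood analysis reported here; it assumes the textbook expectations rMZ = A + C and rDZ = A/2 + C under ACE, and rMZ = A + D and rDZ = A/2 + D/4 under ADE. The function names are hypothetical.

```python
def ace_from_correlations(r_mz, r_dz):
    """Moment estimates under ACE, assuming rMZ = A + C and rDZ = A/2 + C."""
    a = 2 * (r_mz - r_dz)
    c = 2 * r_dz - r_mz
    return {"A": a, "C": c, "E": 1 - r_mz}


def ade_from_correlations(r_mz, r_dz):
    """Moment estimates under ADE, assuming rMZ = A + D and rDZ = A/2 + D/4."""
    a = 4 * r_dz - r_mz
    d = 2 * r_mz - 4 * r_dz
    return {"A": a, "D": d, "E": 1 - r_mz}


# Twin correlations reported in the text above.
ace = ace_from_correlations(0.31, 0.09)
ade = ade_from_correlations(0.31, 0.09)
```

With the reported correlations, the ACE moment estimate of C is negative (about -0.13), which is not admissible as a variance component and is the usual signal that D rather than C is the better candidate. The ADE moment estimates (A of about 0.05, D of about 0.26, E of about 0.69) are close to the maximum-likelihood estimates for the ADE model in Table 5B.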

Our results are similar to predicted heritability estimates based on a previous meta-analysis of twin studies of empathy and prosocial behaviour (Knafo-Noam & Uzefovsky, 2013).

Figure 7: Mean scores and SNP heritability

A. Mean phenotypic scores and standard deviations for the Eyes Test in the 23andMe cohort. Point estimate provides the mean score, and the error bars represent standard deviations. Difference in mean scores between males and females was highly significant (P < 2.2E-16; Cohen’s d = 0.21). Numbers in brackets indicate the number of participants in each GWAS. All: non-stratified GWAS; Females: Females-only GWAS; Males: Males-only GWAS. B. Mean SNP heritability estimates and standard errors for the Eyes Test in the GWAMA. Point estimate provides mean SNP heritability, and error bar represents standard errors. There was no significant difference in SNP heritability estimates between males and females (P = 0.79). Numbers in brackets indicate the number of participants in each GWAMA. All: non-stratified GWAMA; Females: Females-only GWAMA; Males: Males-only GWAMA

Table 5A: Twin heritability analyses of the Eyes Test short version using the BLTS cohort (14 items)

Model No.  Model Type  AIC       -2LL      df   Models Compared  LRT ∆-2LL  ∆df  p
1          ADE         1702.025  3188.025  743  -                -          -    -
2          ACE         1702.632  3188.632  743  -                -          -    -
3          CE          1703.949  3191.949  744  3 vs. 2          3.317      1    0.069
4          AE          1700.632  3188.632  744  4 vs. 1          0.607      1    0.436
5          E           1711.615  3201.615  745  5 vs. 1          13.590     2    0.001

Table 5B: Standardised variance components with 95% CIs for the ACE, ADE, and AE models

Model No.  Model Type  A Variance (95% CI)  D Variance (95% CI)  C Variance (95% CI)  E Variance (95% CI)
1          ACE         0.28 (0 - 0.42)      NA                   0 (0 - 0.22)         0.72 (0.58 - 0.87)
2          ADE         0.05 (0 - 0.41)      0.26 (0 - 0.46)      NA                   0.69 (0.54 - 0.86)
3          AE          0.28 (0.13 - 0.42)   NA                   NA                   0.72 (0.58 - 0.87)

ADE and ACE are not nested models and were compared using AIC, where a lower value indicates a better fitting model. AE, CE, and E models are nested within the ADE and/or ACE models and were compared to the fuller model with the likelihood ratio test (LRT). A non-significant p-value from the LRT indicates the submodel is an acceptable fit to the data. Although D could be dropped from the ADE model without a significant loss of fit, A and D together could not, indicating significant genetic effects on variation in the Eyes Test.
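The model comparisons in Table 5A can be checked by hand. The sketch below recomputes the LRT p-values from the differences in -2LL, and it assumes (because the tabulated values are consistent with it) that AIC is computed as -2LL minus twice the degrees of freedom. The helper names are hypothetical, and the chi-square tail is implemented only for the 1 and 2 degrees of freedom needed here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt(minus2ll_sub, minus2ll_full, delta_df):
    """Likelihood-ratio test of a submodel against the fuller model."""
    delta = minus2ll_sub - minus2ll_full
    return delta, chi2_sf(delta, delta_df)


def aic(minus2ll, df):
    """AIC in the form that matches Table 5A: -2LL - 2 * df (an inferred convention)."""
    return minus2ll - 2 * df


# Values copied from Table 5A.
ce_vs_ace = lrt(3191.949, 3188.632, 1)
ae_vs_ade = lrt(3188.632, 3188.025, 1)
e_vs_ade = lrt(3201.615, 3188.025, 2)
aic_ae = aic(3188.632, 744)
```

Under these assumptions the three p-values come out close to the tabulated 0.069, 0.436 and 0.001, and the AE model has the lowest AIC of the five models.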

4.3.5 Genetic correlation

We next investigated how performance on the Eyes Test in the non-stratified GWAS is genetically correlated with psychiatric conditions and specific psychological and cognitive phenotypes for which summary GWAS data were available (Table 6). After correcting for multiple testing (Bonferroni correction alpha = 2.08 × 10^-3), we identified significant positive genetic correlations between Eyes Test scores and the NEO-Five Factor Inventory measure of openness (rg = 0.54±0.14; P = 1.17 × 10^-4) (de Moor et al., 2012). We also identified significant positive correlations with different measures of cognition and education: college years (rg = 0.40±0.06; P = 2.48 × 10^-11) (Rietveld et al., 2013), educational attainment (rg = 0.34±0.04; P = 3.7 × 10^-17) (Okbay, Beauchamp, et al., 2016), and childhood cognitive aptitude (calculated as Spearman’s g and, hence, independent of word knowledge) (rg = 0.34±0.10; P = 1.3 × 10^-3) (Benyamin et al., 2014). In addition, we identified a significant positive genetic correlation between the Eyes Test scores and anorexia nervosa (rg = 0.25±0.08; P = 1.9 × 10^-3) (Figure 8). We did not identify a significant genetic correlation between autism and scores on the Eyes Test.

We also investigated if subcortical brain volumes are correlated with performance on the Eyes Test. We used data from the ENIGMA consortium for six subcortical regions and intracranial volume (Hibar et al., 2015). We excluded the amygdala, even though it is relevant for social cognition, as the low heritability of the amygdala could not be accurately quantified using LDSR23. None of the correlations were significant after Bonferroni correction. However, we identified nominally significant positive correlations between the Eyes Test scores and the volumes of the caudate nucleus (rg = 0.24±0.09; P = 9.2 × 10^-3) (Hibar et al., 2015) and the putamen (rg = 0.21±0.08; P = 0.013), which together form the dorsal striatum.

Table 6: Genetic correlations

Phenotype                                 rg     SE    P         FDR P     N
Caudate nucleus                           0.25   0.09  9.25E-03  3.30E-02  11,624
Hippocampus                               -0.08  0.12  5.20E-01  6.19E-01  11,621
ICV                                       0.12   0.11  2.68E-01  4.19E-01  9,826
Nucleus Accumbens                         0.21   0.15  1.54E-01  2.95E-01  11,603
Pallidum                                  0.04   0.12  7.49E-01  7.49E-01  11,595
Putamen                                   0.21   0.08  1.33E-02  4.14E-02  11,598
Thalamus                                  -0.06  0.12  6.14E-01  6.75E-01  11,646
Anorexia*                                 0.25   0.08  1.90E-03  9.48E-03  14,477
Anxiety                                   -0.25  0.15  1.02E-01  2.11E-01  17,310
Autism                                    0.11   0.09  2.27E-01  3.79E-01  10,263
Bipolar disorder                          0.14   0.07  5.78E-02  1.44E-01  16,731
Major Depressive Disorder                 0.13   0.11  2.22E-01  3.79E-01  16,610
Schizophrenia                             0.07   0.04  8.11E-02  1.84E-01  79,845
Agreeableness                             1.13   1.20  3.49E-01  4.36E-01  17,375
Borderline personality disorder features  -0.27  0.55  6.21E-01  6.75E-01  7,125
Childhood cognitive aptitude*             0.34   0.10  1.20E-03  7.50E-03  12,441
College years*                            0.40   0.06  2.48E-11  3.10E-10  95,427
Conscientiousness                         0.15   0.15  3.19E-01  4.33E-01  17,375
Educational attainment*                   0.34   0.04  1.49E-17  3.73E-16  328,917
Empathy                                   0.18   0.07  7.54E-03  3.14E-02  46,861
Extraversion                              -0.03  0.09  6.92E-01  7.21E-01  63,030
Neuroticism                               -0.10  0.11  3.28E-01  4.33E-01  63,661
Openness*                                 0.54   0.14  1.17E-04  9.73E-04  17,375
Systemizing                               0.07   0.07  3.29E-01  4.33E-01  51,564

*Significant genetic correlations after Bonferroni correction. rg = genetic correlation, SE = Standard error, P = P-value, FDR P = FDR adjusted P, N = sample size.
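The Bonferroni threshold used for Table 6 can be reproduced from the number of phenotypes listed. The sketch below assumes the correction was applied over the 24 phenotypes in the table (0.05 / 24 is approximately 2.08 × 10^-3, which matches the alpha quoted in the text) and flags which P-values, copied from Table 6, fall below it. The variable names are hypothetical.

```python
# P-values copied from Table 6, keyed by phenotype.
p_values = {
    "Caudate nucleus": 9.25e-03, "Hippocampus": 5.20e-01, "ICV": 2.68e-01,
    "Nucleus Accumbens": 1.54e-01, "Pallidum": 7.49e-01, "Putamen": 1.33e-02,
    "Thalamus": 6.14e-01, "Anorexia": 1.90e-03, "Anxiety": 1.02e-01,
    "Autism": 2.27e-01, "Bipolar disorder": 5.78e-02,
    "Major Depressive Disorder": 2.22e-01, "Schizophrenia": 8.11e-02,
    "Agreeableness": 3.49e-01,
    "Borderline personality disorder features": 6.21e-01,
    "Childhood cognitive aptitude": 1.20e-03, "College years": 2.48e-11,
    "Conscientiousness": 3.19e-01, "Educational attainment": 1.49e-17,
    "Empathy": 7.54e-03, "Extraversion": 6.92e-01, "Neuroticism": 3.28e-01,
    "Openness": 1.17e-04, "Systemizing": 3.29e-01,
}

# Bonferroni: divide the family-wise alpha by the number of tests.
alpha = 0.05 / len(p_values)

significant = sorted(name for name, p in p_values.items() if p < alpha)
```

Under this assumption, the phenotypes that pass the threshold are the five marked with an asterisk in Table 6.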

Figure 8: Genetic correlations between the Eyes Test and psychiatric conditions, psychological phenotypes and subcortical brain volumes

Genetic correlations and standard errors for the Eyes Test in the 23andMe cohort. Figures above the bars represent P-values. All P-values with p < 0.05 provided. * represents significant genetic correlations after Bonferroni correction. Point estimate represents the genetic correlation, and the error bars represent the standard errors. BPD features is borderline personality disorder features, ICV is intracranial volume. We have removed the genetic correlation for agreeableness from this figure due to the high standard errors.

We also investigated sex-stratified genetic correlations between the Eyes Test and educational attainment, the only relevant phenotype for which we had access to sex-stratified data. We identified a modest, significant genetic correlation between educational attainment and the Eyes Test in the males-only dataset (rg = 0.23±0.05; P = 2.6 × 10^-5). We identified a higher, significant genetic correlation between educational attainment and the Eyes Test in the females-only dataset (rg = 0.39±0.06; P = 5.88 × 10^-11). These results suggest that females share greater pleiotropy between general cognition and cognitive empathy than males, indicating different genetic mechanisms for the development of cognitive empathy.

4.3.6 Sex differences

We also investigated sex differences in the Eyes Test. There was a significant female advantage on the scores of the full Eyes Test (males = 27.08±3.75; females = 27.85±3.55; Cohen’s d = 0.21, P < 2 × 10^-16), replicating previous results (Kirkland, Peterson, Baker, Miller, & Pulos, 2013) (Figure 7). We also investigated sex differences in the short Eyes Test. On average, women scored significantly higher than men (women: 8.99, SD = 2.30; men: 8.66, SD = 2.41; P = 0.019; Cohen’s d = 0.152). As this included participants on both versions of the short test, we also checked if there was a significant difference in the sex ratio between V1 and V2 of the test, to account for the potential facilitation effect seen in V2. A chi-square test showed that there was no significant difference in the number of male and female participants between the two versions of the test (chi-square = 0.73; two-tailed P = 0.39). The ratio of males to females was the same in both versions of the test (0.66).
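As a worked check of the effect size for the full Eyes Test, the sketch below computes Cohen's d from the reported means and standard deviations. It assumes a pooled standard deviation that weights the two groups equally, because group sizes are not given in this passage; the exact formula used in the thesis may differ. The function name is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using an equal-weight pooled SD (an assumption of this sketch)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Full Eyes Test: females 27.85 (SD 3.55), males 27.08 (SD 3.75).
d_full = cohens_d(27.85, 3.55, 27.08, 3.75)
```

Under this assumption the result rounds to 0.21, in line with the value reported for the full test.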

There was no significant difference between the males-only and females-only SNP heritability estimates (males = 0.071±0.011, females = 0.067±0.011; P = 0.79) (Figure 7). There was a reasonably high but incomplete genetic correlation between males and females (rg = 0.68±0.12; P = 2.70 × 10^-8). This is significantly different from 1 (P = 0.003, one-sided Wald test). A binomial sign test of LD-pruned nominally significant SNPs in the sex-stratified analyses identified that 61% (95% CI: 59% - 62%) of the SNPs had a concordant effect direction (P < 2.2 × 10^-16). We further investigated the effect direction and statistical significance of all independent SNPs with P < 1 × 10^-6. SNPs that were of suggestive significance in one sex were not nominally significant in the other (Table 3 and Figure 9), which was supported by Cochran’s Q-value. However, the effect sizes of these SNPs are likely to be inflated by winner’s curse, and after correcting for winner’s curse, we did not identify a significant Cochran’s Q-value (Table 3).
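The two comparisons of sex-stratified estimates above can be sketched as simple Wald-type tests on the rounded values quoted in the text. This is an illustration only: it assumes the male and female estimates are independent and approximately normally distributed, and the function names are hypothetical.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def difference_test(est1, se1, est2, se2):
    """Two-sided test that two independent estimates are equal."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return z, 2 * normal_sf(abs(z))


def below_null_test(est, se, null=1.0):
    """One-sided Wald test that an estimate is smaller than a null value."""
    z = (null - est) / se
    return z, normal_sf(z)


# SNP heritability: males 0.071 (SE 0.011) vs females 0.067 (SE 0.011).
z_h2, p_h2 = difference_test(0.071, 0.011, 0.067, 0.011)

# Male-female genetic correlation 0.68 (SE 0.12) tested against 1.
z_rg, p_rg = below_null_test(0.68, 0.12)
```

With the rounded inputs, the heritability comparison gives a P-value of roughly 0.80 and the test against rg = 1 gives roughly 0.004, close to the reported 0.79 and 0.003. The small discrepancies presumably come from rounding of the published estimates.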

Figure 9: Effect direction for independent suggestive SNPs (P < 1 × 10^-6)

Point estimates are effect sizes (uncorrected for winner’s curse) from the GWAMA and bars represent standard errors. P-values provided for each SNP

Using MetaXcan (Barbeira et al., 2016), we identified the top cortically expressed genes (P < 0.05) for each sex and calculated the overlap between them. First, for the overlap in top genes between males and females for the Eyes Test, we identified all nominally significant genes (P < 0.05) in the cortex using MetaXcan. To identify the background gene set, we overlapped all the genes for males and females after filtering out genes whose correlation with predicted models of expression was < 0.01 and genes for which there were zero SNPs from our dataset. We did not find any enrichment in gene overlap (fold difference = 1.2, P = 0.26).
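One common way to quantify this kind of gene-set overlap is a fold difference (observed overlap divided by the overlap expected by chance) together with a one-sided hypergeometric test. The text does not state which test was used, so the sketch below is an assumption rather than the thesis method. The background size of 4738 is the total given in the Figure 10 caption; the gene-set sizes and overlap are made-up illustrative numbers, and the function name is hypothetical.

```python
from math import comb


def overlap_enrichment(n_background, n_set_a, n_set_b, n_overlap):
    """Fold difference and one-sided hypergeometric P for an overlap of two gene sets."""
    expected = n_set_a * n_set_b / n_background
    fold = n_overlap / expected
    upper = min(n_set_a, n_set_b)
    # P(overlap >= n_overlap) when set B is drawn at random from the background.
    numerator = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, upper + 1)
    )
    p_value = numerator / comb(n_background, n_set_b)
    return fold, p_value


# Hypothetical counts: 300 significant genes in females, 280 in males, 21 shared.
fold, p_value = overlap_enrichment(4738, 300, 280, 21)
```

A fold difference near 1 with a large P-value, as reported above, means the observed overlap is about what would be expected if the two gene lists were unrelated.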

Figure 10: Overlap of top genes in males and females (Eyes Test)

We identified cortical genes associated with the phenotype using MetaXcan for each sex using the sex-stratified GWAS and compared the number of nominally significant genes that are common to both sexes. The total number of genes is 4738; the yellow circle represents the nominally significant genes in females, and the blue circle represents the nominally significant genes in males. The overlap is given in the shaded green portion in the middle. Fold difference = 1.2; P = 0.264.

We also investigated if there was an enrichment of female-overexpressed or male-overexpressed cortical genes for the Eyes Test. For the overlap in sex-differentially expressed genes, we used the discovery dataset from Werling et al. (2016), which was from the BrainSpan project. For the background gene set, we used all the genes identified in MetaXcan for the Eyes Test (males or females), as we reasoned that all known genes were covered in the RNA sequence analysis of human cortical tissues in the BrainSpan project. We used only cortical gene expression from the MetaXcan results. In total, we conducted four separate enrichment analyses: Eyes Test-male: male-expressed; Eyes Test-male: female-expressed; Eyes Test-female: male-expressed; and Eyes Test-female: female-expressed. Figure 11 provides the results of the enrichment analyses. We used a P-value threshold of 0.025 (0.05/2) to account for the two tests performed for each Eyes Test dataset. We did not identify any significant enrichment (Figure 11) (Appendix: Tables 7 – 9 provide the results of the sex-stratified MetaXcan analyses, and the list of sex differentially expressed genes in the adult cortex).

Figure 11: Sex-difference enrichment analyses

A - Eyes Test-male: male-expressed; B - Eyes Test-male: female-expressed; C - Eyes Test-female: male-expressed; and D - Eyes Test-female: female-expressed. Background gene numbers are provided in the white box. Overlap is provided in the green overlapping space.

4.4 Discussion

This is the first large-scale genetic study investigating the genetic architecture of cognitive empathy. We investigated heritability estimates of the Eyes Test in two samples. In our sample of 749 twin individuals (which included 122 complete MZ pairs and 176 complete DZ pairs), heritability was approximately 0.28 (95% CI: 0.13–0.42). This is in keeping with previous studies that have investigated heritability of other facets of empathy in twins. A meta-analysis of empathy in twins identified that approximately a third of the variance is heritable (Knafo-Noam & Uzefovsky, 2013). In our sample of 88,056 unrelated research volunteers from 23andMe Inc, SNP-based heritability was estimated using LDSR, and approximately 5% of the variance in the phenotype was additively heritable. It is likely that heritability of cognitive empathy changes with age (which was significantly correlated with scores on the Eyes Test in this dataset), as is observed in prosocial behaviour (Knafo-Noam & Uzefovsky, 2013).

We identified significant positive genetic correlations with different measures of cognitive ability including educational attainment. This reflects the phenotypic correlation between measures of cognitive empathy and cognitive ability. A meta-analysis identified a significant positive correlation between scores on the Eyes Test and IQ (n = 3583; r = 0.24; 95% CI: 0.16 – 0.32) (Baker, Peterson, Pulos, & Kirkland, 2014), perhaps reflecting that the Eyes Test has a verbal component that includes a varied mental state lexicon (matching a mental state word to an emotional expression). Other tests of theory of mind are also positively correlated with cognitive aptitude and measures of intelligence (Buitelaar, van der Wees, Swaab-Barneveld, & van der Gaag, 1999; Charlton, Barrick, Markus, & Morris, 2009; Ibanez et al., 2013). This may reflect that theory of mind, and in particular joint attention in infancy, may facilitate language development and learning from others (Baron-Cohen, Baldwin, & Crowson, 1997). Theory of mind may also be related to cognitive aptitude and IQ because we often infer another person’s mental state through their speech – speech is the ‘print-out’ of a person’s mind – so verbal IQ and language skills may facilitate theory of mind and vice versa.

We also found a significant positive genetic correlation with the NEO measure of openness to experience, which likely reflects a previously reported phenotypic correlation between measures of empathy and personality (Magalhães, Costa, & Costa, 2012). Among psychiatric conditions, there was a significant positive genetic correlation with anorexia. One study identified that individuals with anorexia report higher personal distress (Beadle et al., 2013), a subscale on a widely used measure of empathy, whilst other studies have reported that deficits in social cognition in anorexia may be attributable to comorbid alexithymia (Brewer et al., 2015).

We did not identify a significant genetic correlation between the Eyes Test and autism. This may be due to heterogeneity in performance on the Eyes Test, since only a subset of individuals with autism show impaired performance on the Eyes Test (Baron-Cohen et al., 2015; Lombardo et al., 2016). In addition, the cognitive phenotype of autism involves non-social aspects (such as excellent attention to detail), not just social deficits. A meta-analysis reported global or selective deficits in performance on the Eyes Test in individuals with schizophrenia, anorexia, bipolar disorder, and clinical depression, but preserved or even enhanced performance for individuals with non-clinical depression and borderline personality disorder (Dinsdale, Mokkonen, & Crespi, 2016). However, these studies are typically conducted in small samples. Performance on the Eyes Test may be influenced by multiple factors related to psychiatric conditions, and may not measure a direct causal relationship between psychiatric conditions and cognitive empathy.

We also note the nominally significant genetic correlations between the volumes of the caudate nucleus and putamen and scores on the Eyes Test. Although the correlations were not significant after Bonferroni correction, this is of potential interest as neuroimaging studies have reported activation in both the putamen (Campanella, Shallice, Ius, Fabbro, & Skrap, 2014) and caudate nucleus (Kemp et al., 2013) during tasks of social cognition. In humans, the ventral striatum is composed of the nucleus accumbens and olfactory tubercle, whereas the dorsal striatum is composed of the caudate nucleus and putamen. There is some evidence to support the role of the striatum in theory of mind (Abu-Akel & Shamay-Tsoory, 2011), and further research is needed to confirm that cognitive and affective empathy utilize different neural circuits. Larger GWAS samples of subcortical brain volumes will help clarify whether common genetic variants contribute to both volumes of the dorsal striatum and cognitive empathy.

We also identified one locus that is significantly associated with cognitive empathy in females. The top SNP (rs7641347) had a P-value of 1.58 × 10^-8. One of the closest genes, LRRN1, is highly expressed in the striatum according to the GTEx database. However, we were unable to identify any eQTL that specifically linked this locus to the gene. LRRN1 is not well characterized. In chicks, Lrrn1 is necessary for the formation of the midbrain-hindbrain boundary (Tossell et al., 2011). The locus was significant in females, nominally significant in the non-stratified analyses, and non-significant in the males-only analyses, suggesting a sex-specific involvement of this locus in cognitive empathy measured using the Eyes Test. We note that even with approximately 90,000 individuals this GWAMA was underpowered to detect loci of significant effect, owing to the relatively low variance explained per SNP. Future research needs to investigate the functional significance of LRRN1 in human brain development and its role in neurodevelopmental conditions.

It is also interesting to note that while twin and SNP-based heritability did not vary between the sexes in our study, we replicated the female advantage on the Eyes Test in the largest sample to date. Sex-stratified analyses also allowed us to investigate the genetic correlation between males and females, and subsequently, sex-specific imputed gene expression in cortical tissues. Male-female genetic correlation was only modest, which was supported by a binomial sign test. In comparison, for other phenotypes for which we had sex-stratified data, the genetic correlation was considerably higher (e.g., self-reported empathy (Chapter 3): rg = 0.82±0.16; systemizing (Chapter 7): rg = 1.0±0.16; educational attainment (Okbay, Beauchamp, et al., 2016): rg = 0.91±0.02). We also did not identify a significant overlap between the genes identified for the sex-stratified GWAS. All of this suggests that there is some sex specificity in the genetic architecture of cognitive empathy. Understanding how this sex-specific architecture is expressed and interacts with prenatal steroid hormones (Chapman et al., 2006) will help shed further light on the biological contributions to the female superiority on the Eyes Test.

In conclusion, we identify a genetic locus that is associated with scores on the Eyes Test in females. We identify significant positive genetic correlations between scores on the Eyes Test and four phenotypes: anorexia nervosa, cognitive aptitude and educational attainment, and openness to experience. Phenotypic sex-differences for the Eyes Test may be partly due to different genetic architectures in males and females, interacting with postnatal social experience. In Chapter 5, we turn to look at different measures of cognitive empathy, this time in adolescents, again from the perspective of GWAS.

5. Genome-wide association study of theory of mind in adolescents

5.1 Introduction

Theory of mind is the ability to attribute mental states to oneself and others and to use such mental state attribution to make sense of behaviour and predict it. First-order theory of mind refers to the ability to understand another person’s mental state (e.g., “He thinks x”). Second-order theory of mind is when theory of mind is applied recursively to understand what someone is thinking of another person’s mental state (e.g., “He thinks that she thinks x”). Typically, first-order theory of mind develops in early childhood (by 3 to 4 years of age) (Frith & Frith, 2005), though precursors to theory of mind are evident at the end of infancy, around 9-14 months of age, in behaviours such as proto-declarative pointing and gaze cueing (Baron-Cohen, 1991). This suggests a developmental component to theory of mind, including a social learning component. Second-order theory of mind develops a little later, by 5 to 6 years of age (Miller, 2009). Other studies have identified that infants are capable of understanding the mental states of others by investigating the looking-time towards ‘impossible’ events (Onishi & Baillargeon, 2005). The development of theory of mind follows, by and large, consistent patterns, which are observed across cultures (Wellman, Cross, & Watson, 2001).

Due to the complex nature of theory of mind, twin studies have identified different heritabilities for theory of mind and related phenotypes at different developmental stages. Heritability is also different for first-order and second-order theory of mind tasks. We note that no study has investigated the twin heritability of the task used in this study, the Emotional Triangles Task (Triangles Task), a first-order test of theory of mind (Boraston, Blakemore, Chilvers, & Skuse, 2007). However, a few studies have investigated the twin heritability of other theory of mind tasks. A large study investigating the heritability of different theory of mind tasks in 1,116 5-year-olds suggested that shared environmental influences rather than genetic factors contribute to most of the variance in these tasks (Hughes et al., 2005). Another study in 695 9-year-olds identified a small, but non-significant additive genetic component (Ronald, Viding, Happé, & Plomin, 2006). However, other studies have identified modest heritabilities in theory of mind and related phenotypes. A study based on parent-reports of children’s prosocial and antisocial behaviour requiring theory of mind in 2 – 4 year olds identified a modest and significant heritability (Ronald, Happé, Hughes, & Plomin, 2005). In adults, a study of cognitive empathy, measured using the ‘Reading the Mind in the Eyes’ Test (Eyes Test), identified a significant twin heritability of approximately 28% (Warrier et al., 2017) (Chapter 4).

Difficulties in theory of mind have been identified in different psychiatric conditions. In autism, studies have identified that children with autism have difficulties in attributing mental states (Baron-Cohen, Leslie, & Frith, 1985), known as ‘mindblindness’. This comes by degrees, rather than being all or none. This may be reflected in children with autism developing theory of mind abilities later in development. Other studies have identified that adults with autism also have difficulties in theory of mind (Baron-Cohen, Wheelwright, Hill, et al., 2001; Jolliffe & Baron-Cohen, 1999). Similarly, a meta-analysis across multiple studies identified significant impairments in tasks involving theory of mind in individuals with schizophrenia (Popolo et al., 2016). Difficulties in theory of mind have also been identified in unipolar and bipolar disorders (Ang & Pridmore, 2009; Zobel et al., 2010), eating disorders (Tapajóz P de Sampaio et al., 2013), and ADHD (Maoz, Gvirts, Sheffer, & Bloch, 2017). Studies also suggest that theory of mind is predicted by measures of cognition and working memory (Buitelaar et al., 1999; Mutter, Alcorn, & Welsh, 2006).

These differences in performance on tests of theory of mind could be due to underlying biology, or due to other environmental processes that mediate performance on tests of theory of mind in individuals with psychiatric conditions. Here, we test the genetic correlates of first-order theory of mind using the Emotional Triangles Task (Triangles Task) (Boraston et al., 2007). In the Triangles Task, participants are required to attribute mental states to animated triangles (e.g., “the triangle is angry”). In the original version of this task, the participant is simply asked to describe what they see, and the spontaneous narratives are coded for the number and type of mental state attribution. In the version used in the current study, participants are asked to pick the right mental state (from one of four options in a forced choice format) based on motion cues of the triangles. The sample comprised 4,577 13-year-olds from the Avon Longitudinal Study of Parents and Children on whom we had both genetic and phenotypic data after quality control. They took the Emotional Triangles Task in adolescence, a period marked by key changes in neural architecture, in peer relationships, and in hormonal profile (Paus, Keshavan, & Giedd, 2008; Whitaker et al., 2016). Interrogating the genetic relationship between theory of mind at this age and risk for psychiatric conditions with known difficulties in theory of mind (autism, ADHD, anorexia nervosa, bipolar disorder, depression, and schizophrenia) may identify genetic biomarkers for this developmental stage.

This study has three specific aims:
1. To determine the narrow-sense heritability of theory of mind in 13-year-olds, measured using the Emotional Triangles Task;
2. To identify any genes and genetic loci associated with the Triangles Task; and
3. To test if polygenic scores for six psychiatric conditions (ADHD, anorexia, autism, bipolar disorder, depression and schizophrenia), cognitive aptitude, and two different measures of empathy (the EQ and the “Reading the Mind in the Eyes” Test (Eyes Test), Chapters 3 and 4) predict performance on the Triangles Task in 13-year-olds.

5.2 Methods

5.2.1 Phenotype and participants

Theory of mind was measured using the Emotional Triangles Task (Triangles Task) (Boraston et al., 2007). All participants were 13 years of age (born between April 1991 and December 1992), and measures were collected as part of an ongoing study, the Avon Longitudinal Study of Parents and Children (ALSPAC). Data were queried using the fully searchable data dictionary, which is available online here: http://www.bristol.ac.uk/alspac/researchers/access/. ALSPAC consists of 14,541 initial pregnancies from women resident in Avon, UK, resulting in a total of 13,988 children who were alive at 1 year of age. In addition, children were enrolled in additional phases, which are described in greater detail elsewhere (Boyd et al., 2013).

The study received ethical approval from the ALSPAC Ethics and Law Committee, and written informed consent was obtained from a parent or a responsible legal guardian for the child to participate. Assent was obtained from the child participants where possible. In addition, we also received ethical permission to use de-identified summary genetic and phenotype data from the Human Biology Research Ethics Committee at the University of Cambridge. All research was performed in accordance with the Helsinki Declaration.

The Triangles Task is a test of theory of mind, where participants have to attribute mental states to non-living objects (animated triangles). Test-retest reliability of the mental state coding scheme has identified an interclass correlation of 0.69 and a technical error of measurement of 0.66 (Boraston et al., 2007). The Triangles Task consists of 28 questions (16 scored questions and 12 control questions). In each question, a 5-second animation of a triangle is shown and a question is asked about the mental state of the triangle (e.g. Was the triangle angry?). Participants are asked to choose from 0 to 5 (a Likert scale) to respond to the question, where 0 indicates that the triangle did not possess the mental state described in the question, and 5 indicates that the triangle definitely possessed the mental state described in the question. In total, four mental states were tested (happy, sad, angry, and scared). For each mental state, there were two positive questions, where the mental state of the triangle matched the mental state described in the question, and two negative questions, where the mental state of the triangle did not match the mental state described in the question (e.g. the triangle is shown to be happy, and the subsequent question is “Was the triangle sad?”). Hence, in total, there were 16 questions that were scored. Control questions asked if the triangle was living, and participants, again, had to choose between 0 and 5.

We calculated the total score by adding the scores for all the positive questions and subtracting the scores for the negative questions. To avoid negative scores, we added 40 to the total score, giving the score a range from 0 to 80. We removed participants who had chosen the same answer for more than 50% of the items, suggesting that they were not attending to the task. This was done after carefully evaluating the options selected and the reaction times. We noticed that, for these participants: (1) the reaction time was minimal for three or more consecutive items at various points in the test; (2) the same option was chosen for three or more consecutive items at various points in the test; (3) the same option was chosen for both the positive and negative questions; and (4) the control questions were answered incorrectly at various points in the test. After removing these participants, we had genetic and phenotypic data on 4,948 participants (n = 2,412 females, and n = 2,536 males).
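The scoring rule and the inattention filter described above can be sketched as follows. The sketch assumes eight positive and eight negative scored items (four mental states, two of each type), each rated from 0 to 5, which yields the stated 0 to 80 range after adding 40. Whether the 50% rule was applied to the 16 scored items or to all 28 items is not stated, so the filter simply takes whatever list of responses it is given. All function names are hypothetical.

```python
from collections import Counter


def triangles_total(positive_scores, negative_scores, offset=40):
    """Total score: sum of positive items minus sum of negative items, plus an offset."""
    if len(positive_scores) != 8 or len(negative_scores) != 8:
        raise ValueError("expected 8 positive and 8 negative scored items")
    if any(not 0 <= s <= 5 for s in list(positive_scores) + list(negative_scores)):
        raise ValueError("each response must be between 0 and 5")
    return sum(positive_scores) - sum(negative_scores) + offset


def same_answer_share(responses):
    """Proportion of items on which the most frequently chosen option was selected."""
    return max(Counter(responses).values()) / len(responses)


def is_inattentive(responses, threshold=0.5):
    """Flag a participant who chose the same answer on more than `threshold` of items."""
    return same_answer_share(responses) > threshold


best_possible = triangles_total([5] * 8, [0] * 8)
worst_possible = triangles_total([0] * 8, [5] * 8)
```

The two extreme response patterns recover the ends of the stated range: the best possible pattern gives 80 and the worst gives 0.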

5.2.2 Genotyping and Imputation

Genotyping and imputation were conducted by ALSPAC. All participants were genotyped using the Illumina HumanHap550 quad chip by 23andMe. GWAS data were generated by the Sample Logistics and Genotyping Facilities at the Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. The resulting raw genome-wide data were subjected to the following quality control procedures. Individuals were removed if they had discordant gender information, excessive or low genetic heterozygosity, missingness > 3%, non-European ancestry (as measured using multidimensional scaling analyses and compared with HapMap II, release 22), or evidence of cryptic relatedness (> 10% IBD). SNPs were removed if they had a minor allele frequency < 1%, deviated from Hardy-Weinberg equilibrium (P < 5 × 10⁻⁷), or had a call rate < 95%. This resulted in a total of 526,688 genotyped SNPs. Using SNP data from mothers and children (477,482 common SNPs between mothers and children), haplotypes were estimated using ShapeIT (v2.r644) (Delaneau, Marchini, & Zagury, 2011). Imputation was performed using Impute2 V2.2 (Howie, Donnelly, & Marchini, 2009) against the 1000 Genomes reference panel (Phase 1, Version 3), using all 2,186 reference haplotypes including non-Europeans. Imputed SNPs were excluded from all further analyses if they had a minor allele frequency < 1% or info < 0.8, which resulted in a total of 8,282,911 SNPs.

5.2.3 Genome-wide association analyses and gene-based analyses

We conducted a genome-wide association analysis (GWAS) of the total score on the Triangles Task. In addition to the quality control procedure described above, we conducted additional quality control steps for the participants included in this study. We included only those SNPs with a minor allele frequency > 1%. We excluded all SNPs that were not in Hardy-Weinberg equilibrium (P < 5 × 10⁻⁷) or had a per-SNP missing rate > 5%. Similarly, we excluded all individuals who had a genotype missing rate > 10%. We included sex and the first two genetic ancestry principal components as covariates. The regression analysis was run using a linear regression model in Plink 1.9 (Purcell et al., 2007). BGEN files were converted to Plink format using hard calls; calls with uncertainty greater than 0.1 were treated as missing. After quality control, 4,577 participants were included in the GWAS.
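The additional filters can be summarised as simple predicates. The sketch below is illustrative only: the analysis itself was run in Plink, and the argument names (`maf`, `hwe_p`, `missing_rate`) are hypothetical labels for per-SNP and per-individual summaries.

```python
MAF_MIN = 0.01           # keep SNPs with minor allele frequency > 1%
HWE_P_MIN = 5e-7         # drop SNPs with a Hardy-Weinberg P-value below this
SNP_MISSING_MAX = 0.05   # drop SNPs missing in more than 5% of individuals
IND_MISSING_MAX = 0.10   # drop individuals missing more than 10% of genotypes


def keep_snp(maf, hwe_p, missing_rate):
    """True if a SNP passes all three per-SNP filters."""
    return (maf > MAF_MIN
            and hwe_p >= HWE_P_MIN
            and missing_rate <= SNP_MISSING_MAX)


def keep_individual(missing_rate):
    """True if an individual passes the per-individual missingness filter."""
    return missing_rate <= IND_MISSING_MAX
```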

Gene-based and pathway analyses for Gene Ontology terms were conducted using MAGMA (de Leeuw et al., 2015). Significant genes were identified after Bonferroni correction (P < 2.74 × 10⁻⁶). Significant pathways were identified using a Benjamini-Hochberg FDR-corrected P-value < 0.05.
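For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal pure-Python sketch of the adjustment is given below. It is a generic illustration of the step-up procedure and not the code used for these analyses; the function name is our own.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is declared significant when its adjusted P-value falls below the chosen FDR level (0.05 here).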

5.2.4 Heritability and Polygenic risk scores

SNP heritability was estimated using GREML in GCTA v1.26 (Yang, Lee, Goddard, & Visscher, 2011) and LDSC (Bulik-Sullivan, Finucane, et al., 2015). For GCTA GREML, heritability was calculated after including sex and the first two genetic principal components as covariates. We tested for inflation in chi-square statistics due to uncorrected population stratification using LDSC (Bulik-Sullivan, Finucane, et al., 2015). Given that the cohort comprised unrelated individuals, and that individuals with cryptic relatedness (IBD > 10%) were removed during quality control of the raw genotype data, we calculated the genetic relatedness matrix using all individuals in the study.

Given the different polygenicity and power of the GWAS used as the training datasets, we constructed polygenic risk scores using PRSice (Euesden, Lewis, & O’Reilly, 2015) at eight different P-value thresholds (0.01, 0.05, 0.1, 0.15, 0.20, 0.5, 0.8 and 1.0). We chose these thresholds to balance the signal-to-noise ratio in the GWAS results used as training datasets. Polygenic risk scores were constructed using summary data for six psychiatric conditions: ADHD (n = 55,374) (Demontis et al., 2017), anorexia (n = 14,477) (Duncan et al., 2017), autism (n = 15,954) (The Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium, 2017), bipolar disorder (n = 16,731) (Sklar et al., 2011), major depressive disorder (n = 16,610) (Ripke et al., 2013), and schizophrenia (n = 79,845) (Ripke et al., 2014). We also used summary data for cognitive aptitude (n = 78,803) (Sniekers et al., 2017), self-reported empathy (EQ, Chapter 3) (n = 46,861), and cognitive empathy (the Eyes Test, Chapter 4) (n = 89,553). We included sex and the first two ancestry principal components as covariates for polygenic scoring. We did not include age as all participants were tested at approximately 13 years of age. We used a Benjamini-Hochberg FDR correction to account for the multiple traits and thresholds tested.
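P-value thresholding can be illustrated with a toy scorer. This is a simplified sketch and not PRSice itself: it ignores clumping and strand alignment, and the data structures (`weights` from a training GWAS and `dosages` for one target individual) are hypothetical.

```python
THRESHOLDS = (0.01, 0.05, 0.1, 0.15, 0.20, 0.5, 0.8, 1.0)


def polygenic_score(dosages, weights, p_threshold):
    """Average per-allele score over SNPs with training P-value <= threshold.

    dosages: {snp_id: effect-allele count (0 to 2) in the target individual}
    weights: {snp_id: (beta, p_value)} taken from the training GWAS
    """
    total, n_alleles = 0.0, 0
    for snp, (beta, p_value) in weights.items():
        if p_value <= p_threshold and snp in dosages:
            total += beta * dosages[snp]
            n_alleles += 2  # two alleles per diploid genotype
    return total / n_alleles if n_alleles else 0.0


def scores_at_all_thresholds(dosages, weights):
    """One score per P-value threshold for a single individual."""
    return {t: polygenic_score(dosages, weights, t) for t in THRESHOLDS}
```

Each individual therefore receives eight scores, one per threshold, and each score is regressed against the phenotype with the covariates listed above.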

5.2.5 Data availability

Data were downloaded from the Psychiatric Genomics Consortium website for the six psychiatric conditions (http://www.med.unc.edu/pgc/results-and-downloads), and from the Complex Traits Genomics website for cognitive aptitude (https://ctg.cncr.nl/software/summary_statistics).

5.3 Results

5.3.1 Phenotypic distribution

The scores of the participants ranged from 28 to 80. Inspection of the frequency histogram and quantile-quantile plot suggested a normal distribution (Figure 1). The mean score of the participants was 56.93 (SD = 7.43). Females scored significantly higher than males (Females: 57.68, SD = 7.43; Males: 56.22, SD = 7.36; P < 0.001, unpaired, two-tailed t-test), though the effect size was small (Cohen’s d = 0.19).

Figure 1: Frequency histogram and Quantile-quantile plot of the scores on the Triangles Task

A. Frequency histogram of scores on the Triangles Task. B. Quantile-quantile plot of the scores on the Triangles Task.

5.3.2 Genome-wide association analyses and heritability

Genome-wide association analyses did not identify any significant SNP. The top SNP was rs2120452 (P = 6.8 × 10⁻⁷), which lies in a non-coding RNA, LOC105372904. The next top SNP (rs17753687; P = 8.6 × 10⁻⁷) is an intronic SNP in BBS4, a gene implicated in Bardet–Biedl syndrome. Inspection of the QQ plots and the LD score regression intercept did not reveal any inflation in test statistics due to population stratification (LD score regression intercept = 0.99 ± 0.0063). SNP heritability was small and non-significant (LDSR: h²SNP = 0.13 ± 0.10, P = 0.16; GCTA: h²SNP = 0.072 ± 0.069, P = 0.29). Manhattan and QQ plots are provided in Figure 2.

Figure 2: Manhattan plot and quantile-quantile plot of the GWAS of the Triangles Task

Manhattan plot of the Triangles Task GWAS (top). Y-axis is the -log10(P-value) for each SNP. X axis is the Chromosome. Quantile-quantile plot of the Triangles Task GWAS (bottom).

5.3.3 Gene-based and pathway analysis

We conducted gene-based analysis using MAGMA, and did not identify any significant genes at a genome-wide P-value threshold of 2.74 × 10⁻⁶. The most significant gene was MARK4 at 19q13.32 (P = 2.96 × 10⁻⁶). MARK4 is involved in phosphorylating microtubule-associated proteins. It has high expression in the brain and in testes according to GTEx (Ardlie et al., 2015). The list of genes and their P-values is provided in Appendix: Table 10. Pathway analyses also did not identify any significant pathways. The list of pathways is provided in Appendix: Table 11.

5.3.4 Polygenic risk score

As the heritability was too low and non-significant to support genetic correlation analyses, we conducted polygenic risk score analyses with six psychiatric conditions, cognitive aptitude, cognitive empathy and self-reported empathy. We constructed polygenic risk scores at eight P-value thresholds. We did not identify a significant polygenic score after FDR-based correction for any of the six psychiatric conditions investigated (Figure 3). Overall, the polygenic scores for psychiatric conditions predicted limited variance (Table 1). However, polygenic scores for both cognitive aptitude and cognitive empathy were significantly associated with scores on the Triangles Task across all the thresholds tested after FDR correction. For cognitive aptitude at all thresholds, and for cognitive empathy at two P-value thresholds, these results remained significant after a more stringent, highly conservative Bonferroni correction (Figure 3 and Table 1). This likely reflects both the greater statistical power of these two datasets compared to the other GWAS datasets and the underlying pleiotropy between theory of mind and cognition, as previous studies have identified a modest, positive correlation between different measures of theory of mind and cognition (Peterson & Miller, 2012; Warrier et al., 2017) (Chapter 4).

Table 1: Results of the Polygenic score analyses

Threshold   P            r2          FDR P       Phenotype
0.01        9.94E-01     1.37E-08    9.94E-01    Anorexia
0.05        9.90E-01     3.40E-08    9.94E-01    Anorexia
0.1         3.75E-01     1.70E-04    6.50E-01    Anorexia
0.15        2.83E-01     2.49E-04    5.72E-01    Anorexia
0.2         4.63E-01     1.17E-04    7.25E-01    Anorexia
0.5         5.69E-01     7.03E-05    8.34E-01    Anorexia
0.8         4.47E-01     1.25E-04    7.22E-01    Anorexia
1           4.36E-01     1.32E-04    7.22E-01    Anorexia
0.01        7.90E-01     1.54E-05    9.56E-01    Autism
0.05        5.23E-01     8.86E-05    8.01E-01    Autism
0.1         7.06E-01     3.08E-05    9.08E-01    Autism
0.15        7.52E-01     2.17E-05    9.50E-01    Autism
0.2         8.08E-01     1.27E-05    9.56E-01    Autism
0.5         9.30E-01     1.69E-06    9.74E-01    Autism
0.8         8.23E-01     1.08E-05    9.56E-01    Autism
1           8.08E-01     1.27E-05    9.56E-01    Autism
0.01        3.90E-02     9.24E-04    1.65E-01    ADHD
0.05        1.33E-01     4.88E-04    4.86E-01    ADHD
0.1         1.83E-01     3.85E-04    4.86E-01    ADHD
0.15        1.28E-01     5.03E-04    4.86E-01    ADHD
0.2         1.66E-01     4.15E-04    4.86E-01    ADHD
0.5         1.53E-01     4.43E-04    4.86E-01    ADHD
0.8         1.74E-01     4.01E-04    4.86E-01    ADHD
1           1.69E-01     4.11E-04    4.86E-01    ADHD
0.01        8.52E-01     7.54E-06    9.59E-01    BIP
0.05        6.31E-01     5.01E-05    8.68E-01    BIP
0.1         3.68E-01     1.76E-04    6.50E-01    BIP
0.15        2.43E-01     2.96E-04    5.47E-01    BIP
0.2         2.23E-01     3.23E-04    5.35E-01    BIP
0.5         1.89E-01     3.73E-04    4.86E-01    BIP
0.8         1.86E-01     3.79E-04    4.86E-01    BIP
1           1.99E-01     3.57E-04    4.94E-01    BIP
0.01        8.42E-01     8.59E-06    9.59E-01    Depression
0.05        8.98E-01     3.56E-06    9.74E-01    Depression
0.1         8.20E-01     1.13E-05    9.56E-01    Depression
0.15        5.88E-01     6.36E-05    8.34E-01    Depression
0.2         6.41E-01     4.71E-05    8.68E-01    Depression
0.5         9.71E-01     2.88E-07    9.94E-01    Depression
0.8         9.28E-01     1.76E-06    9.74E-01    Depression
1           9.33E-01     1.51E-06    9.74E-01    Depression
0.01        2.95E-01     2.38E-04    5.74E-01    SCZ
0.05        1.38E-01     4.76E-04    4.86E-01    SCZ
0.1         1.46E-01     4.59E-04    4.86E-01    SCZ
0.15        2.57E-01     2.78E-04    5.61E-01    SCZ
0.2         2.81E-01     2.52E-04    5.72E-01    SCZ
0.5         2.86E-01     2.47E-04    5.72E-01    SCZ
0.8         3.19E-01     2.15E-04    5.89E-01    SCZ
1           3.15E-01     2.19E-04    5.89E-01    SCZ
0.01        7.06E-04     2.48E-03    4.62E-03    Cognitive empathy
0.05        8.08E-04     2.43E-03    4.85E-03    Cognitive empathy
0.1         3.38E-04*    2.78E-03    2.43E-03    Cognitive empathy
0.15        1.34E-04*    3.16E-03    1.07E-03    Cognitive empathy
0.2         2.78E-03     1.94E-03    1.25E-02    Cognitive empathy
0.5         1.87E-03     2.10E-03    8.98E-03    Cognitive empathy
0.8         1.15E-03     2.29E-03    6.37E-03    Cognitive empathy
1           1.43E-03     2.20E-03    7.35E-03    Cognitive empathy
0.01        5.75E-01     6.82E-05    8.34E-01    EQ
0.05        8.79E-01     5.01E-06    9.74E-01    EQ
0.1         4.51E-01     1.23E-04    7.22E-01    EQ
0.15        2.39E-01     3.00E-04    5.47E-01    EQ
0.2         3.79E-01     1.68E-04    6.50E-01    EQ
0.5         5.91E-01     6.27E-05    8.34E-01    EQ
0.8         6.69E-01     3.96E-05    8.76E-01    EQ
1           6.51E-01     4.44E-05    8.68E-01    EQ
0.01        2.13E-05*    4.56E-03    1.92E-04    Cognitive aptitude
0.05        3.83E-09*    7.50E-03    5.52E-08    Cognitive aptitude
0.1         1.87E-08*    6.83E-03    1.92E-07    Cognitive aptitude
0.15        6.00E-09*    7.31E-03    7.20E-08    Cognitive aptitude
0.2         3.54E-09*    7.53E-03    5.52E-08    Cognitive aptitude
0.5         1.21E-09*    7.98E-03    4.13E-08    Cognitive aptitude
0.8         1.72E-09*    7.83E-03    4.13E-08    Cognitive aptitude
1           1.61E-09*    7.86E-03    4.13E-08    Cognitive aptitude

* indicates a significant polygenic score association after Bonferroni correction (P < 6.94 × 10⁻⁴)
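The Bonferroni threshold in the footnote is consistent with correcting for 72 tests (9 training phenotypes × 8 P-value thresholds). The short check below makes that arithmetic explicit; the value of α = 0.05 is an assumption, as it is not stated alongside the threshold.

```python
N_PHENOTYPES = 9   # six psychiatric conditions, cognitive aptitude, EQ, Eyes Test
N_THRESHOLDS = 8   # P-value thresholds used for polygenic scoring
ALPHA = 0.05       # assumed family-wise error rate

n_tests = N_PHENOTYPES * N_THRESHOLDS
bonferroni_p = ALPHA / n_tests
print(f"{n_tests} tests -> P < {bonferroni_p:.2e}")  # 72 tests -> P < 6.94e-04
```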

Figure 3: Polygenic score results at various P-value thresholds for the Triangles Task

The height of the bars (Y-axis) represents the model fit (R²). Numbers above the bars represent P-values (FDR corrected). The X-axis represents the eight P-value thresholds. Names of the GWAS datasets are provided under the bar graphs.

5.4 Discussion

We investigated the genetic correlates of first-order theory of mind, using the Triangles Task. In total, 4,577 13-year-olds completed the Triangles Task, making this the largest investigation of genetic correlates of theory of mind at a specific age. At a phenotypic level, the scores on the Triangles Task were normally distributed and we observed a small but significant female advantage on the Triangles Task. This is similar to what has been observed in other studies of cognitive empathy (Kirkland et al., 2013) and facial expression processing (McClure, 2000).

The current study finds limited evidence for a genetic contribution to the Triangles Task in 13-year-olds in this sample. Genome-wide association analyses did not identify any significant loci at P < 5 × 10⁻⁸. Further, gene-based analyses also did not identify any significant genes. We note here that the current study is statistically underpowered. Previous work on the genetics of cognitive empathy (Chapter 4), which is related to theory of mind, identified that the per-SNP variance explained for the most significant SNP was 0.013% after correcting for winner’s curse (Warrier et al., 2017). Post-hoc power calculations suggest that a sample two orders of magnitude larger than the current sample would be required to identify genome-wide significant loci, if the effect sizes are similar. This, however, is challenging given the nature of the task, which demands that participants spend at least half an hour to complete it. We also note that we are statistically underpowered to identify significant additive SNP heritability, assuming a true additive SNP heritability of 5%, which is similar to the SNP heritability reported in Chapter 4. These calculations preclude us from conducting genetic correlation analyses using the current cohort. It is likely that, given a sufficiently large sample size, we will identify both significant SNP heritability and significant loci.
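The scale of the required sample can be checked with a standard approximation. The sketch below is not the power calculation used in this chapter: it assumes a one-degree-of-freedom test with non-centrality parameter N·q²/(1 − q²), where q² is the variance explained by the SNP, a normal approximation to power, and a target power of 80%. The function name is our own.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def required_n(variance_explained, alpha=5e-8, power=0.80):
    """Approximate N needed to detect one SNP at genome-wide significance."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = _STD_NORMAL.inv_cdf(power)
    ncp = (z_alpha + z_power) ** 2                 # required non-centrality
    return ncp * (1 - variance_explained) / variance_explained


# 0.013% of variance explained, as reported for the top Eyes Test SNP.
n_needed = required_n(0.00013)
print(round(n_needed), round(n_needed / 4577))
```

Under these assumptions the required sample is on the order of 3 × 10⁵ participants, many tens of times larger than the 4,577 participants analysed here, which is broadly in line with the statement above.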

We also investigated whether polygenic scores from six psychiatric conditions, cognitive aptitude, and cognitive empathy predict performance on the Triangles Task. We used PRSice and investigated the predictive power of polygenic scores at eight different P-value thresholds, providing reasonable resolution. We note that the sample sizes for the training GWAS datasets varied. All the psychiatric conditions had more than 10,000 participants, suggesting reasonable statistical power. However, polygenic scores for none of the psychiatric conditions significantly predicted performance on the Triangles Task at any of the eight P-value thresholds. In contrast, polygenic scores for cognitive empathy, as measured using the Eyes Test, and for cognitive aptitude significantly predicted variance in the Triangles Task, underscoring the results observed in Chapter 4. We excluded educational attainment from the current analyses because the summary GWAS data available for educational attainment included participants from ALSPAC, and this overlap is likely to increase the probability of false positives.

Our results indicate that, despite the reasonable statistical power of the training datasets, genetic risk for psychiatric conditions does not explain much of the variance in theory of mind ability in adolescents. We speculate that this may be due to several reasons. It is likely that the current task does not capture the entire variance in theory of mind. Indeed, as mentioned earlier, theory of mind is complex and designing a task to capture the intrinsic variance in theory of mind is challenging. The Triangles Task only considers first-order mental state attributions, and the range of mental states is very limited compared to, for example, the ‘Reading the Mind in the Eyes’ test (Baron-Cohen, Wheelwright, Hill, et al., 2001). It is also likely that the difficulties in theory of mind observed in individuals with psychiatric conditions may be due to other processes that mediate theory of mind. Interrogation of the genetic architecture of diverse phenotypes that contribute to social behaviour and theory of mind will help us understand how they contribute to genetic risk for various psychiatric conditions. We also cannot exclude the possibility that using a larger training dataset, a larger target dataset, or both will improve the statistical significance of the polygenic score association. Finally, we cannot ignore non-biological contributors to theory of mind. Certainly, twin studies do suggest that for certain theory of mind tasks, the genetic contribution is negligible.

Previous work from our lab (Chapter 4) investigated the genetic architecture of cognitive empathy measured using the ‘Reading the Mind in the Eyes’ test in a sample of more than 88,000 individuals of European ancestry (Warrier et al., 2017). Here we note two distinctions between the current study and the earlier study. First, the Triangles Task requires attributing mental states to animated, non-social objects, while the Eyes Test requires identifying the mental state from photographs of human eyes. Second, the current study investigates the genetic architecture of theory of mind at a specific age of 13 years. This allows for interrogation of the genetic contribution in adolescence, when individuals are particularly vulnerable to several psychiatric conditions.

In conclusion, we find a small genetic contribution to first-order theory of mind in adolescents. We find limited evidence that genetic variants that contribute to risk for psychiatric conditions predict variance in theory of mind ability in adolescents. However, we do find that genetic variants contributing to cognitive aptitude and cognitive empathy are significantly associated with theory of mind ability in adolescence. We speculate that observed differences in theory of mind in individuals with psychiatric conditions may be due to both biological and non-biological factors, or other biological phenotypes that mediate performance on tasks of theory of mind. In Chapter 6, we turn to explore another social phenotype, social relationship satisfaction, and its genetic underpinnings.

6. Genome-wide association meta-analysis of social relationship satisfaction

6.1 Introduction

Difficulties in forming and maintaining social relationships are reported widely in psychiatric conditions (Cable, Bartley, Chandola, & Sacker, 2013; Greenberg, Rosenblum, McInnis, & Muzik, 2014; Stevens, McNichol, & Magalhaes, 2009; Teo, Choi, & Valenstein, 2013). Positive social relationship satisfaction can both reduce the risk and ameliorate the severity of a psychiatric condition. In the DSM-5 (American Psychiatric Association, 2013), difficulties in social functioning are among the criteria for diagnosing conditions such as autism, anorexia nervosa, schizophrenia, and bipolar disorder. Social relationship satisfaction is one of the contributors to subjective wellbeing, alongside other domains such as occupational, financial, and health satisfaction (Diener, Suh, Lucas, & Smith, 1999). However, very little is known about the genetic architecture of social relationship satisfaction, and whether social relationship dissatisfaction genetically contributes to risk for psychiatric conditions. To our knowledge, no study has investigated the twin heritability of social relationship satisfaction. A few studies have investigated the heritability of subjective wellbeing and have identified heritability estimates between 38% and 50% (Lykken & Tellegen, 1996; Stubbe, Posthuma, Boomsma, & de Geus, 2005).

Here we report the results of two genome-wide association studies (GWAS) of social relationship satisfaction in the UK Biobank. We focus on friendship satisfaction and family relationship satisfaction, measured using two questionnaires. We leverage the high genetic correlation between the two phenotypes to: (1) investigate the genetic correlates of friendship and family relationship satisfaction; (2) identify the genetic correlation between these two phenotypes and select psychiatric conditions and psychological phenotypes; (3) prioritize associated genes, and enriched gene sets and tissues for the two phenotypes; and (4) quantify cross-phenotype polygenic predictive power of the two phenotypes.

We identify two genetic loci associated with the phenotypes. We further identify significant genetic correlations with several psychiatric conditions, demonstrate an enrichment for brain tissues, and prioritize genes and gene sets for further investigation.

6.2 Methods

6.2.1 Phenotypes and participants

Participants were individuals from the UK Biobank. Participants were asked “In general, how satisfied are you with your family relationships?” and “In general, how satisfied are you with your friendships?”. Participants could choose one of eight options: “Extremely happy”, “Very happy”, “Moderately happy”, “Moderately unhappy”, “Very unhappy”, “Extremely unhappy”, “Do not know”, and “Prefer not to answer”. We excluded individuals who responded “Do not know” or “Prefer not to answer”. The phenotypes were coded in three waves. We combined responses from the three waves, and removed participants who had responded in more than one wave (N = 178,675 for family and N = 178,721 for friendship). Responses were recoded from 1 to 6, with 1 being “Extremely unhappy” and 6 being “Extremely happy”. We removed participants who were not genotyped, who were not of “British ancestry”, who had excess relatives, who were outliers for heterozygosity, or whose reported sex did not match their genetic sex. This left us with N = 139,603 for family, and N = 139,826 for friendship. Finally, we removed related individuals from this list, resulting in a total of N = 134,681 (family) and N = 134,941 (friendship) unrelated individuals for further analyses. A total of 131,790 individuals were included in both the analyses. At the time of recruitment, participants were between 40 and 69 years of age. All participants provided informed consent to participate in the UK Biobank. In addition, we obtained ethical approval from the University of Cambridge Human Biology Research Ethics Committee to use de-identified data from the UK Biobank and ALSPAC for this study.
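The recoding and wave-combination steps can be sketched as follows. This is an illustration under stated assumptions rather than the processing script used here: the text fixes only the endpoints of the 1 to 6 coding, so the ordering of the four intermediate options is assumed, and we assume that a participant who appears in more than one wave is dropped entirely. All names are hypothetical.

```python
RECODE = {
    "Extremely unhappy": 1,
    "Very unhappy": 2,        # intermediate codes are assumed to be ordinal
    "Moderately unhappy": 3,
    "Moderately happy": 4,
    "Very happy": 5,
    "Extremely happy": 6,
}
EXCLUDED = {"Do not know", "Prefer not to answer"}


def recode_response(answer):
    """Map a questionnaire answer to 1-6, or None if it is an excluded option."""
    if answer in EXCLUDED:
        return None
    return RECODE[answer]


def combine_waves(waves):
    """Keep recoded answers from participants who appear in exactly one wave.

    waves: list of {participant_id: answer} dictionaries, one per wave.
    """
    counts = {}
    for wave in waves:
        for pid in wave:
            counts[pid] = counts.get(pid, 0) + 1
    combined = {}
    for wave in waves:
        for pid, answer in wave.items():
            score = recode_response(answer)
            if counts[pid] == 1 and score is not None:
                combined[pid] = score
    return combined
```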

6.2.2 Genetic analyses

Details of the UK Biobank genotyping, imputation, and quality control procedures are available elsewhere (Bycroft et al., 2017). Genetic association was conducted using Plink 2.0 (Purcell et al., 2007) (https://www.cog-genomics.org/plink/2.0/), which supports BGEN v1.2 files. We included sex, year of birth, genotyping batch and the first 40 genetic principal components as covariates. We excluded SNPs that failed Hardy-Weinberg Equilibrium (P < 1 × 10⁻⁶), had a minor allele frequency < 0.01, had a per-SNP genotyping rate < 95%, or had an imputation INFO < 0.1. We further excluded SNPs not in the Haplotype Reference Consortium (http://www.haplotype-reference-consortium.org/) (McCarthy et al., 2016). We excluded individuals who were ancestry outliers or had a per-individual genotyping rate < 90%.

To increase the effective sample size, we leveraged the high genetic correlation between the two phenotypes to conduct multi-trait analysis of GWAS (MTAG) (Turley et al., 2017) (https://github.com/omeed-maghzian/mtag/). MTAG is an extension of the standard inverse-variance-weighted meta-analysis that considers the genetic correlation between phenotypes to estimate phenotype-specific effects. Our phenotypes are well suited to MTAG given both the high genetic correlation between the two phenotypes and the similar mean chi-square. We therefore do not expect the estimates to be biased.
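For orientation, the standard inverse-variance-weighted meta-analysis that MTAG generalises can be written in a few lines. The sketch below shows only this baseline for a single SNP measured in independent samples. It does not implement MTAG, which additionally accounts for the genetic correlation and the sample overlap between traits; the function name is our own.

```python
import math


def ivw_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted estimate, its SE, and Z-score."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, beta / se
```

With two equally precise estimates, the pooled effect is their mean and the standard error shrinks by a factor of √2, which is the sense in which combining correlated phenotypes increases the effective sample size.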

Clumping of the independent SNPs was conducted using Plink (Purcell et al., 2007), using an r² of 0.6. Winner’s curse correction was conducted using FIQT, which is an FDR-based inverse quantile transformation (Bigdeli et al., 2016).

6.2.3 Genetic correlations

Genetic correlations were estimated using LD score regression (LDSR) (Bulik-Sullivan, Finucane, et al., 2015; Bulik-Sullivan, Loh, et al., 2015) (https://github.com/bulik/ldsc/wiki). We used north-west European population LD scores, did not constrain the intercept, and used the pre-MTAG summary statistics to obtain unbiased genetic correlations. We ran correlations with psychiatric and psychological phenotypes (Table 1). Summary statistics were obtained from the PGC (http://www.med.unc.edu/pgc/results-and-downloads), the SSGAC (https://www.thessgac.org/), and the CTG (https://ctg.cncr.nl/software/summary_statistics). In addition, summary statistics for systemizing, self-reported empathy (Chapter 3) (Warrier et al., 2016), and cognitive empathy (Chapter 4) (Warrier et al., 2017) were obtained from 23andMe, Inc. Summary statistics for the iPSYCH autism replication dataset were obtained from the iPSYCH autism team. Given the high genetic correlation between the two phenotypes, we identified significant genetic correlations using a Benjamini-Hochberg FDR < 0.05, taking into account all the tests conducted for both the phenotypes combined. We also indicate the tests that pass Bonferroni correction in the results section. We note here that Bonferroni correction is likely to be conservative given the positive dependency in the tests conducted. The Benjamini-Hochberg procedure is less conservative than both the Bonferroni and the Benjamini-Yekutieli corrections, and remains appropriate in light of the positive dependency in the tests conducted.

6.2.4 Heritability analyses

Heritability analyses were conducted using both LDSR (all participants) (Bulik-Sullivan, Loh, et al., 2015) and GCTA-GREML (Yang et al., 2011) (for computational efficiency, this was conducted on one-fifth of the total participants, N = 26,000). To keep the methods as identical as possible, GCTA heritability was calculated after including sex, age, batch and the first 40 genetic principal components as covariates. Bivariate genetic correlations between the two phenotypes were estimated using both LDSR and GCTA.

6.2.5 Functional annotation

Functional annotation was performed on the MTAG-GWAS datasets. We used FUMA (Watanabe, Taskesen, van Bochoven, & Posthuma, 2017) (http://fuma.ctglab.nl/) to identify eQTLs and chromatin interactions for the genome-wide significant results. eQTLs were identified using data from BRAINEAC and GTEx (Ardlie et al., 2015) brain tissues. Chromatin interactions were identified using Hi-C data from the dorsolateral prefrontal cortex, hippocampus, and neural progenitor cells. Significant mapping was identified at a Benjamini-Hochberg FDR < 0.05. Gene-based association analysis was conducted using MAGMA (de Leeuw et al., 2015) within FUMA, and significant genes were identified using a Bonferroni-corrected threshold < 0.05 for each phenotype, as provided within FUMA.

We conducted partitioned heritability (Finucane et al., 2015) analyses using the MTAG-GWAS dataset using the baseline categories and additional categories: genes that are loss-of-function intolerant (Lek et al., 2016), and genes with brain-specific chromatin marks (Finucane et al., 2015). Significant enrichments were identified using a Benjamini-Hochberg FDR < 0.05 applied to both the phenotypes combined.

To identify phenotype-relevant tissues based on gene expression, we conducted partitioned heritability analyses applied to tissue-specific genes (Finucane et al., 2017). We focussed on GTEx consortium-based gene expression from 13 brain regions (cortex, anterior cingulate cortex, frontal cortex, cerebellum, cerebellar hemisphere, putamen, nucleus accumbens, caudate, substantia nigra, hippocampus, spinal cord, amygdala, and hypothalamus). Results were significant at a Benjamini-Hochberg FDR < 0.05. In addition, gene expression analyses for general tissue types and specific tissue types were also conducted using FUMA. These two methods are not identical. In partitioned heritability, the top 10% of genes with tissue-specific expression (focal brain region vs non-focal brain region) were included with a 100 kb window on either side of the transcribed region. In FUMA, all genes are tested using a linear regression framework. Here, the expression of genes in a particular tissue is regressed against the gene Z-values, with the average expression of the gene in all tissues included alongside the usual covariates (gene length, SNP density, etc.). Thus, while FUMA tests all the genes, partitioned heritability tests only the top decile.

Cell specific enrichment (Zhang et al., 2014) was conducted for three cell types (Neurons, astrocytes and oligodendrocytes) using both LDSR partitioned heritability and MAGMA gene set enrichment. Results were FDR corrected (P < 0.05). For MAGMA we used the top 500 specifically expressed genes in each cell type compared to the other cell types.

6.2.6 Polygenic regression analyses

Polygenic regression analyses using the MTAG-GWAS results were conducted using PRSice (Euesden et al., 2015) to generate polygenic scores. PRSice is an extension of polygenic scoring in Plink and calculates, for each individual, the average of per-allele scores, where each allele is weighted by its regression coefficient from the GWAS. This can be conducted at multiple different thresholds to generate a polygenic score per individual, which can then be regressed against a phenotype of interest. Polygenic scoring was conducted in approximately 5,600 phenotyped and unrelated individuals from the Avon Longitudinal Study of Parents and Children (Boyd et al., 2013). We considered total scores from three parent-reported child-based questionnaires, collected between the ages of 7 and 10: the Children’s Communication Checklist (CCC) (Bishop, 1998), the Social and Communication Disorders Checklist (SCDC) (Scourfield, Martin, Lewis, & McGuffin, 1999), and the Strengths and Difficulties Questionnaire (SDQ) (Goodman, 1997). These three questionnaires are widely used to assess difficulties in child development. In addition, we also tested five subdomains of the SDQ (prosocial behaviour, hyperactivity, emotional difficulties, peer difficulties and conduct problems). Prosocial behaviour and CCC were reverse coded to indicate difficulties in prosocial behaviour and communication. Polygenic score regression was conducted using a negative binomial model (MASS package in R: https://cran.r-project.org/web/packages/MASS/MASS.pdf) as this provided the best fit. Sex and the first two genetic principal components were included as covariates. Incremental variance explained due to polygenic scores was calculated using Nagelkerke’s pseudo-R².
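Nagelkerke’s pseudo-R² can be computed directly from model log-likelihoods. The sketch below assumes that the incremental variance is the difference in pseudo-R² between a model with covariates plus the polygenic score and a model with covariates only, each compared against an intercept-only model; this definition and the function names are our assumptions, and the model fitting itself (done here with the MASS package in R) is not reproduced.

```python
import math


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke's pseudo-R2 from the log-likelihoods of two nested models."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)
    return cox_snell / max_cox_snell


def incremental_r2(loglik_null, loglik_covariates, loglik_full, n):
    """Pseudo-R2 gained by adding the polygenic score to the covariates."""
    return (nagelkerke_r2(loglik_null, loglik_full, n)
            - nagelkerke_r2(loglik_null, loglik_covariates, n))
```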

6.2.7 ALSPAC cohort

Participants

All participants included in the polygenic score analyses were from an ongoing study, the Avon Longitudinal Study of Parents and Children (ALSPAC). We queried the data using a fully searchable data dictionary that is available here: http://www.bristol.ac.uk/alspac/researchers/access/. The ALSPAC cohort comprises 14,541 initial pregnancies from women resident in Avon, UK, resulting in a total of 13,988 children who were alive at 1 year of age. Pregnant women were recruited from 1st April 1991 to 31st December 1992. In addition to this, children were enrolled in other phases, which are described elsewhere (Boyd et al., 2013). In total, 713 additional children were enrolled in this study. The study received ethical approval from the ALSPAC Ethics and Law Committee, and written informed consent was obtained from a parent or a responsible legal guardian for the child to participate. Assent was obtained from the child participants where possible. In addition to ethical approval from ALSPAC, we obtained ethical approval from the Human Biology Research Ethics Committee at the University of Cambridge.

Details of genotyping and imputation are provided in section 5.2.2.

Phenotypes

For all the phenotypes included in the cross-phenotype polygenic risk score analyses, we used the prorated scores as provided in the ALSPAC datasheet. We considered three main phenotypes (CCC, SDQ and SCDC), as they have been associated with psychiatric conditions. For example, both the CCC (Geurts et al., 2004) and the SCDC (Robinson et al., 2016) have been associated with autism. Further, the SDQ has been linked to ADHD and conduct problems (Goodman, Ford, Simmons, Gatward, & Meltzer, 2000). In addition to the total SDQ scores, we also considered the SDQ subscales as they capture difficulties in specific domains. Scores on these measures have been provided for multiple different age groups in ALSPAC. We considered the age group with the highest number of participants for whom phenotypic data were available, which was, for all phenotypes, the youngest age group. Further details of the SCDC in particular are provided in Chapter 7, where we conduct a GWAS of the log-transformed SCDC scores to identify genetic correlations with the Systemizing Quotient-Revised.


6.3 Results

6.3.1 Phenotypic distributions

Both phenotypes had a unimodal, near-normal distribution (Figure 1), with skew and excess kurtosis between ±1. The Spearman’s rank correlation between friendship and family relationship satisfaction was only 0.52 (P < 2.2x10^-16) (Figure 2), suggesting that they are not identical phenotypes. This allowed us to investigate the genetic architecture of the two phenotypes separately. Both sex and age were significantly associated with the two phenotypes, though the effect was small (Figures 3 and 4).

Figure 1: Phenotypic distributions of family and friendship relationship satisfaction

Histograms plotting the distribution of family relationship satisfaction (A and B), and friendship satisfaction (C and D). Blue bars indicate scores for females, and pink bars indicate scores for males. Plots B and D show stacked frequency histograms. Scores range from 1 to 6, with 1 corresponding to extremely unhappy and 6 corresponding to extremely happy.


Figure 2: Spearman’s rank correlation between phenotypic distributions of friendship and family relationship satisfaction

Each circle shows the overlap between scores on the two phenotypes. The larger the circle, the larger the overlap. Spearman’s rank correlation = 0.52 (P < 2.2x10^-16).


Figure 3: Difference in scores for friendship and family relationship satisfaction based on sex

Mean scores for family relationship (left) and friendship (right) satisfaction are shown above. Lines represent standard deviations. There was a small but significant difference between males and females (two-tailed unpaired t-test; P < 0.001 for both phenotypes). For family relationship satisfaction, males scored significantly higher than females. Mean (males) = 4.81, sd = 0.89; Mean (females) = 4.79, sd = 0.87. The Cohen’s d for the t-test was small (0.02). For friendship satisfaction, males scored significantly lower than females. Mean (males) = 4.68, sd = 0.74; Mean (females) = 4.84, sd = 0.71. The Cohen’s d for the t-test was larger than that of family relationship satisfaction, but still small (0.22). The test was conducted in N = 131,790 individuals who completed both the questionnaires. N = 70,809 for females. N = 60,981 for males.
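As an illustrative check, the effect sizes in the legend above can be approximated from the reported means, standard deviations and sample sizes. The sketch below assumes the pooled-standard-deviation form of Cohen's d; the legend does not state which formula was used, and the function name is hypothetical.

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference (a minus b) using the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Summary statistics as reported in the Figure 3 legend.
N_MALES, N_FEMALES = 60981, 70809

# Family relationship satisfaction: males minus females.
d_family = cohens_d(4.81, 0.89, N_MALES, 4.79, 0.87, N_FEMALES)
# Friendship satisfaction: females minus males.
d_friendship = cohens_d(4.84, 0.71, N_FEMALES, 4.68, 0.74, N_MALES)

print(round(d_family, 2), round(d_friendship, 2))
```

Under this assumption the rounded values agree with the 0.02 and 0.22 given in the legend.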


Figure 4: Difference in scores for friendship and family relationship satisfaction based on age

Scatterplot showing the relationship between year of birth and family relationship satisfaction (above) and friendship satisfaction (below). Age predicted a small proportion of the variance in family satisfaction (Beta = -0.009±0.0003; P < 2.2x10^-16). Age also predicted a small proportion of the variance in friendship satisfaction (Beta = -0.007±0.0002; P < 2.2x10^-16).


6.3.2 Genetic correlation

Despite the modest phenotypic correlation, the genetic correlation between family relationship satisfaction and friendship satisfaction was high (rg = 0.87±0.03; P < 2.2x10^-16), suggesting a similar genetic architecture between the two phenotypes. We observed a similarly high bivariate genetic correlation using GCTA REML (rg = 1 ± 0.11; P < 2.2x10^-16) in our subsample of participants. This similarity was reflected in the largely similar genetic correlations between the two phenotypes and other psychiatric and psychological phenotypes. After FDR correction, we identified significant negative genetic correlations between family relationship satisfaction and six of the seven psychiatric conditions tested (anxiety, autism, anorexia nervosa, bipolar disorder, depression, and schizophrenia) and between friendship satisfaction and four of the seven psychiatric conditions (autism, bipolar disorder, depression, and schizophrenia). We replicated the negative genetic correlations with autism using a separate autism GWAS dataset (autism_iPSYCH). Underscoring the contribution of the two phenotypes to most psychiatric conditions, we also identified significant negative genetic correlations between both phenotypes and the cross-disorder psychiatric GWAS. Notably, the correlations with autism, schizophrenia, the cross-disorder GWAS, and depression were also significant after applying a more stringent and highly conservative Bonferroni correction. For cognitive and psychological phenotypes, we identified negative genetic correlations between both phenotypes and educational attainment, cognitive aptitude, depressive symptoms and neuroticism. In contrast, we identified significant positive genetic correlations between both phenotypes and conscientiousness, empathy, extraversion, and subjective wellbeing. Genetic correlations are provided in Figure 5 and Table 1.


Figure 5: Genetic correlations

Genetic correlations and 95% confidence intervals provided for the two phenotypes (red is family relationship satisfaction, blue is friendship satisfaction). Phenotypes tested are on the y axis. Magnitude of the genetic correlation is on the x axis.


Table 1: Genetic correlation for the two phenotypes

| Phenotype | Family genetic correlation | Family SE | Family P | Family FDR-adjusted P | Friendship genetic correlation | Friendship SE | Friendship P | Friendship FDR-adjusted P | Sample size | PMID |
|---|---|---|---|---|---|---|---|---|---|---|
| ADHD | -0.08 | 0.05 | 1.20E-01 | 1.50E-01 | 0.07 | 0.05 | 1.48E-01 | 1.74E-01 | 55374 | NA |
| Anorexia | -0.21 | 0.08 | 9.00E-03 | 1.29E-02 | -0.12 | 0.08 | 1.40E-01 | 1.70E-01 | 14477 | 28494655 |
| Anxiety | -0.38 | 0.14 | 6.59E-03 | 9.76E-03 | -0.22 | 0.13 | 8.74E-02 | 1.13E-01 | 17310 | 26754954 |
| Autism_PGC | -0.21 | 0.06 | 1.00E-03* | 1.74E-03 | -0.25 | 0.06 | 9.36E-05 | 1.87E-04* | 16350 | 28540026 |
| Autism_iPSYCH | -0.37 | 0.06 | 5.36E-10* | 1.95E-09 | -0.38 | 0.05 | 1.85E-12 | 1.23E-11* | 19142 | NA |
| Bipolar Disorder | -0.28 | 0.07 | 6.30E-05* | 1.33E-04 | -0.18 | 0.07 | 1.03E-02 | 1.42E-02 | 16731 | 21926972 |
| Depression | -0.36 | 0.1 | 5.02E-04* | 9.56E-04 | -0.34 | 0.12 | 2.56E-03 | 4.27E-03 | 18759 | 22472876 |
| Schizophrenia | -0.37 | 0.03 | 3.37E-22* | 3.37E-21 | -0.31 | 0.03 | 2.05E-17 | 1.64E-16* | 77096 | 25056061 |
| PGC-crossdisorder | -0.4 | 0.06 | 3.02E-11* | 1.51E-10 | -0.4 | 0.06 | 2.29E-11 | 1.31E-10* | 61220 | 23453885 |
| Cognitive aptitude | -0.1 | 0.05 | 3.00E-02 | 4.00E-02 | -0.2 | 0.04 | 3.69E-05 | 8.20E-05* | 78308 | 28530673 |
| Conscientiousness | 0.4 | 0.14 | 3.90E-03 | 6.00E-03 | 0.53 | 0.16 | 6.95E-04 | 1.26E-03 | 17375 | 21173776 |
| Depressive symptoms | -0.54 | 0.05 | 1.51E-22* | 2.01E-21 | -0.4 | 0.06 | 5.06E-11 | 2.25E-10* | 161460 | 27089181 |
| Edu attainment | -0.15 | 0.03 | 1.42E-05* | 3.34E-05 | -0.22 | 0.03 | 4.08E-10 | 1.63E-09* | 293723 | 27225129 |
| Empathy | 0.33 | 0.07 | 3.03E-06* | 7.58E-06 | 0.44 | 0.07 | 1.31E-09 | 4.37E-09* | 46861 | NA |
| Extraversion | 0.23 | 0.08 | 3.00E-03 | 4.80E-03 | 0.54 | 0.09 | 4.87E-09 | 1.50E-08* | 63030 | 26362575 |
| Neuroticism | -0.45 | 0.09 | 1.01E-06* | 2.69E-06 | -0.49 | 0.09 | 2.50E-07 | 7.14E-07* | 170911 | 27089181 |
| Openness | -0.09 | 0.11 | 3.70E-01 | 3.89E-01 | -0.15 | 0.11 | 1.80E-01 | 2.06E-01 | 17375 | 21173776 |
| Cognitive empathy | -0.05 | 0.06 | 3.52E-01 | 3.81E-01 | -0.06 | 0.06 | 3.32E-01 | 3.69E-01 | 89553 | 28584286 |
| Subjective Well being | 0.65 | 0.05 | 1.82E-33* | 7.28E-32 | 0.67 | 0.05 | 1.80E-30 | 3.60E-29* | 298420 | 27089181 |
| SQ-R | -0.04 | 0.06 | 4.70E-01 | 4.70E-01 | 0.05 | 0.07 | 4.20E-01 | 4.31E-01 | 51564 | NA |

This table provides the results of the genetic correlation analyses. *indicates genetic correlations that were significant after using a more conservative Bonferroni correction (P < 1.25x10^-3). SE = Standard error, P = P-value, PMID = Pubmed ID


6.3.3 SNP heritability

LDSR-based SNP heritability was 0.065 (0.005) for friendship satisfaction and 0.066 (0.005) for family relationship satisfaction. GCTA-based heritability for a subset of the participants identified a significant heritability of 0.053 (0.014) for friendship satisfaction and 0.056 (0.014) for family relationship satisfaction (Figure 6 and Table 2).

Table 2: Additive SNP heritability

| Phenotype | Method | h2 | SE |
|---|---|---|---|
| Family | LDSR | 0.065 | 0.005 |
| Family | GCTA | 0.053 | 0.014 |
| Friendship | LDSR | 0.066 | 0.005 |
| Friendship | GCTA | 0.056 | 0.014 |

This table provides the results of the additive SNP heritability estimated by LDSR and GCTA. h2 = Additive SNP heritability. SE = Standard error.

Figure 6: Additive SNP heritability for family relationship and friendship satisfaction

Additive SNP heritability for family relationship satisfaction (left) and friendship satisfaction (right). Heritability estimates and 95% confidence intervals provided. We used two different methods to calculate additive SNP heritability: LD score regression and GCTA genomic-relatedness-based restricted maximum likelihood. Heritability estimates for the former were calculated using the full sample (N > 130,000 for both phenotypes). Heritability estimates for the latter were calculated using one-fifth of the full sample (N = 26,000 for both phenotypes), for computational efficiency. For both methods we included sex, batch, year of birth and the first forty genetic principal components as covariates. Heritability estimates were similar and significant for both methods.


6.3.4 Genetic association analyses

Genome-wide association analyses did not identify any significant results. Manhattan and QQ-plots are provided in Figure 7. The λGC was 1.15 for both phenotypes. However, the LDSR intercept was 1.0018 (0.007) for family relationship satisfaction and 1.0002 (0.0085) for friendship satisfaction, suggesting negligible inflation due to population stratification. The high genetic correlation between the two phenotypes was supported by concordant effect direction for all the 13 independent loci with P < 1x10^-6 for the two phenotypes (Figure 8). Six of these had significant P-values in the other phenotype after Bonferroni correction (Figure 8 and Table 3).
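The genomic inflation factor quoted above is conventionally the median observed association chi-square statistic divided by the median expected under the null for one degree of freedom. A minimal sketch of that definition, assuming two-sided P-values as input and using a hypothetical function name:

```python
from statistics import NormalDist, median

_ND = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _ND.inv_cdf(0.75) ** 2

def lambda_gc(p_values):
    """Genomic inflation factor from two-sided GWAS P-values."""
    # Convert each P-value back to a 1-df chi-square statistic (z squared).
    chi2 = [_ND.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced P-values stand in for a null GWAS, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(lambda_gc(null_like), 3))
```

A λGC above 1 can arise from polygenicity as well as from confounding, which is why the LDSR intercept is reported alongside it.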

Given the similar mean chi-square of the two phenotypes (1.13 for family relationship satisfaction and 1.14 for friendship satisfaction) and the high genetic correlation, we conducted a modified inverse variance weighted meta-analysis (MTAG). This increased the effective sample size to 164,112 for family relationship satisfaction and over 158,166 for friendship satisfaction. There was a high genetic correlation between the pre-MTAG and post-MTAG GWAS datasets (Family: rg = 0.98±0.005; P < 2.2x10^-16; Friendship: rg = 0.98±0.004; P < 2.2x10^-16). We identified two significant loci associated with both the phenotypes (Table 4). On chromosome 3, the lead SNP was rs1483617 for both phenotypes (P = 1.6x10^-8 for family relationship satisfaction and 4.55x10^-8 for friendship satisfaction). In addition, on chromosome 6, we identified another locus with rs2189373 as the lead SNP for both phenotypes (P = 1.06x10^-8 for family and P = 4.21x10^-8 for friendship). The top SNPs explained 0.019% of the variance for both phenotypes, which reduced to 0.0027% after correcting for winner’s curse. Manhattan and QQ-plots for the GWAS are provided in Figure 9.
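To make the starting point of this approach concrete, the sketch below shows a standard fixed-effect inverse variance weighted meta-analysis for a single SNP. It is not MTAG: plain inverse variance weighting assumes independent samples measuring the same effect, whereas MTAG modifies the weights to account for sample overlap and a genetic correlation below one. The function name and the input values are hypothetical and do not come from the analyses reported here.

```python
import math
from statistics import NormalDist

def ivw_meta(betas, ses):
    """Fixed-effect inverse variance weighted estimate, SE and two-sided P."""
    weights = [1 / se**2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1 / total_weight)
    p = 2 * NormalDist().cdf(-abs(beta / se))
    return beta, se, p

# Hypothetical effect estimates for one SNP from two independent cohorts.
beta, se, p = ivw_meta([-0.020, -0.018], [0.004, 0.004])
print(f"beta = {beta:.4f}, SE = {se:.4f}, P = {p:.2g}")
```

Combining estimates reduces the standard error, which is the sense in which a joint analysis increases the effective sample size.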


Figure 7: Manhattan and QQ-plot for pre-MTAG GWAS

Manhattan plots for family relationship satisfaction (top) and friendship satisfaction (middle). QQ plots for family relationship satisfaction and friendship satisfaction are shown below.


Figure 8: Direction and P-values of all independent SNPs with P < 10^-6 in the pre-MTAG family relationship satisfaction and friendship satisfaction GWAS

Regression Beta and 95% CI provided for all independent SNPs with P < 1x10^-6. Blue lines indicate effect estimate and 95% CI for friendship satisfaction. Red lines indicate effect estimate and 95% CI for family relationship satisfaction. P-values are provided on top of the CI lines. We conducted 13 independent tests (replicating 13 independent SNPs), giving a Bonferroni-corrected significance threshold of 0.0038 (0.05/13).


Table 3: 13 independent loci in the pre-MTAG GWAS

| SNP | Chr | BP | Effect_allele | Non_effect_allele | Friendship Effect | Friendship SE | Friendship P | Family Effect | Family SE | Family P |
|---|---|---|---|---|---|---|---|---|---|---|
| rs6732220 | 2 | 49222872 | G | C | 0.011 | 0.003 | 3.00E-04 | 0.021 | 0.004 | 1.24E-07 |
| rs139054935 | 2 | 52497293 | A | C | 0.063 | 0.012 | 5.25E-07 | 0.036 | 0.015 | 1.83E-02 |
| rs9846892 | 3 | 81065799 | G | A | 0.015 | 0.003 | 1.55E-07 | 0.010 | 0.003 | 2.50E-03 |
| rs4146336 | 3 | 117493600 | A | C | -0.015 | 0.003 | 6.96E-07 | -0.009 | 0.004 | 9.00E-03 |
| rs1471695 | 3 | 175965350 | A | G | -0.013 | 0.003 | 5.94E-06 | -0.018 | 0.004 | 2.20E-07 |
| rs72847141 | 6 | 28722114 | G | T | 0.019 | 0.004 | 7.90E-07 | 0.015 | 0.005 | 1.70E-03 |
| rs9482120 | 6 | 98392667 | C | A | -0.015 | 0.003 | 1.01E-07 | -0.009 | 0.004 | 1.10E-02 |
| rs79937798 | 8 | 17730682 | C | T | 0.039 | 0.008 | 9.48E-07 | 0.013 | 0.010 | 1.70E-01 |
| rs10770042 | 11 | 9607032 | A | G | 0.019 | 0.004 | 3.30E-07 | 0.006 | 0.005 | 1.80E-01 |
| rs147087791 | 11 | 92551205 | A | G | -0.049 | 0.009 | 1.99E-07 | -0.038 | 0.011 | 9.30E-04 |
| rs1785039 | 11 | 127900579 | A | G | -0.018 | 0.004 | 4.00E-07 | -0.011 | 0.004 | 1.10E-02 |
| rs56167781 | 12 | 120146026 | A | T | -0.026 | 0.007 | 1.00E-03 | -0.047 | 0.010 | 8.92E-07 |
| rs117479788 | 16 | 46764892 | A | G | 0.032 | 0.006 | 7.12E-07 | 0.012 | 0.008 | 1.50E-01 |

Chr = Chromosome, BP = Basepair position, SE = Standard error, P = P-value. Italicized effect sizes, SEs and P-values represent replication in the second phenotype. Significant results are written in bold (Bonferroni correction).
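The replication count reported for these loci can be reproduced from the P-values in Table 3. In the sketch below, the smaller of the two P-values for each SNP is treated as the discovery result and the larger as the replication result in the other phenotype; this pairing is an assumption made for illustration, as are the variable names.

```python
# (SNP, friendship P, family P), copied from Table 3.
TABLE_3 = [
    ("rs6732220", 3.00e-04, 1.24e-07),
    ("rs139054935", 5.25e-07, 1.83e-02),
    ("rs9846892", 1.55e-07, 2.50e-03),
    ("rs4146336", 6.96e-07, 9.00e-03),
    ("rs1471695", 5.94e-06, 2.20e-07),
    ("rs72847141", 7.90e-07, 1.70e-03),
    ("rs9482120", 1.01e-07, 1.10e-02),
    ("rs79937798", 9.48e-07, 1.70e-01),
    ("rs10770042", 3.30e-07, 1.80e-01),
    ("rs147087791", 1.99e-07, 9.30e-04),
    ("rs1785039", 4.00e-07, 1.10e-02),
    ("rs56167781", 1.00e-03, 8.92e-07),
    ("rs117479788", 7.12e-07, 1.50e-01),
]

# Bonferroni threshold for 13 replication tests.
threshold = 0.05 / len(TABLE_3)

# Treat the larger P-value of each pair as the replication P-value.
replicated = [snp for snp, p_friend, p_family in TABLE_3
              if max(p_friend, p_family) < threshold]

print(f"threshold = {threshold:.4f}")
print(f"{len(replicated)} of {len(TABLE_3)} loci replicate: {replicated}")
```

With these inputs, six loci pass the threshold, matching the number given in the text.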

Table 4: Significant SNPs in the MTAG GWAS

| SNP | Info | Chr | BP | EA | OA | MAF | Family (MTAG) Beta | Family (MTAG) SE | Family (MTAG) P | Friendship (MTAG) Beta | Friendship (MTAG) SE | Friendship (MTAG) P | WC R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2189373 | 0.99 | 6 | 30223428 | C | T | 0.21 | -0.026 | 0.0046 | 1.06E-08 | -0.025 | 0.0046 | 4.21E-08 | 0.002723 |
| rs1483617 | 0.99 | 3 | 175960305 | G | A | 0.39 | -0.019 | 0.0035 | 1.6E-08 | -0.019 | 0.0035 | 4.15E-08 | 0.002723 |

Chr = Chromosome, BP = Basepair position, EA = effect allele, OA = other allele, MAF = minor allele frequency, SE = Standard error, P = P-value, WC R2 = winner’s curse corrected variance explained.


Figure 9: Manhattan and QQ plots

Manhattan and QQ plots of the two MTAG-GWAS datasets. A: Manhattan plot of family relationship satisfaction (Neffective = 164,112). B: QQ plot of family relationship satisfaction (LDSR intercept = 0.99 (0.0084); λGC = 1.18). C: Manhattan plot for friendship satisfaction (Neffective = 158,116). D: QQ plot for friendship satisfaction (LDSR intercept = 0.99 (0.0087); λGC = 1.17). Significant SNPs have been labelled on the Manhattan plots in red.


6.3.5 Functional annotation

We investigated genes associated with the two loci using eQTL and chromatin interaction analysis in brain tissues (Appendix: Tables 12 and 13). For the first locus (rs1483617), the analyses prioritized several genes including NLGN1, which is involved in the postsynaptic complex in excitatory synapses and interacts with NRXN1, a gene implicated in schizophrenia (Ching et al., 2010; Levinson et al., 2011), autism (Sanders et al., 2015), and intellectual disability (Schaaf et al., 2012). A second gene identified, KCNMB2, modulates the calcium sensitivity of the BK channels. The second locus on chromosome 6 (rs2189373) is in the MHC complex. It is an intergenic SNP in HCG17, a gene which has been implicated in schizophrenia (Fromer et al., 2016). SNPs in high LD (r2 > 0.8) with the lead SNP have been implicated in schizophrenia (rs2021722, P = 2.2x10^-12) (Ripke et al., 2011) (rs2523722, P = 1.47x10^-16) (Irish Schizophrenia Genomics Consortium and the Wellcome Trust Case Control Consortium 2, 2012) and in the cross-disorder psychiatric GWAS (rs2517614, P = 8.9x10^-7) (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). When comparing the two previous studies and our current study, the effect allele that increased the risk for the psychiatric condition is in the same haplotype as the effect allele that decreased social relationship satisfaction. We note here that this region was not genotyped in the latest GWAS of schizophrenia (Ripke et al., 2014), and no SNP in r2 > 0.8 with rs2189373 was available in these schizophrenia summary statistics. eQTL and CCC identified several genes in this locus. Notably, one of the genes identified (ZNF184) was also prioritized in a recent large-scale GWAS on neuroticism (Nagel et al., 2017). Figure 10 provides the circos plots for the two loci. Figures 11 and 12 provide the local LD plots for the two phenotypes.


Figure 10: Circos plots showing chromosome interactions and eQTLs

Circos plots showing eQTLs (green lines) and chromosome interactions (orange lines) for the two significant loci. A: Circos plot for rs1483617. B: Circos plot for rs2189373. The outer ring shows a modified Manhattan plot with -log10 P-values. Blue regions identify genomic risk regions. Names of genes with significant chromosome interactions or eQTL interactions are provided in the middle circle. Red regions indicate regions implicated by both eQTL and chromosome interactions.


Figure 11: Regional LD plot for family relationship satisfaction


Figure 12: Regional LD plot for friendship satisfaction


Using two different methods (FUMA and LDSR-partitioned heritability), we identified high enrichment in brain tissues including the pituitary, pointing to the significant role of neural tissues for both phenotypes (Figures 13 and 14, Tables 5 - 7). Brain specific chromatin modifications showed a significant, 3-fold enrichment using LDSR, accounting for nearly a third of the heritability (Tables 5 and 6). In addition, both methods identified significant enrichment for genes with PLI > 0.9 (Family: Partitioned-h2 P = 6.81x10^-7, MAGMA P = 3.06x10^-4; Friendship: Partitioned-h2 P = 1.56x10^-6, MAGMA P = 9.62x10^-5) (Tables 5 and 6). Baseline partitioned heritability also identified enrichment for conserved regions and H3K4me1 modifications for both phenotypes.

We investigated enrichment in specific brain regions using FUMA and partitioned heritability (Figures 13 and 14, Table 7). FUMA identified a significant enrichment for the cerebellum. In contrast, partitioned heritability identified a significant enrichment for the cortex and the anterior cingulate cortex. This is likely to be due to methodological differences in how the enrichments are calculated. Partitioned heritability is more likely to capture region-specific gene expression in the brain due to how region-specific expression is calculated. Cell type specific analysis did not identify a significant enrichment for any specific brain cell type, using either partitioned heritability or MAGMA (Tables 8 and 9).

Gene-based analyses using MAGMA identified 13 significant genes for family relationship satisfaction and 12 significant genes for friendship satisfaction (Appendix: Tables 14 and 15). The top gene for both phenotypes was SHISA5, a gene that encodes a transmembrane protein on the endoplasmic reticulum.


Figure 13: General tissue enrichment (FUMA)

FUMA general tissue enrichment plots for family relationship satisfaction (above) and friendship satisfaction (below). X axis indicates tissues, and Y axis indicates enrichment, with higher bars representing greater enrichment. Dotted line indicates Bonferroni corrected threshold. Gene expression data was obtained from the GTEx consortium.


Figure 14: Specific tissue enrichment (FUMA)

FUMA specific tissue enrichment plots for family relationship satisfaction (above) and friendship satisfaction (below). X axis indicates tissues, and Y axis indicates enrichment, with higher bars representing greater enrichment. Red bars indicate significant enrichment after Bonferroni correction. Dotted line indicates Bonferroni corrected threshold. Gene expression data was obtained from the GTEx consortium.


Table 5: Partitioned heritability analyses for the Family GWAS

| Category | Prop SNP | Prop h2 | Prop h2 SE | Enrich | SE | P | FDR P |
|---|---|---|---|---|---|---|---|
| PLI | 0.10 | 0.21 | 0.02 | 2.08 | 0.24 | 6.81E-07 | 7.35E-05 |
| Conserved_LindbladToh_0 | 0.03 | 0.34 | 0.08 | 13.09 | 2.90 | 5.31E-05 | 1.43E-03 |
| Brain | 0.08 | 0.28 | 0.06 | 3.55 | 0.69 | 2.28E-04 | 4.92E-03 |
| H3K4me1_Trynka.extend.500_0 | 0.61 | 0.89 | 0.08 | 1.45 | 0.13 | 6.47E-04 | 9.36E-03 |
| TSS_Hoffman.extend.500_0 | 0.03 | 0.17 | 0.05 | 4.91 | 1.36 | 5.20E-03 | 5.62E-02 |
| DHS_Trynka.extend.500_0 | 0.50 | 0.79 | 0.12 | 1.58 | 0.25 | 2.27E-02 | 2.23E-01 |
| Conserved_LindbladToh.extend.500_0 | 0.33 | 0.54 | 0.09 | 1.61 | 0.27 | 2.56E-02 | 2.30E-01 |
| DHS_Trynka_0 | 0.17 | -0.16 | 0.18 | -0.97 | 1.06 | 6.36E-02 | 4.04E-01 |
| UTR_3_UCSC_0 | 0.01 | 0.07 | 0.03 | 6.44 | 3.05 | 7.32E-02 | 4.17E-01 |
| TSS_Hoffman_0 | 0.02 | 0.10 | 0.05 | 5.70 | 2.74 | 8.36E-02 | 4.51E-01 |
| DHS_peaks_Trynka_0 | 0.11 | -0.18 | 0.17 | -1.58 | 1.50 | 8.77E-02 | 4.51E-01 |
| SuperEnhancer_Hnisz_0 | 0.17 | 0.23 | 0.04 | 1.36 | 0.23 | 1.37E-01 | 6.43E-01 |
| H3K9ac_Trynka.extend.500_0 | 0.23 | 0.34 | 0.09 | 1.46 | 0.37 | 2.16E-01 | 7.73E-01 |
| DGF_ENCODE.extend.500_0 | 0.54 | 0.40 | 0.12 | 0.74 | 0.22 | 2.45E-01 | 7.73E-01 |
| Transcribed_Hoffman_0 | 0.35 | 0.49 | 0.13 | 1.41 | 0.37 | 2.55E-01 | 7.73E-01 |
| Promoter_UCSC.extend.500_0 | 0.04 | -0.01 | 0.04 | -0.17 | 1.05 | 2.63E-01 | 7.73E-01 |
| H3K4me1_Trynka_0 | 0.43 | 0.58 | 0.14 | 1.36 | 0.32 | 2.63E-01 | 7.73E-01 |
| H3K9ac_Trynka_0 | 0.13 | 0.24 | 0.10 | 1.87 | 0.79 | 2.69E-01 | 7.73E-01 |
| H3K27ac_PGC2.extend.500_0 | 0.34 | 0.44 | 0.09 | 1.30 | 0.27 | 2.73E-01 | 7.73E-01 |
| Repressed_Hoffman.extend.500_0 | 0.72 | 0.67 | 0.05 | 0.93 | 0.06 | 3.05E-01 | 7.84E-01 |
| Enhancer_Hoffman.extend.500_0 | 0.15 | 0.07 | 0.08 | 0.46 | 0.53 | 3.09E-01 | 7.84E-01 |
| Intron_UCSC_0 | 0.39 | 0.44 | 0.05 | 1.13 | 0.13 | 3.11E-01 | 7.84E-01 |
| H3K4me1_peaks_Trynka_0 | 0.17 | 0.32 | 0.15 | 1.86 | 0.88 | 3.25E-01 | 7.84E-01 |
| SuperEnhancer_Hnisz.extend.500_0 | 0.17 | 0.21 | 0.04 | 1.22 | 0.22 | 3.31E-01 | 7.84E-01 |
| CTCF_Hoffman_0 | 0.02 | -0.04 | 0.07 | -1.83 | 3.09 | 3.54E-01 | 7.84E-01 |
| Coding_UCSC_0 | 0.01 | 0.06 | 0.05 | 3.90 | 3.21 | 3.65E-01 | 7.84E-01 |
| Transcribed_Hoffman.extend.500_0 | 0.76 | 0.83 | 0.08 | 1.09 | 0.10 | 3.81E-01 | 7.84E-01 |
| TFBS_ENCODE.extend.500_0 | 0.34 | 0.24 | 0.13 | 0.69 | 0.38 | 4.02E-01 | 7.84E-01 |
| Enhancer_Andersson_0 | 0.00 | 0.03 | 0.03 | 6.72 | 6.95 | 4.14E-01 | 7.84E-01 |
| WeakEnhancer_Hoffman_0 | 0.02 | 0.07 | 0.07 | 3.53 | 3.12 | 4.20E-01 | 7.84E-01 |
| WeakEnhancer_Hoffman.extend.500_0 | 0.09 | 0.03 | 0.08 | 0.39 | 0.85 | 4.75E-01 | 8.41E-01 |
| PromoterFlanking_Hoffman.extend.500_0 | 0.03 | 0.08 | 0.06 | 2.24 | 1.84 | 5.03E-01 | 8.45E-01 |
| UTR_5_UCSC.extend.500_0 | 0.03 | 0.00 | 0.04 | 0.08 | 1.45 | 5.30E-01 | 8.45E-01 |
| PromoterFlanking_Hoffman_0 | 0.01 | -0.02 | 0.05 | -2.49 | 5.72 | 5.43E-01 | 8.45E-01 |
| DGF_ENCODE_0 | 0.14 | 0.04 | 0.16 | 0.31 | 1.14 | 5.48E-01 | 8.45E-01 |
| Enhancer_Andersson.extend.500_0 | 0.02 | 0.04 | 0.04 | 2.30 | 2.31 | 5.76E-01 | 8.72E-01 |
| TFBS_ENCODE_0 | 0.13 | 0.05 | 0.15 | 0.37 | 1.15 | 5.81E-01 | 8.72E-01 |
| H3K27ac_Hnisz_0 | 0.39 | 0.42 | 0.06 | 1.08 | 0.16 | 6.35E-01 | 9.17E-01 |
| H3K27ac_PGC2_0 | 0.27 | 0.22 | 0.10 | 0.83 | 0.38 | 6.51E-01 | 9.17E-01 |
| UTR_3_UCSC.extend.500_0 | 0.03 | 0.01 | 0.04 | 0.41 | 1.35 | 6.64E-01 | 9.17E-01 |
| H3K9ac_peaks_Trynka_0 | 0.04 | 0.07 | 0.09 | 1.91 | 2.29 | 6.90E-01 | 9.17E-01 |
| FetalDHS_Trynka_0 | 0.08 | 0.04 | 0.12 | 0.51 | 1.43 | 7.30E-01 | 9.17E-01 |
| Coding_UCSC.extend.500_0 | 0.06 | 0.08 | 0.05 | 1.25 | 0.73 | 7.33E-01 | 9.17E-01 |
| Intron_UCSC.extend.500_0 | 0.40 | 0.41 | 0.04 | 1.03 | 0.09 | 7.36E-01 | 9.17E-01 |
| H3K27ac_Hnisz.extend.500_0 | 0.42 | 0.44 | 0.08 | 1.05 | 0.19 | 7.95E-01 | 9.53E-01 |
| Promoter_UCSC_0 | 0.03 | 0.05 | 0.06 | 1.46 | 1.82 | 8.00E-01 | 9.53E-01 |
| H3K4me3_Trynka_0 | 0.13 | 0.11 | 0.08 | 0.85 | 0.62 | 8.13E-01 | 9.53E-01 |
| Enhancer_Hoffman_0 | 0.06 | 0.05 | 0.09 | 0.72 | 1.43 | 8.47E-01 | 9.53E-01 |
| FetalDHS_Trynka.extend.500_0 | 0.29 | 0.27 | 0.12 | 0.93 | 0.42 | 8.69E-01 | 9.62E-01 |
| H3K4me3_Trynka.extend.500_0 | 0.26 | 0.27 | 0.09 | 1.05 | 0.35 | 8.86E-01 | 9.62E-01 |
| UTR_5_UCSC_0 | 0.01 | 0.00 | 0.02 | 0.39 | 4.47 | 8.91E-01 | 9.62E-01 |
| Repressed_Hoffman_0 | 0.46 | 0.47 | 0.15 | 1.02 | 0.33 | 9.46E-01 | 9.98E-01 |
| CTCF_Hoffman.extend.500_0 | 0.07 | 0.07 | 0.08 | 0.97 | 1.16 | 9.79E-01 | 9.98E-01 |
| H3K4me3_peaks_Trynka_0 | 0.04 | 0.04 | 0.08 | 1.03 | 1.98 | 9.88E-01 | 9.98E-01 |
| base_0 | 1.00 | 1.00 | 0.00 | 1.00 | 0.00 | NA | NA |

This table provides the results of the partitioned heritability analyses for the Family satisfaction GWAS. Prop SNP = proportion of SNPs, Prop h2 = proportion of heritability, P = P-value, Enrich = enrichment, SE = Standard error, h2 = heritability.


Table 6: Partitioned heritability analyses for the Friendship GWAS

| Category | Prop SNP | Prop h2 | Prop h2 SE | Enrich | SE | P | FDR P |
|---|---|---|---|---|---|---|---|
| PLI | 0.10 | 0.20 | 0.02 | 2.03 | 0.23 | 1.56E-06 | 8.42E-05 |
| Conserved_LindbladToh_0 | 0.03 | 0.37 | 0.08 | 14.38 | 2.91 | 1.09E-05 | 3.92E-04 |
| Brain | 0.08 | 0.27 | 0.05 | 3.41 | 0.68 | 5.52E-04 | 9.36E-03 |
| H3K4me1_Trynka.extend.500_0 | 0.61 | 0.88 | 0.08 | 1.45 | 0.13 | 6.93E-04 | 9.36E-03 |
| TSS_Hoffman.extend.500_0 | 0.03 | 0.18 | 0.05 | 5.09 | 1.41 | 4.50E-03 | 5.40E-02 |
| Conserved_LindbladToh.extend.500_0 | 0.33 | 0.55 | 0.10 | 1.64 | 0.29 | 2.77E-02 | 2.30E-01 |
| DHS_Trynka.extend.500_0 | 0.50 | 0.76 | 0.13 | 1.53 | 0.25 | 3.91E-02 | 3.02E-01 |
| TSS_Hoffman_0 | 0.02 | 0.12 | 0.05 | 6.46 | 2.73 | 4.47E-02 | 3.22E-01 |
| DHS_Trynka_0 | 0.17 | -0.1 | 0.18 | -0.98 | 1.05 | 6.10E-02 | 4.04E-01 |
| UTR_3_UCSC_0 | 0.01 | 0.07 | 0.03 | 6.09 | 2.87 | 7.34E-02 | 4.17E-01 |
| SuperEnhancer_Hnisz_0 | 0.17 | 0.23 | 0.04 | 1.38 | 0.23 | 1.16E-01 | 5.69E-01 |
| DHS_peaks_Trynka_0 | 0.11 | -0.1 | 0.16 | -1.04 | 1.44 | 1.59E-01 | 7.16E-01 |
| H3K27ac_PGC2.extend.500_0 | 0.34 | 0.46 | 0.09 | 1.37 | 0.27 | 1.73E-01 | 7.47E-01 |
| H3K4me1_Trynka_0 | 0.43 | 0.60 | 0.14 | 1.40 | 0.33 | 2.17E-01 | 7.73E-01 |
| H3K9ac_Trynka_0 | 0.13 | 0.24 | 0.10 | 1.91 | 0.75 | 2.23E-01 | 7.73E-01 |
| Coding_UCSC_0 | 0.01 | 0.07 | 0.05 | 4.82 | 3.29 | 2.46E-01 | 7.73E-01 |
| Transcribed_Hoffman_0 | 0.35 | 0.49 | 0.13 | 1.41 | 0.37 | 2.62E-01 | 7.73E-01 |
| H3K9ac_Trynka.extend.500_0 | 0.23 | 0.33 | 0.09 | 1.42 | 0.38 | 2.72E-01 | 7.73E-01 |
| CTCF_Hoffman_0 | 0.02 | -0.06 | 0.08 | -2.49 | 3.26 | 2.79E-01 | 7.73E-01 |
| SuperEnhancer_Hnisz.extend.500_0 | 0.17 | 0.21 | 0.04 | 1.23 | 0.21 | 2.79E-01 | 7.73E-01 |
| WeakEnhancer_Hoffman_0 | 0.02 | 0.09 | 0.07 | 4.18 | 3.16 | 3.16E-01 | 7.84E-01 |
| Repressed_Hoffman.extend.500_0 | 0.72 | 0.67 | 0.05 | 0.94 | 0.07 | 3.38E-01 | 7.84E-01 |
| Transcribed_Hoffman.extend.500_0 | 0.76 | 0.83 | 0.08 | 1.09 | 0.10 | 3.60E-01 | 7.84E-01 |
| Coding_UCSC.extend.500_0 | 0.06 | 0.11 | 0.05 | 1.67 | 0.74 | 3.63E-01 | 7.84E-01 |
| Enhancer_Andersson_0 | 0.00 | 0.03 | 0.03 | 7.27 | 6.98 | 3.72E-01 | 7.84E-01 |
| H3K4me1_peaks_Trynka_0 | 0.17 | 0.30 | 0.15 | 1.74 | 0.86 | 3.90E-01 | 7.84E-01 |
| Intron_UCSC_0 | 0.39 | 0.43 | 0.05 | 1.10 | 0.13 | 4.17E-01 | 7.84E-01 |
| DGF_ENCODE.extend.500_0 | 0.54 | 0.45 | 0.12 | 0.83 | 0.22 | 4.21E-01 | 7.84E-01 |
| WeakEnhancer_Hoffman.extend.500_0 | 0.09 | 0.03 | 0.08 | 0.32 | 0.86 | 4.30E-01 | 7.87E-01 |
| PromoterFlanking_Hoffman.extend.500_0 | 0.03 | 0.08 | 0.06 | 2.35 | 1.84 | 4.68E-01 | 8.41E-01 |
| TFBS_ENCODE.extend.500_0 | 0.34 | 0.26 | 0.13 | 0.74 | 0.38 | 4.95E-01 | 8.45E-01 |
| DGF_ENCODE_0 | 0.14 | 0.04 | 0.15 | 0.28 | 1.08 | 5.06E-01 | 8.45E-01 |
| PromoterFlanking_Hoffman_0 | 0.01 | -0.02 | 0.05 | -2.75 | 5.69 | 5.12E-01 | 8.45E-01 |
| Enhancer_Hoffman.extend.500_0 | 0.15 | 0.10 | 0.08 | 0.68 | 0.52 | 5.30E-01 | 8.45E-01 |
| Enhancer_Andersson.extend.500_0 | 0.02 | 0.05 | 0.04 | 2.36 | 2.25 | 5.46E-01 | 8.45E-01 |
| Promoter_UCSC_0 | 0.03 | 0.06 | 0.06 | 1.77 | 1.93 | 6.91E-01 | 9.17E-01 |
| TFBS_ENCODE_0 | 0.13 | 0.07 | 0.15 | 0.55 | 1.16 | 6.98E-01 | 9.17E-01 |
| H3K27ac_Hnisz_0 | 0.39 | 0.42 | 0.06 | 1.06 | 0.16 | 7.04E-01 | 9.17E-01 |
| Intron_UCSC.extend.500_0 | 0.40 | 0.41 | 0.04 | 1.04 | 0.10 | 7.09E-01 | 9.17E-01 |
| Promoter_UCSC.extend.500_0 | 0.04 | 0.02 | 0.04 | 0.59 | 1.12 | 7.14E-01 | 9.17E-01 |
| H3K27ac_PGC2_0 | 0.27 | 0.23 | 0.10 | 0.86 | 0.38 | 7.16E-01 | 9.17E-01 |
| UTR_3_UCSC.extend.500_0 | 0.03 | 0.01 | 0.04 | 0.54 | 1.35 | 7.32E-01 | 9.17E-01 |
| UTR_5_UCSC_0 | 0.01 | 0.01 | 0.02 | 2.50 | 4.51 | 7.39E-01 | 9.17E-01 |
| H3K27ac_Hnisz.extend.500_0 | 0.42 | 0.45 | 0.08 | 1.06 | 0.18 | 7.64E-01 | 9.38E-01 |
| FetalDHS_Trynka_0 | 0.08 | 0.06 | 0.12 | 0.67 | 1.41 | 8.15E-01 | 9.53E-01 |
| Repressed_Hoffman_0 | 0.46 | 0.43 | 0.15 | 0.93 | 0.33 | 8.26E-01 | 9.53E-01 |
| H3K4me3_Trynka_0 | 0.13 | 0.12 | 0.08 | 0.87 | 0.62 | 8.36E-01 | 9.53E-01 |
| UTR_5_UCSC.extend.500_0 | 0.03 | 0.02 | 0.04 | 0.70 | 1.49 | 8.40E-01 | 9.53E-01 |
| H3K4me3_Trynka.extend.500_0 | 0.26 | 0.27 | 0.09 | 1.05 | 0.34 | 8.75E-01 | 9.62E-01 |
| FetalDHS_Trynka.extend.500_0 | 0.29 | 0.29 | 0.12 | 1.02 | 0.41 | 9.64E-01 | 9.98E-01 |
| H3K4me3_peaks_Trynka_0 | 0.04 | 0.04 | 0.08 | 1.05 | 1.90 | 9.79E-01 | 9.98E-01 |
| CTCF_Hoffman.extend.500_0 | 0.07 | 0.07 | 0.08 | 0.99 | 1.11 | 9.94E-01 | 9.98E-01 |
| Enhancer_Hoffman_0 | 0.06 | 0.06 | 0.09 | 1.01 | 1.47 | 9.95E-01 | 9.98E-01 |
| H3K9ac_peaks_Trynka_0 | 0.04 | 0.04 | 0.09 | 1.00 | 2.26 | 9.98E-01 | 9.98E-01 |
| base_0 | 1.00 | 1.00 | 0.00 | 1.00 | 0.00 | NA | NA |

This table provides the results of the partitioned heritability analyses for the friendship satisfaction GWAS. Prop SNP = proportion of SNPs, Prop h2 = proportion of heritability, P = P-value, Enrich = enrichment, SE = Standard error, h2 = heritability.


Table 7: Partitioned heritability analyses for tissue-specific expression in the brain

| Phenotype | Name | Coefficient | Coefficient SE | Coefficient P | FDR P |
|---|---|---|---|---|---|
| Family | Brain_Cortex | 5.41E-09 | 1.67E-09 | 5.86E-04 | 1.52E-02 |
| Family | Brain_Anterior_cingulate_cortex_BA24 | 3.65E-09 | 1.41E-09 | 4.88E-03 | 3.17E-02 |
| Family | Brain_Frontal_Cortex_BA9 | 3.31E-09 | 1.40E-09 | 8.88E-03 | 4.62E-02 |
| Family | Brain_Cerebellar_Hemisphere | 2.69E-09 | 1.84E-09 | 7.15E-02 | 2.07E-01 |
| Family | Brain_Cerebellum | 2.30E-09 | 2.05E-09 | 1.30E-01 | 3.38E-01 |
| Family | Brain_Putamen_(basal_ganglia) | 8.49E-10 | 1.78E-09 | 3.17E-01 | 7.49E-01 |
| Family | Brain_Nucleus_accumbens_(basal_ganglia) | 4.34E-10 | 1.68E-09 | 3.98E-01 | 8.62E-01 |
| Family | Brain_Caudate_(basal_ganglia) | -3.13E-10 | 1.70E-09 | 5.73E-01 | 9.93E-01 |
| Family | Brain_Substantia_nigra | -1.37E-09 | 1.62E-09 | 8.01E-01 | 9.97E-01 |
| Family | Brain_Hippocampus | -1.64E-09 | 1.56E-09 | 8.54E-01 | 9.97E-01 |
| Family | Brain_Spinal_cord_(cervical_c-1) | -2.14E-09 | 1.65E-09 | 9.03E-01 | 9.97E-01 |
| Family | Brain_Amygdala | -2.52E-09 | 1.62E-09 | 9.40E-01 | 9.97E-01 |
| Family | Brain_Hypothalamus | -3.79E-09 | 1.39E-09 | 9.97E-01 | 9.97E-01 |
| Friendship | Brain_Cortex | 4.88E-09 | 1.72E-09 | 2.33E-03 | 2.74E-02 |
| Friendship | Brain_Anterior_cingulate_cortex_BA24 | 3.95E-09 | 1.45E-09 | 3.17E-03 | 2.74E-02 |
| Friendship | Brain_Frontal_Cortex_BA9 | 3.28E-09 | 1.48E-09 | 1.32E-02 | 5.72E-02 |
| Friendship | Brain_Cerebellar_Hemisphere | 3.58E-09 | 1.89E-09 | 2.92E-02 | 1.09E-01 |
| Friendship | Brain_Cerebellum | 3.06E-09 | 2.08E-09 | 7.00E-02 | 2.07E-01 |
| Friendship | Brain_Nucleus_accumbens_(basal_ganglia) | -9.09E-11 | 1.78E-09 | 5.20E-01 | 9.93E-01 |
| Friendship | Brain_Putamen_(basal_ganglia) | -2.10E-10 | 1.89E-09 | 5.44E-01 | 9.93E-01 |
| Friendship | Brain_Caudate_(basal_ganglia) | -6.67E-10 | 1.84E-09 | 6.41E-01 | 9.97E-01 |
| Friendship | Brain_Substantia_nigra | -1.63E-09 | 1.71E-09 | 8.30E-01 | 9.97E-01 |
| Friendship | Brain_Amygdala | -2.23E-09 | 1.72E-09 | 9.02E-01 | 9.97E-01 |
| Friendship | Brain_Hippocampus | -2.20E-09 | 1.60E-09 | 9.15E-01 | 9.97E-01 |
| Friendship | Brain_Spinal_cord_(cervical_c-1) | -2.63E-09 | 1.73E-09 | 9.36E-01 | 9.97E-01 |
| Friendship | Brain_Hypothalamus | -3.88E-09 | 1.48E-09 | 9.96E-01 | 9.97E-01 |


Table 8: Cell type specific enrichment partitioned heritability

| Phenotype | Name | Coefficient | Standard error | P-value | FDR P |
|---|---|---|---|---|---|
| Friendship | Neuron | 3.43E-09 | 1.98E-09 | 0.04 | 0.18 |
| Friendship | Oligodendrocyte | 1.29E-09 | 2.03E-09 | 0.26 | 0.40 |
| Friendship | Astrocyte | -5.23E-10 | 1.88E-09 | 0.61 | 0.69 |
| Family | Neuron | 2.91E-09 | 1.90E-09 | 0.06 | 0.18 |
| Family | Oligodendrocyte | 1.19E-09 | 1.95E-09 | 0.27 | 0.40 |
| Family | Astrocyte | -9.22E-10 | 1.84E-09 | 0.69 | 0.69 |

This table provides the results of the cell type specific enrichment analyses using partitioned heritability.

Table 9: Cell type specific enrichment MAGMA

| Phenotype | Name | Beta | Beta_std | Standard error | P-value |
|---|---|---|---|---|---|
| Friendship | Neuron | 0.05 | 0.0076 | 0.0464 | 0.137 |
| Friendship | Oligodendrocyte | -0.041 | -0.0061 | 0.0448 | 0.822 |
| Friendship | Astrocyte | -0.047 | -0.0069 | 0.0467 | 0.845 |
| Family | Neuron | 0.038 | 0.0058 | 0.0463 | 0.202 |
| Family | Oligodendrocyte | -0.031 | -0.0046 | 0.0447 | 0.759 |
| Family | Astrocyte | -0.049 | -0.0072 | 0.0466 | 0.855 |

This table provides the results of the cell type specific enrichment analyses using MAGMA.


6.3.6 Polygenic score analyses

We identified a high correlation between polygenic scores for the two phenotypes in the ALSPAC cohort (r = 0.93; P < 2.2x10^-16), which reflects the high genetic correlation between the phenotypes (Figure 15). Polygenic score analyses were conducted for three childhood questionnaires and five subscales (Methods). After FDR correction, polygenic scores did not significantly predict variance in any of the phenotypes (Table 10). The variance explained was small, with the highest variance explained being for childhood prosocial behaviour (Table 10 and Figures 16 and 17).

Figure 15: Correlation in Polygenic scores

Pearson’s correlation of scaled polygenic risk scores in 8104 unrelated individuals from ALSPAC. Correlation coefficient = 0.937, 95% CI: 0.934 - 0.939. P < 2.2x10^-16


Figure 16: Distribution of phenotypes tested in polygenic regression analysis

Plotting the distribution of all eight phenotypes revealed non-Gaussian distributions, which is expected from count data. All eight phenotypes had Poisson distributions with overdispersion. Given the distribution of the phenotypes, polygenic regression analyses were conducted using a negative binomial regression model with sex and the first two genetic principal components included as covariates. We did not include age as a covariate as the phenotypes were measured at specific ages. In the top row, from left, the phenotypes are: 1. SDQ total score (age = 115 months, n = 5646); 2. SDQ peer relationship score (age = 115 months, n = 5660); 3. SDQ conduct problems score (age = 115 months, n = 5650); 4. SDQ emotional symptoms score (age = 115 months, n = 5650). In the bottom row, from left, the phenotypes are: 5. SDQ hyperactivity score (age = 115 months, n = 5670); 6. SDQ prosocial score* (age = 115 months, n = 5669); 7. Children’s Communication Checklist Score* (age = 115 months, n = 5583); 8. Social and Communication Difficulties questionnaire score (age = 91 months, n = 5447). *These phenotypes were reverse coded to allow for analysis using a negative binomial regression model.


Figure 17: Polygenic regression analyses

Regression beta (effect) of polygenic scores after accounting for sex and the first two genetic principal components for the eight phenotypes tested. Lines indicate 95% confidence intervals. *reverse coded phenotypes.


Table 10: Polygenic score regression analyses

| Phenotype | Family Regression Beta | Family SE | Family P | Family Nagelkerke's pseudo R2 | Family FDR P | Friendship Regression Beta | Friendship SE | Friendship P | Friendship Nagelkerke's pseudo R2 | Friendship FDR P | Sample size | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SDQtotal | -0.011 | 0.01 | 0.22 | 3.79E-04 | 1.69E-01 | -0.010 | 0.01 | 0.27 | 3.19E-04 | 1.69E-01 | 5646 | 115 months |
| SDQpeer | -0.038 | 0.02 | 0.03 | 1.20E-03 | 8.90E-02 | -0.041 | 0.02 | 0.02 | 1.35E-03 | 8.90E-02 | 5660 | 115 months |
| SDQhyperactivity | -0.002 | 0.01 | 0.81 | 1.35E-05 | 4.30E-01 | 0.002 | 0.01 | 0.83 | 1.05E-05 | 7.16E-01 | 5670 | 115 months |
| SDQemotional | 0.008 | 0.02 | 0.62 | 6.33E-05 | 9.60E-01 | -0.001 | 0.02 | 0.94 | 1.25E-06 | 8.00E-01 | 5650 | 115 months |
| SDQconduct | -0.019 | 0.06 | 0.75 | 8.72E-05 | 9.60E-01 | -0.013 | 0.06 | 0.84 | 3.51E-05 | 9.60E-01 | 5650 | 115 months |
| SDQprosocial* | -0.028 | 0.01 | 0.03 | 1.12E-03 | 1.60E-01 | -0.027 | 0.01 | 0.04 | 1.00E-03 | 1.60E-01 | 5669 | 115 months |
| SCDC | -0.023 | 0.01 | 0.20 | 4.35E-04 | 2.20E-01 | -0.029 | 0.01 | 0.12 | 6.55E-04 | 1.60E-01 | 5447 | 115 months |
| CCC* | 0.000 | 0.00 | 0.96 | 4.30E-07 | 8.00E-01 | 0.003 | 0.01 | 0.76 | 2.52E-05 | 9.60E-01 | 5583 | 91 months |

This table provides the incremental pseudo R2 for the phenotypes tested. *Indicates reverse coded phenotypes. SE = standard error.


6.4 Discussion

We present the results of a large-scale genome-wide association study of social relationship satisfaction in the UK Biobank, measured using family relationship satisfaction and friendship satisfaction. Despite the modest phenotypic correlation, the genetic correlation between the two phenotypes was high and significant, suggesting a similar genetic architecture.

We first investigated how the two phenotypes are genetically correlated with psychiatric conditions and psychological phenotypes. As predicted, most, if not all, psychiatric conditions had a significant negative genetic correlation with the two phenotypes. We replicated the correlation with autism using a second dataset (autism_iPSYCH). We observed a significant negative genetic correlation between the two phenotypes and a large cross-disorder psychiatric GWAS (Cross-Disorder Group of the Psychiatric Genomics Consortium, 2013). This underscores the importance of social relationship dissatisfaction in psychiatric conditions. The genetic correlations identified here are very similar to those identified between psychiatric conditions, psychological phenotypes and subjective well-being (Okbay, Baselmans, et al., 2016). One notable exception is the negative genetic correlation between measures of cognition (Okbay, Beauchamp, et al., 2016; Sniekers et al., 2017) and the two phenotypes. Whilst subjective wellbeing is positively genetically correlated with measures of cognition, we identify a small yet statistically significant negative correlation between measures of cognition and the two phenotypes. A recent study highlighted that people with very high IQ scores tend to report lower life satisfaction when they socialize more frequently with friends (Li & Kanazawa, 2016), and our genetic correlation may be reflecting this. It is also likely that the relationship between IQ and social relationship satisfaction is not strictly linear, and this needs to be further investigated. This also highlights the distinction between subjective wellbeing and social relationship satisfaction: subjective wellbeing encompasses several aspects of wellbeing, including social relationship satisfaction.

We leveraged the high genetic correlation between the two phenotypes to identify significant genes using a method known as MTAG (Turley et al., 2017). MTAG makes several assumptions, the key one being that the shared variance-covariance matrix is uniform across all SNPs. Despite this, we were comfortable applying MTAG to the two phenotypes because of both the high genetic correlation, measured using two different methods, and the similar statistical power of the two GWAS, as indicated by their similar mean chi-square statistics.


Our MTAG GWAS identified two significant loci. The first (rs1483617) is, as far as we are aware, a novel locus that has not been implicated previously in any GWAS of a neuropsychiatric condition. eQTL and chromatin interactions prioritize several genes that interact with this locus, including NLGN1, which codes for the post-synaptic protein neuroligin 1. NLGNs, along with NRXNs and SHANKs, are integral components of the synaptic complex, and mutations in this group of proteins have been implicated in several neuropsychiatric conditions including autism (Bourgeron, 2015). NLGN1 is located in the post-synaptic membrane of glutamatergic synapses, and can modulate the development of glutamatergic synapses in an activity-dependent manner (Bourgeron, 2015). Another gene prioritized at this locus, KCNMB2, encodes a component of the BK channels, which are large-conductance potassium channels that are sensitive to both changes in membrane voltage and changes in intracellular calcium ions.

The second locus (rs2189373) is in the MHC, a region of complex LD structure. This locus has been implicated previously in schizophrenia, and is nominally associated with the cross-disorder psychiatric GWAS. Interestingly, this locus was not investigated in the recent large-scale GWAS of schizophrenia (Ripke et al., 2014), as inferred from the summary statistics available. The effect direction of the SNPs when considering the haplotype structure of this region aligns with the genetic correlation between social relationship satisfaction and psychiatric risk. In other words, alleles that increase the risk for schizophrenia are in the same haplotype as alleles that decrease friendship satisfaction. The functional consequences of this locus must be formally tested.

Multiple lines of evidence also point to the central role of brain tissues for the two phenotypes. First, using partitioned heritability, we identified a significant, threefold enrichment in brain-specific annotations for both phenotypes, accounting for about 30% of the total heritability. This mirrored the high enrichment in brain-specific expression identified by FUMA, including the pituitary. We also sought to identify where within the brain the GWAS signals for the two phenotypes are enriched. Here, FUMA and partitioned heritability identified enrichment in different tissues. While FUMA prioritized the cerebellum, partitioned heritability prioritized the cortex. The discordance in results is due to the different definitions of tissue-specific expression used by the two methods. While partitioned heritability compares focal brain region expression against other brain regions, FUMA compares the focal brain region’s expression against all tissue types. Hence, partitioned heritability is more likely to tag brain region-specific expression patterns.

In addition to enrichment for brain tissues, we also identified an enrichment for genes that are intolerant to loss-of-function mutations (Lek et al., 2016). A fifth of the heritability can be attributed to pLI > 0.9 genes. We utilized pLI quantified from the non-psychiatric population cohort for this analysis given the considerable genetic correlation between the two phenotypes and various psychiatric conditions. Loss-of-function mutations in these genes lead to severe biochemical consequences, and are implicated in several neuropsychiatric conditions. For example, de novo loss-of-function mutations in pLI intolerant genes confer significant risk for autism (Kosmicki et al., 2017). Our results suggest that pLI > 0.9 genes also contribute to difficulties in the social domain which is correlated with the risk for psychiatric conditions.

Whilst we were unable to investigate the polygenic score association in other samples with the same phenotypic measures, we investigated polygenic score association across a range of different phenotypes in children. We chose three specific questionnaires that have been associated with various psychiatric conditions (Geurts et al., 2004; Goodman et al., 2000; Weiner et al., 2017), and the subdomains of one of them, the SDQ. After FDR correction, polygenic scores for the two phenotypes were not associated with any of the childhood questionnaires investigated. The variance explained was small. This may be due to three reasons: the modest genetic correlation between social relationship satisfaction and the childhood phenotypes, the modest statistical power of this study due to both low per-SNP variance explained and low additive heritability, and the relatively small sample size of the ALSPAC cohort. Encouragingly, the phenotype for which the variance explained was the highest is childhood prosocial behaviour.

In conclusion, we identify two significant loci associated with social relationship satisfaction, one of which has been previously implicated in schizophrenia. Comprehensive functional annotation prioritizes specific genes for functional follow-up, and identifies tissue-specific enrichment patterns. Genetic correlation analyses highlight the importance of social relationship dissatisfaction in psychiatric conditions, and enrichment in loss-of-function intolerant genes suggests a degree of overlap between rare and common variants in a set of genes for psychiatric conditions. In the next chapter, we report a GWAS of a non-social autism-related trait, systemizing.


7. Genetics of Systemizing

7.1 Introduction

The hypersystemizing theory of autism suggests that autistic individuals, on average, have superior attention to detail, and a stronger drive to systemize. Systemizing involves identifying input-operation-output relationships in order to analyse and build systems and to understand the laws that govern specific systems (Baron-Cohen, 2006). Several lines of evidence suggest that systemizing in autistic individuals is at least intact, and possibly superior. This idea was noted in the earliest papers describing autism. In his 1944 paper describing autism (Asperger, 1944), Hans Asperger notes a proclivity for patterns and order in autistic children. Of one child, he writes, “He orders his facts into a system and forms his own theories even if they are occasionally abstruse.” He observes that another child had ‘specialised technological interests and knew an incredible amount about complex machinery’, while a third child ‘was preoccupied by numbers’. In his 1943 paper, Leo Kanner writes that the children with autism have “precise recollection of complex patterns and sequences” (Kanner, 1943).

This initial observation has been quantified using different measures. For example, on a self-report measure of systemizing (the Systemizing Quotient – Revised, or the SQ-R) (Wheelwright et al., 2006), autistic individuals, on average, score significantly higher than non-autistic individuals (Baron-Cohen et al., 2003; Wheelwright et al., 2006). The same pattern of results is seen using the parent-report version of the SQ (Auyeung et al., 2006). Systemizing is also highly correlated with aptitude in science, technology, engineering and mathematics (STEM) (Nettle, 2007). Fathers and grandfathers of children with autism are significantly overrepresented in the field of engineering (Baron-Cohen, Wheelwright, Burtenshaw, & Hobson, 2007). Further, autistic individuals are more likely to enrol in STEM majors (34.31%) compared to the general population (22.8%) and those with other learning disabilities (18.6%) (Wei, Yu, Shattuck, McCracken, & Blackorby, 2013). STEM professionals also score significantly higher on measures of autistic traits (mean = 21.92, SD = 8.92) compared to non-STEM professionals (mean = 18.92, SD = 8.48) (Ruzich et al., 2015). Finally, unpublished work from Sweden suggests that high Technical IQ in fathers increases risk for autism in children (see: https://www.scientificamerican.com/article/children-of-smart-fathers-have-higher-risk-of-autism/). A few studies have also investigated systemizing in other psychiatric traits and conditions, including schizotypy (Russell-Smith, Bayliss, Maybery, & Tomkinson, 2013) and anorexia nervosa (Hambrook et al., 2008).


It is unclear if the link between autism and systemizing is due to underlying genetics or due to other non-biological factors (for example, people high in systemizing may be more aware of autism and hence more likely to seek a diagnosis). It is also unclear if the underlying genetics of systemizing also contributes to risk for other psychiatric conditions. Here we directly test this using genome-wide association data from n = 51,564 participants who were customers of 23andMe Inc and who completed the SQ-R. This study has the following aims: 1. To investigate the genetic correlates of systemizing, using the SQ-R; 2. To identify the genetic correlation between the SQ-R and psychiatric conditions (including autism) and psychological phenotypes; 3. To identify enrichment in tissues, gene groups, and biological pathways.

We identify three genome-wide significant loci associated with the SQ-R, and identify potential candidate genes for these loci using chromatin interactions and eQTLs in neural tissues. We identify a significant, replicable, and positive genetic correlation between the SQ-R and autism. This genetic correlation is independent of the genetic correlates of educational attainment, which is also genetically correlated with autism. We further demonstrate that social phenotypes which are genetically correlated with autism are not genetically correlated with the SQ-R, suggesting that there may be at least two independent sources of genetic risk for autism. In addition, the SQ-R is also positively genetically correlated with schizophrenia and measures of cognition. We also find that systemizing has a modest heritability, is enriched in evolutionarily conserved sites and fetal DNase hypersensitivity sites, and has a high genetic correlation between males and females.

7.2 Methods

7.2.1 Participants

Participants were customers of 23andMe, Inc., a personal genetics company, and are described in detail elsewhere (Do et al., 2011; Tung et al., 2011). All participants provided informed consent and answered surveys online according to a human subjects research protocol, which was reviewed and approved by Ethical & Independent Review Services, an AAHRPP-accredited private institutional review board (http://www.eandireview.com). All participants completed the online version of the questionnaire on the 23andMe participant portal. Only participants who were primarily of European ancestry (97% European Ancestry) were selected for the analysis using existing methods (Eriksson et al., 2012). Unrelated individuals were selected using a segmental identity-by-descent algorithm (Henn et al., 2012). A total of 51,564 participants completed the SQ-R (males = 26,063, and females = 25,501).

7.2.2 Phenotype

The SQ-R is a self-report measure of systemizing drive, or interest in rule-based patterns (Wheelwright et al., 2006). There are 75 items on the SQ-R, with a maximum score of 150 and a minimum score of 0. Scores on the test are normally distributed. The SQ-R has good cross-cultural stability, and good psychometric properties, with Cronbach’s alpha ranging from 0.79 to 0.94 in different studies (Groen, Fuermaier, Den Heijer, Tucha, & Althaus, 2015). Test-retest reliability in a Dutch sample was high, at 0.79 (Pearson correlation) (Groen et al., 2015). This was supported by another study in 4058 individuals which identified a high internal cohesion (Allison, Baron-Cohen, Stone, & Muncer, 2015). Exploratory followed by confirmatory factor analysis using Rasch modelling suggests that the SQ-R is largely unidimensional (Allison et al., 2015). A sex difference has been observed in multiple studies, with males scoring significantly higher than females. In terms of criterion validity, the SQ-R has a modest but significant correlation with the Mental Rotation Test (r = 0.25, P = 0.013), as well as its subscales (Ling, Burton, Salt, & Muncer, 2009). Autistic individuals, on average, score higher on the SQ-R in multiple studies (Baron-Cohen et al., 2014; Wakabayashi et al., 2007). Further, the SQ-R also predicts autistic traits, with a combination of the SQ-R and the Empathy Quotient predicting as much as 75% of the variance on the Autism Spectrum Quotient (AQ), a measure of autistic traits (Wheelwright et al., 2006). In our database of 5,663 individuals, there were clear differences in scores on the SQ-R based on diagnosis, with autistic individuals scoring significantly higher on the SQ-R than non-autistic individuals, and the SQ-R being significantly and positively correlated with the AQ (reported in the Results).

7.2.3 Genotyping, imputation, and quality control

Details of genotyping, imputation and quality control are given elsewhere (Section 3.2.3). Briefly, unrelated participants were included if they had a call rate of greater than 98.5%, and were of primarily European ancestry (97% European ancestry). A total of 1,030,430 SNPs (including InDels) were genotyped. SNPs were excluded if: they failed the Hardy-Weinberg Equilibrium Test at P < 10⁻²⁰; they had a genotype rate of less than 90%; they failed the parent-offspring transmission test using trio data in the larger 23andMe customer database; or their allele frequencies were significantly different from the European 1000 Genomes reference data (chi-square test, P < 10⁻²⁰). Phasing was conducted using Beagle (version 3.3.1) (Browning & Browning, 2007) in batches of 8000-9000 individuals. This was followed by imputation against all-ethnicity 1000 Genomes haplotypes (excluding monomorphic and singleton sites) using Minimac2 (Fuchsberger et al., 2015). Genetic association analyses were restricted to SNPs with a minor allele frequency > 1%.

7.2.4 Genetic association

Our primary analysis was an additive model of genetic effects, conducted using linear regression with age, sex, and the first five ancestry principal components included as covariates. In addition, given the modest sex difference, we also conducted sex-stratified analyses. SNPs were considered significant at a genome-wide threshold of P < 5 × 10⁻⁸. Lead SNPs were identified after LD-pruning using Plink (r² > 0.8). Winner’s curse correction was conducted using FDR-based shrinkage (Bigdeli et al., 2016).

We calculate variance explained by first standardizing the regression estimates and then squaring the estimates. This is equivalent to:

$$R_j^2 = \frac{2\,\mathrm{MAF}_j\,(1 - \mathrm{MAF}_j)\,\hat{B}_j^2}{\sigma_y^2}$$

where $R_j^2$ is the proportion of variance explained for SNP j, $\hat{B}_j$ is the non-standardized regression coefficient, $\mathrm{MAF}_j$ is the minor allele frequency for SNP j, and $\sigma_y^2$ is the variance of the SQ-R.

We outline the method used for standardizing the SQ-R regression estimates, and, as an extension, calculation of the variance explained.

We chose to standardize the estimates of SQ-R to make them comparable to the standardized regression estimates for educational attainment. This provides a uniform scale on which GWAS for both the phenotypes were conducted, lending them to analyses using GWIS (Nieuwboer, Pool, Dolan, Boomsma, & Nivard, 2016).

In linear regression, standardized estimates can be obtained from non-standardized estimates using the following formula (Bring, 1994):

$$\hat{B}_{std} = \frac{\hat{B}\,\sigma_x}{\sigma_y}$$


where $\hat{B}_{std}$ is the standardized estimate of the regression coefficient, $\hat{B}$ is the non-standardized estimate of the regression coefficient, $\sigma_x$ is the standard deviation of the independent variable, and $\sigma_y$ is the standard deviation of the dependent variable. In the GWAS analyses, y is the phenotype (SQ-R), and x is the genotype. We know $\sigma_y$ and $\hat{B}$. However, $\sigma_x = \sqrt{2(\mathrm{MAF})(1 - \mathrm{MAF})}$, which has been derived before (Okbay, Beauchamp, et al., 2016; Rietveld et al., 2014), but which we derive again below.

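$$\sigma_x = \sqrt{\sigma_x^2}$$

$$\sigma_x^2 = \frac{\sum_i (x_i - \bar{x})^2}{N}$$

Let’s assume that the genotype frequencies are in Hardy-Weinberg equilibrium. Let $x_i$ = 0, 1, and 2 for the three genotypes.

$$P(x_i = 0) = q^2$$

$$P(x_i = 1) = 2pq$$

$$P(x_i = 2) = p^2$$

$$\begin{aligned}
\bar{x} &= q^2 \cdot 0 + 2pq \cdot 1 + p^2 \cdot 2 \\
&= 0 + 2p(1 - p) + 2p^2 \\
&= 2p
\end{aligned}$$

$$\begin{aligned}
\sigma_x^2 &= q^2 (0 - 2p)^2 + 2pq\,(1 - 2p)^2 + p^2 (2 - 2p)^2 \\
&= 2p(1 - p) \\
&= 2(1 - \mathrm{MAF})(\mathrm{MAF})
\end{aligned}$$

Therefore,

$$\sigma_x = \sqrt{2(\mathrm{MAF})(1 - \mathrm{MAF})}$$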

And hence,

$$\hat{B}_{std} = \frac{\hat{B}\,\sigma_x}{\sigma_y}$$

The variance explained per SNP is $R^2 = \hat{B}_{std}^2$.

Therefore,

$$R^2 = \frac{\hat{B}^2\,\sigma_x^2}{\sigma_y^2}$$
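The standardization above can be checked numerically. The following is a minimal sketch, not the code used in the thesis: it confirms by enumeration that the genotype variance under Hardy-Weinberg equilibrium equals 2p(1 − p), and then applies the variance-explained formula to rs4146336 using the effect size and allele frequency listed in Table 1 of the Results. Two inputs are assumptions made for illustration: the reported allele frequency of 0.38 is treated as the MAF, and the phenotypic standard deviation of 21 is taken from the non-stratified SQ-R summary (71 ± 21). The function names are hypothetical.

```python
import math


def genotype_variance_hwe(p: float) -> float:
    """Variance of the allele count (0, 1, 2) under Hardy-Weinberg
    equilibrium, computed by enumerating the three genotype classes."""
    q = 1.0 - p
    probs = {0: q * q, 1: 2.0 * p * q, 2: p * p}
    mean = sum(x * pr for x, pr in probs.items())  # equals 2p
    return sum(pr * (x - mean) ** 2 for x, pr in probs.items())


def standardized_beta(beta: float, maf: float, sd_y: float) -> float:
    """B_std = B * sigma_x / sigma_y, with sigma_x = sqrt(2 * MAF * (1 - MAF))."""
    sd_x = math.sqrt(2.0 * maf * (1.0 - maf))
    return beta * sd_x / sd_y


def variance_explained(beta: float, maf: float, sd_y: float) -> float:
    """Per-SNP R^2 as the square of the standardized regression estimate."""
    return standardized_beta(beta, maf, sd_y) ** 2


# Illustrative inputs (see the assumptions stated in the lead-in).
beta, maf = -0.72824, 0.38   # rs4146336, non-stratified GWAS
sd_y = 21.0                  # assumed SD of the SQ-R in the full sample

r2 = variance_explained(beta, maf, sd_y)
print(f"enumerated genotype variance = {genotype_variance_hwe(maf):.4f}")
print(f"closed form 2p(1 - p)        = {2 * maf * (1 - maf):.4f}")
print(f"R^2 before winner's curse correction = {r2:.6f}")
```

Under these assumptions the uncorrected R² is on the order of 0.06% of the phenotypic variance, which illustrates how small per-SNP effects are even for a genome-wide significant locus.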

7.2.5 GWAS of SCDC

The Children’s Social and Communication Disorder Checklist (SCDC) is a questionnaire that measures difficulties in verbal and nonverbal communication and social interaction including reciprocal interaction. The questionnaire consists of 12 questions, with scores ranging from 0 to 24, with higher scores reflecting difficulties in social interaction and communication. The SCDC has good internal consistency (0.93) and good test-retest reliability (0.81) (Skuse, Mandy, & Scourfield, 2005). The SCDC has reasonable specificity and sensitivity in distinguishing clinically diagnosed autism from control individuals (Bölte, Westerwald, Holtmann, Freitag, & Poustka, 2011). Details of the ALSPAC cohort, and information about genotyping and imputation are provided in sections 5.2.1 and 5.2.2.

We used mother-reported SCDC scores on children aged 8. Whilst the SCDC has been measured at different ages in the ALSPAC cohort, we chose SCDC scores at age 8 as these had the largest sample size and have high SNP heritability (St Pourcain et al., 2014) (h² = 0.24 ± 0.07). A previous study has identified a significant genetic correlation between the rank-transformed SCDC scores and autism (Robinson et al., 2016). That study constrained the intercept to obtain low standard errors, given the relatively small sample size. Another study has shown that polygenic scores for autism are significantly associated with SCDC scores (St Pourcain et al., 2017), and that the effect of transformation on the SCDC scores was minimal. Here, we used log-transformed SCDC scores with sex and the first two genetic principal components as covariates. We used log-transformation as the resulting GWAS estimates are easier to interpret.

In total, SCDC scores were available on 7825 children. From this, we removed individuals for whom complete SCDC scores were not available. After excluding related individuals and individuals with no genetic data, data were available on a total of 5421 unrelated individuals. The SCDC scores were not normally distributed, so we log-transformed the scores and ran regression analyses with the first two genetic principal components and sex as covariates using Plink 2.0 (Purcell et al., 2007). We excluded SNPs that deviated from Hardy-Weinberg Equilibrium (P < 1 × 10⁻⁶), had a minor allele frequency < 0.01, or had missing call rates > 2%. We further excluded individuals with genotype missing rates > 5%.

The log-transformed SCDC scores (henceforth, SCDC scores) had a modest, but significant SNP heritability as quantified using LDSR (h² = 0.12 ± 0.05). The LDSR intercept (0.99) suggested that there was no inflation in GWAS estimates due to population stratification. The λGC was 1.013. We replicated the previously identified genetic correlation (constrained intercept) with autism using our SCDC GWAS (autism_PGC: rg = 0.46 ± 0.20, P = 0.019; autism_iPSYCH: rg = 0.45 ± 0.18, P = 0.01). In addition, we also identified a negative genetic correlation between educational attainment and SCDC (rg = -0.30 ± 0.11, P = 0.007).

7.2.6 Genomic inflation factor, heritability, and functional enrichment

LDSR (Bulik-Sullivan, Finucane, et al., 2015; Bulik-Sullivan, Loh, et al., 2015) was used to test for inflation in test statistics due to unaccounted population stratification. Heritability and genetic correlations for all the phenotypes were calculated using LDSR with the north-west European LD scores. For all the phenotypes except the SCDC, we performed genetic correlation without constraining the intercept. We constrained the intercept for the SCDC to mirror the analyses conducted in the original paper that identified a positive genetic correlation between the SCDC and autism (Robinson et al., 2016). Constraining the intercept decreases the standard errors. However, the intercept should only be constrained if the participant overlap between the two phenotypes is known and there is limited population stratification in both phenotypes. We identified significant genetic correlations using a Bonferroni-adjusted P-value < 0.05. For autism, we replicated the genetic correlation using the iPSYCH (Pedersen et al., 2017) autism GWAS dataset.

Heritability and genetic correlation analyses were performed using extended methods in LDSR (Bulik-Sullivan, Finucane, et al., 2015). The difference in heritability between males and females was quantified using:

$$Z = \frac{h^2_{males} - h^2_{females}}{\sqrt{SE^2_{males} + SE^2_{females}}}$$

where Z is the Z score for the difference in heritability for a phenotype, $(h^2_{males} - h^2_{females})$ is the difference between the SNP heritability estimates in males and females, and SE is the standard error of each heritability estimate. Two-tailed P-values were calculated, and reported as significant if P < 0.05.
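The test above can be sketched in a few lines. This is an illustrative implementation under the assumption that Z is referred to a standard normal distribution, with the two-tailed P-value obtained from the complementary error function. The function name and the input values in the usage example are hypothetical and are not estimates from this study.

```python
import math


def heritability_difference_test(h2_males: float, se_males: float,
                                 h2_females: float, se_females: float):
    """Z score and two-tailed P-value for the difference between two
    independent SNP heritability estimates."""
    z = (h2_males - h2_females) / math.sqrt(se_males ** 2 + se_females ** 2)
    # Two-tailed P-value under a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical inputs, chosen only to illustrate the calculation.
z, p = heritability_difference_test(0.12, 0.02, 0.10, 0.02)
print(f"Z = {z:.3f}, two-tailed P = {p:.3f}")
```

Treating the two estimates as independent is reasonable here because the male and female GWAS are based on non-overlapping participants.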

7.2.7 Functional annotations

For the primary GWAS (non-stratified analyses), we conducted functional annotation using FUMA (Watanabe et al., 2017). We restricted our analyses to the non-stratified analyses due to the high genetic correlation between the sexes, and the low statistical power of the sex-stratified GWAS. We conducted gene-based association analyses using MAGMA (de Leeuw et al., 2015) within FUMA and report significant genes after using a stringent Bonferroni-corrected P < 0.05. In addition, we conducted enrichment for tissue-specific expression, and pathway analyses within FUMA. For the significant SNPs, we investigated enrichment for eQTLs using brain tissues in the BRAINEAC and GTEx (Ardlie et al., 2015) databases, within FUMA. In addition, we also investigated chromatin interactions from the dorsolateral prefrontal cortex, the hippocampus, and neural progenitor cells. Significant results were identified using a Benjamini-Hochberg FDR-corrected P < 0.05. We further conducted partitioned heritability using extended methods in LDSR (Finucane et al., 2015). We conducted partitioned heritability for baseline categories.
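For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the following is a generic sketch of how FDR-adjusted P-values are obtained. It is not FUMA's own code, the function name is hypothetical, and the P-values in the example are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up P-values for four hypothetical tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(adj, significant)
```

A result is then declared significant when its adjusted P-value falls below the chosen FDR threshold, here 0.05.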

7.2.8 GWIS

To investigate if the SQ-R is genetically correlated with autism independent of the genetic effects of cognition, we constructed a unique SQ-R phenotype after conditioning on the genetic effects of educational attainment using GWIS (Nieuwboer et al., 2016). GWIS takes into account the genetic covariance between the two phenotypes to calculate the unique component of the phenotypes as a function of the genetic covariance and the SNP heritability of the phenotypes. Prior to performing GWIS, we standardized the beta coefficients for the SQ-R GWAS by using the following formula:

$$\hat{B}_{std} = \hat{B}\sqrt{\frac{2(\mathrm{MAF})(1 - \mathrm{MAF})}{\sigma_y^2}}$$

where $\hat{B}_{std}$ is the standardized regression coefficient, $\hat{B}$ is the regression coefficient obtained from the non-standardized GWAS, MAF is the minor allele frequency, and $\sigma_y^2$ is the variance of the SQ-R.


7.2.9 Phenotypic regression analyses

To identify if the SQ-R is associated with attention to detail and rigidity as defined by Bralten and colleagues (Bralten et al., 2017), we conducted regression analysis in 5,663 individuals from the Cambridge Autism Research Database who had completed the AQ and the SQ-R. We derived the variables ‘attention to detail’ and ‘rigidity’ based on the scoring criteria provided by Bralten and colleagues (Bralten et al., 2017). We note that this is different from the attention to detail subscale as defined in the original paper describing the AQ (Baron-Cohen, Wheelwright, Skinner, Martin, & Clubley, 2001). Linear regression analyses were conducted using ‘attention to detail’ and ‘rigidity’ scores as the dependent variables, the SQ-R as the independent variable, and age of completion of both the SQ-R and the AQ, sex, and status as covariates. Status was divided into five categories: autistic individuals, controls, first degree relatives of an autistic individual, suspected autism, and suspected autism and relative of an autistic individual.

7.2.10 Data Availability

Data for the SQ-R is available from 23andMe, Inc. Top SNPs (n = 10,000) can be visualized here: https://ghfc.pasteur.fr

7.3 Results

7.3.1 Phenotypic distribution and correlates of the SQ-R in the CARD database

We investigated the distribution of the SQ-R and its correlates with AQ, diagnosis, and sex in a cohort of 5663 individuals from the Cambridge Autism Research Database (CARD). The SQ-R was normally distributed in the database. We investigated the mean score of the SQ-R in individuals with autism (N = 1902), control individuals (N = 2880), first degree relatives of autistic individuals (N = 561), individuals who suspect they have autism (N = 204), and individuals who suspect they have autism and are relatives of autistic individuals (N = 116) (Figure 1). After accounting for sex and the age at which the SQ-R was taken, standardized scores on the SQ-R significantly predicted autism status compared to controls (Beta = 0.8 ± 0.03, P < 2.2 × 10⁻¹⁶). Nagelkerke’s pseudo R² for the standardized SQ-R score was 0.14.


Figure 1: Distribution of the SQ-R scores in different groups

Violin plots showing the distribution of the SQ-R scores in different groups. The central dot in each violin plot shows the mean, with the lines showing the standard deviation. Data means (SD): Autism: 76.73 (25.50), Control: 57.24 (22.09), Relative: 53.87 (23.45), Suspected: 72.53 (23.70), Suspected and relative: 66.88 (26.07). Note, both males and females have been included in this graph.

We also investigated the predictive power of the SQ-R on the Autism Spectrum Quotient (AQ) in this sample. We regressed standardized AQ scores on standardized SQ-R scores, sex, diagnosis status, standardized age of AQ completion, and standardized age of SQ-R completion in 5663 individuals. As expected, the SQ-R was significantly associated with AQ scores (Beta = 0.32 ± 0.009, P < 2.2 × 10⁻⁶) (Figure 2). The SQ-R accounted for 10% of the variance in the AQ scores.


Figure 2: Correlation between SQ-R and AQ scores

Standardized SQ-R scores are provided on the x-axis. Standardized AQ scores are provided on the y-axis. The regression line is shown in blue with standard errors.

7.3.2 Phenotypic distribution and genome-wide association analyses

The mean score for all participants was 71 ± 21 out of a total of 150 on the SQ-R, with males scoring higher than females (76.5 ± 20 in males; 65.4 ± 20.6 in females). The male advantage was statistically significant (P < 0.001, Cohen’s d = 0.54) (Figure 3).
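As a rough check on the effect size, Cohen's d can be recomputed from the reported summary statistics. The sketch below assumes the pooled-standard-deviation form of Cohen's d (the text does not state which variant was used) and uses the rounded means, SDs, and sample sizes given in this chapter. The function name is hypothetical.

```python
import math


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Rounded summary statistics as reported for the 23andMe sample.
d = cohens_d(76.5, 20.0, 26063,   # males
             65.4, 20.6, 25501)   # females
print(f"Cohen's d = {d:.3f}")
```

With these rounded inputs the estimate comes out at roughly 0.55, in line with the reported value of 0.54 once rounding of the summary statistics is taken into account.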


Figure 3: Mean scores and standard deviations of the SQ-R

Mean scores and standard deviations for the SQ-R scores for the three datasets. There was a significant difference in mean scores between males and females (P < 0.001, Cohen’s d = 0.54). Females: 65.4(20.6), Males: 76.5(20), Non-stratified: 71(21). Total score ranges from 0 – 150.

Genome-wide association analyses identified three significant loci (Figure 4). Two of these were significant in the non-stratified GWAS: rs4146336 on chromosome 3 (P = 2.58 × 10⁻⁸) and rs1559586 on chromosome 18 (P = 4.78 × 10⁻⁸). In addition, we also identified a significant locus in the males-only GWAS (rs8005092 on chromosome 14, P = 3.74 × 10⁻⁸). The LDSR intercept suggested that there was minimal inflation due to population stratification (Figure 4). Table 1 provides a list of all independent SNPs with P < 1 × 10⁻⁶.


Figure 4: Manhattan and QQ-plots for the three GWAS

Manhattan plots for the three SQ-R GWAS: non-stratified (A), males-only (C), females-only (E). Significant SNPs are highlighted in red. QQ-plots for the three SQ-R GWAS: non-stratified (B), males-only (D), females-only (F). SQ-R non-stratified (N = 51,564): λGC = 1.10, LDSR intercept = 0.99; SQ-R males-only (N = 26,063): λGC = 1.06, LDSR intercept = 0.99; SQ-R females-only (N = 25,501): λGC = 1.05, LDSR intercept = 1.01.


Table 1: List of SNPs with P < 1 × 10⁻⁶

| SNP | Chr | freq.a | freq.b | BP | EA | OA | P | Effect | SE | Study | Nearest Gene (GENCODE) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs4146336 | 3 | 0.38 | 0.62 | 117493600 | C | A | 2.58E-08 | -0.72824 | 0.130795 | M+F | LSAMP |
| rs8005092 | 14 | 0.05 | 0.95 | 26130024 | T | A | 3.74E-08 | 3.02835 | 0.55026 | M | SNORD37 |
| rs1559586 | 18 | 0.43 | 0.57 | 70727724 | C | A | 4.78E-08 | -0.69696 | 0.127662 | M+F | RP11-169F17.1 |
| rs8045744 | 16 | 0.56 | 0.44 | 6284566 | T | G | 9E-08 | -0.9881 | 0.184812 | F | RBFOX1 |
| rs78199107 | 17 | 0.96 | 0.04 | 38077750 | T | C | 9.89E-08 | -2.39955 | 0.450255 | M | ORMDL3 |
| 22:17898753:A_AG | 22 | 0.37 | 0.63 | 17898753 | I | D | 1.56E-07 | -1.13631 | 0.216607 | F | |
| rs13051169 | 21 | 0.09 | 0.91 | 41669300 | T | C | 2.05E-07 | 1.72923 | 0.332858 | F | DSCAM |
| rs7567262 | 2 | 0.19 | 0.81 | 54057363 | T | C | 2.74E-07 | 0.81816 | 0.15915 | M+F | ASB3 |
| rs7151561 | 14 | 0.33 | 0.67 | 65890603 | T | C | 2.98E-07 | 0.690253 | 0.134687 | M+F | FUT8 |
| rs10206716 | 2 | 0.66 | 0.34 | 15359339 | T | C | 3.27E-07 | -1.09278 | 0.213958 | F | NBAS |
| rs8045744 | 16 | 0.56 | 0.44 | 6284566 | T | G | 4.3E-07 | -0.64915 | 0.128407 | M+F | RBFOX1 |
| rs145819874 | 19 | 0.99 | 0.01 | 31774157 | T | G | 6.24E-07 | -4.80191 | 0.963447 | M | TSHZ3 |
| rs144272877 | 1 | 0.81 | 0.19 | 243354791 | T | C | 6.71E-07 | 1.34986 | 0.271595 | F | CEP170 |
| rs72808065 | 10 | 0.01 | 0.99 | 30795748 | T | C | 6.93E-07 | -3.01167 | 0.60677 | M+F | MAP3K8 |
| rs8106663 | 19 | 0.77 | 0.23 | 44568192 | T | A | 8.76E-07 | 0.745581 | 0.151608 | M+F | ZNF223 |
| rs11094514 | X | 0.10 | 0.90 | 150893521 | T | C | 9.67E-07 | 0.97724 | 0.1995 | M+F | FATE1 |
| rs12083390 | 1 | 0.21 | 0.79 | 9721245 | G | A | 1E-06 | 1.12859 | 0.230712 | M | PIK3CD |

This table provides the list of all SNPs with P < 1 × 10⁻⁶. EA = effect allele, OA = other allele, Chr = chromosome, SE = standard error, P = P-value. Study: M+F = non-stratified GWAS, M = males-only GWAS, F = females-only GWAS.
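The effect sizes, standard errors, and P-values in Table 1 are linked through the test statistic Effect / SE. The sketch below recovers an approximate two-sided P-value for rs4146336 from its tabulated effect and standard error, assuming a normal approximation to the regression test statistic (reasonable at this sample size, though the original analysis may have used a t distribution). The function name is hypothetical.

```python
import math


def wald_test(effect: float, se: float):
    """Z statistic and two-sided P-value under a normal approximation."""
    z = effect / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Effect and SE for rs4146336 (non-stratified GWAS) as listed in Table 1.
z, p = wald_test(-0.72824, 0.130795)
print(f"Z = {z:.3f}, P = {p:.2e}")

# Genome-wide significance threshold used in this chapter.
print("genome-wide significant:", p < 5e-8)
```

The recovered P-value is close to the tabulated 2.58E-08, which is a quick way to sanity-check a row of summary statistics after a table has been re-typed or re-extracted.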

For two of the three loci (rs1559586 and rs8005092), eQTL and chromatin interaction analyses identified five genes: TIMM21, FBXO15, RP11-169F17.1, NOVA1, and STXBP6 (Figure 5 and Appendix: Table 16). Of these, STXBP6, NOVA1, and FBXO15 have relatively high expression in the brain in the GTEx database. STXBP6 encodes a protein that binds to Syntaxin 1 and helps assemble the SNARE complex, found in the presynaptic terminal (Constable, Graham, Morgan, & Burgoyne, 2005). NOVA1 encodes a neuron-specific RNA-binding protein that regulates neuronal miRNA (Storchel et al., 2015). FBXO15 encodes an F-box domain-containing protein that is involved in protein degradation via ubiquitination, and is important for repressing differentiation in stem cells. TIMM21 is involved in the translocation of the transit peptide situated in the inner mitochondrial membrane. Finally, RP11-169F17.1 is an uncharacterized ncRNA.

The most significant SNP in each GWAS explained 0.001 – 0.0002% of the total variance (Table 2) after accounting for winner’s curse, suggesting considerable polygenicity and small per-SNP effect sizes. Local LD plots for the three significant loci are provided in Figure 6.

Table 2: Variance explained by the top SNPs

Study     SNP        R2     R2 (WC)
SQ-R_All  rs4146336  0.060  4.09E-4
SQ-R_M    rs8005092  0.135  2.33E-4
SQ-R_F    rs8045744  0.116  1.55E-3

This table provides the variance explained (R2) by the most significant SNP in each GWAS. Winner’s curse (WC) corrected variance explained is provided in the last column.
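To give a sense of scale for the per-SNP variance explained, the sketch below uses one common approximation, R2 = z^2 / (z^2 + N - 2) with z = Effect / SE. This formula is an assumption for illustration. The thesis does not state here which estimator was used, and no winner's curse correction is applied. The inputs are the rs4146336 row of Table 1 and the non-stratified sample size from the Figure 4 caption.

```python
def variance_explained(beta, se, n):
    """Approximate R^2 for a single SNP from regression summary statistics.

    Uses R^2 = z^2 / (z^2 + n - 2), where z = beta / se. This is an
    illustrative approximation, not necessarily the thesis method.
    """
    z_squared = (beta / se) ** 2
    return z_squared / (z_squared + n - 2)


# rs4146336 in the non-stratified GWAS (Table 1; N from Figure 4).
r2 = variance_explained(beta=-0.72824, se=0.130795, n=51564)
r2_percent = 100 * r2
```

Under this approximation the uncorrected estimate comes out at roughly 0.06% of the variance, which illustrates how small the per-SNP effect is even before any winner's curse correction.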

Figure 5: Circos plots for the three significant loci demonstrating eQTLs and chromatin interactions

Circos plots showing chromatin interactions for rs8005092 (left) and rs1559586 (right). The outermost ring shows the P-values of the SNPs, and the inner ring provides the names of genes with significant chromatin interactions (FDR P < 0.05).

Figure 6: Regional LD plot for the three significant loci

Regional LD plots for the three significant SNPs. The top locus zoom plot is for rs8005092 in the males-only GWAS. The bottom two panels provide the regional LD plots for rs1559586 (left) and rs4146336 (right) in the non-stratified GWAS.

7.3.3 Genetic correlation

We investigated genetic correlations between the SQ-R and psychiatric conditions, psychological phenotypes, and measures of cognition (Figure 7, Table 3). The SQ-R was significantly and positively correlated with autism (PGC cohort) (rg = 0.22±0.06; P = 1.3x10^-3) and schizophrenia (rg = 0.14±0.04; P = 1.6x10^-3). We replicated the association with autism using the iPSYCH autism cohort (rg = 0.26±0.06; P = 3.35x10^-5). Among psychiatric conditions, the magnitude of the genetic correlation between the SQ-R and autism is the largest. In comparison, the magnitude of the genetic correlation between the SQ-R and schizophrenia is small. The SQ-R was also significantly and positively correlated with measures of cognition (years of schooling: rg = 0.14±0.03, P = 4.73x10^-5; cognitive aptitude: rg = 0.19±0.04, P = 2.35x10^-5).

Systemizing is thought to contribute to the non-social domain of autism. To identify if systemizing contributes to social autistic traits, we conducted genetic correlation analyses using the SCDC, which is genetically correlated with autism (Robinson et al., 2016). Additionally, we conducted genetic correlation analyses between the SQ-R and family relationship satisfaction and friendship satisfaction (Chapter 6), which are negatively correlated with autism. We did not identify a significant genetic correlation between the SQ-R and the SCDC (rg = 0.04±0.13, P = 0.73), family relationship satisfaction (rg = -0.04±0.06, P = 0.47), or friendship satisfaction (rg = 0.06±0.07, P = 0.42). We note that the magnitude of the genetic correlations is very small. These results suggest that the contribution of systemizing to the genetic risk of autism is distinct from the contribution of social traits.

Figure 7: Genetic correlation between the SQ-R and other phenotypes

Genetic correlations between the SQ-R and multiple other phenotypes are provided. The bars represent 95% confidence intervals. The following genetic correlations were significant after Bonferroni correction: Autism_PGC (rg = 0.22±0.06; P = 1.3x10^-3, N = 16350), Autism_iPSYCH (rg = 0.26±0.06; P = 3.35x10^-5, N = 19142), Years of Schooling 2016 (rg = 0.13±0.03; P = 4.73x10^-5, N = 293723), College completion (rg = 0.18±0.05; P = 1.30x10^-3, N = 95427), Cognitive aptitude (rg = 0.19±0.04; P = 2.35x10^-5, N = 78308), and Schizophrenia (rg = 0.13±0.04; P = 1.6x10^-3, N = 77096).

Table 3: Genetic correlations

Phenotype                   Genetic correlation  Standard error  P         Sample size  PMID
Anorexia Nervosa            0.042                0.089           6.34E-01  14477        NA
ADHD                        0.090                0.059           1.20E-01  5422         20732625
Autism_PGC*                 0.220                0.068           1.30E-03  16350        28540026
Autism_iPSYCH*              0.260                0.060           3.35E-05  19142        NA
Bipolar disorder            0.108                0.060           6.97E-02  16731        21926972
Depressive symptoms         -0.029               0.053           5.76E-01  161460       27089181
Empathy                     0.103                0.075           1.70E-01  46861        NA
Extraversion                0.205                0.085           5.06E-02  63030        26362575
Major depressive disorder   0.062                0.081           4.43E-01  18759        22472876
Neo-conscientiousness       0.197                0.129           1.27E-01  17375        21173776
Neo-openness to experience  0.378                0.125           2.60E-03  17375        21173776
Neuroticism                 -0.008               0.046           8.58E-01  170911       27089181
Schizophrenia*              0.137                0.043           1.60E-03  77096        25056061
Subjective well being       0.126                0.066           5.52E-02  298420       27089181
Years of schooling 2016*    0.139                0.034           4.73E-05  293723       27225129
Childhood IQ                0.265                0.096           5.60E-03  12441        23358156
College completion*         0.181                0.056           1.30E-03  95427        23722424
Cognitive aptitude*         0.190                0.040           2.35E-05  78308        28530673
Friendship satisfaction     0.057                0.071           4.20E-01  134941       NA
Family satisfaction         -0.046               0.065           4.75E-01  139603       NA
SCDC-constrainedintercept   0.044                0.130           7.35E-01  5421         NA

This table provides the results of the genetic correlation analyses. * represents significant genetic correlations after Bonferroni correction.
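The Bonferroni step behind the asterisks in Table 3 can be sketched as follows. The P-values are copied from the table. Treating each of the 21 rows as one test and using alpha = 0.05 are assumptions made for illustration, since the number of tests used for the correction is not stated in this section.

```python
# P-values copied from Table 3 (phenotype -> P).
table3_p = {
    "Anorexia Nervosa": 6.34e-01,
    "ADHD": 1.20e-01,
    "Autism_PGC": 1.30e-03,
    "Autism_iPSYCH": 3.35e-05,
    "Bipolar disorder": 6.97e-02,
    "Depressive symptoms": 5.76e-01,
    "Empathy": 1.70e-01,
    "Extraversion": 5.06e-02,
    "Major depressive disorder": 4.43e-01,
    "Neo-conscientiousness": 1.27e-01,
    "Neo-openness to experience": 2.60e-03,
    "Neuroticism": 8.58e-01,
    "Schizophrenia": 1.60e-03,
    "Subjective well being": 5.52e-02,
    "Years of schooling 2016": 4.73e-05,
    "Childhood IQ": 5.60e-03,
    "College completion": 1.30e-03,
    "Cognitive aptitude": 2.35e-05,
    "Friendship satisfaction": 4.20e-01,
    "Family satisfaction": 4.75e-01,
    "SCDC-constrainedintercept": 7.35e-01,
}

ALPHA = 0.05  # assumed family-wise error rate

# Assumption: one test per table row, so the threshold is alpha / 21.
threshold = ALPHA / len(table3_p)

significant = sorted(name for name, p in table3_p.items() if p < threshold)
```

Under these assumptions the phenotypes that pass the threshold are the six that carry an asterisk in Table 3, while openness to experience and childhood IQ fall just short.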

Educational attainment has a positive genetic correlation with both autism and the SQ-R. To identify if the genetic correlation between autism and the SQ-R is driven by educational attainment, we constructed an SQ-R GWAS conditioned on the genetic effects of educational attainment (SQminEdu GWAS) using GWIS (Methods). We confirmed that the SQminEdu GWAS was not genetically correlated with educational attainment (rg = 0.063±0.032, P = 0.052), and was genetically correlated with the SQ-R (rg = 1.04±0.005, P < 2.2x10^-16). The SQminEdu GWAS was still genetically correlated with both the PGC autism dataset (rg = 0.14±0.06, P = 0.032) and the autism_iPSYCH dataset (rg = 0.20±0.05, P = 5x10^-4) (Figure 8, Table 4). These results suggest that the SQ-R is genetically correlated with autism independent of the genetic effects of educational attainment.

Figure 8: Genetic correlations between the SQ-R and SQminEdu GWIS estimates

Genetic correlation between the SQ-R and three phenotypes (in Red). Genetic correlation between the GWIS SQminEdu dataset and the same three phenotypes (in blue). The bars represent 95% confidence intervals. P-values provided on top of the bars.

Table 4: Genetic correlations between SQminEdu and other GWAS

Phenotype               Genetic correlation  Standard error  P-value
Systemizing             1.0463               0.005           2.20E-16
Educational attainment  0.0638               0.0328          0.052
Autism_PGC              0.1436               0.0671          0.0324
Autism_iPSYCH           0.2063               0.0589          0.0005

This table provides the results of the genetic correlations of the SQminEdu GWAS dataset.
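The P-values and the 95% confidence intervals shown for genetic correlations can be related to the estimates and standard errors with a simple z-test. The sketch below assumes a two-sided test against rg = 0 under a normal approximation. The helper names are hypothetical, and the inputs are the Table 4 estimates.

```python
import math


def two_sided_p(estimate, se):
    """Two-sided P-value for H0: estimate = 0, normal approximation."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))


def ci95(estimate, se):
    """95% confidence interval under the same normal approximation."""
    half_width = 1.959964 * se
    return (estimate - half_width, estimate + half_width)


# (rg, SE) pairs copied from Table 4.
table4 = {
    "Educational attainment": (0.0638, 0.0328),
    "Autism_PGC": (0.1436, 0.0671),
    "Autism_iPSYCH": (0.2063, 0.0589),
}

p_values = {name: two_sided_p(rg, se) for name, (rg, se) in table4.items()}
intervals = {name: ci95(rg, se) for name, (rg, se) in table4.items()}
```

With these inputs the interval for educational attainment spans zero, whereas the intervals for both autism datasets lie above zero, in line with the pattern described in the text.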

A recent study (Bralten et al., 2017) identified that polygenic scores for autism predict autistic traits in specific subdomains of the Autism Spectrum Quotient (AQ), which they termed ‘attention to detail’ and ‘rigidity’. We queried if systemizing is associated with both these domains within the AQ. Whilst we did not have access to a cohort with both genetic data and AQ scores, we had information on 6765 individuals who had completed both the AQ and the SQ-R. Our results show that the SQ-R is significantly associated with the ‘attention to detail’ and ‘rigidity’ domains on the AQ as defined by Bralten et al. (2017) (Attention to detail: Beta = 0.025±0.0009, P < 2.2x10^-6; Rigidity: Beta = 0.015±0.001, P < 2.2x10^-6).

7.3.4 Heritability, enrichment analysis, and gene-based analyses

Additive SNP heritability calculated using LDSR was 0.12±0.012 for the SQ-R (P = 1.2x10^-20). There was no significant difference in additive SNP heritability between males and females (P = 0.34) (Figure 9 and Table 5). Together with the high genetic correlation between males and females (1±0.17; P = 3.91x10^-10), this suggests a similar genetic architecture between the sexes.

Figure 9: Additive heritability for the three GWAS

Additive SNP heritability estimates for the SQ-R GWAS. Estimates provided for the non- stratified and the sex-stratified GWAS datasets. Error bars represent standard errors.

Table 5: Additive SNP heritability estimates

Subset             Heritability  Standard error  P
Males and females  0.11          0.012           1.29E-20
Males              0.11          0.019           1.59E-08
Females            0.08          0.019           2.05E-05

This table provides the additive SNP heritability for the three GWAS datasets.

Partitioned heritability identified enrichment in four functional categories for the SQ-R after FDR correction: evolutionarily conserved genetic regions in mammals (FDR P = 0.027), foetal DNase hypersensitivity sites (FDR P = 0.039), histone H3K27 acetylation (FDR P = 0.039), and Hoffman transcription start sites (FDR P = 0.039) (Table 6).

Gene-based analysis conducted using MAGMA identified four significant genes below a Bonferroni-corrected threshold of 2.75x10^-6: SDCCAG8 (P = 3.7x10^-9), ZSWIM6 (P = 8.5x10^-8), ZNF574 (P = 2x10^-6), and FUT8 (P = 2.13x10^-6) (Appendix: Table 17). Pathway-based (Appendix: Table 18) and tissue enrichment analyses did not identify any significant results (Figure 10).

Table 6: Partitioned heritability for the non-stratified SQ-R

Category                               Prop SNPs  Prop h2  Prop h2 SE  Enrich  Enrich SE  Enrich P  FDR P
base_0                                 1.00       1.00     0.00        1.00    0.00       NA        NA
Conserved_LindbladToh.extend.500_0     0.33       0.83     0.14        2.50    0.43       5.16E-04  2.73E-02
FetalDHS_Trynka.extend.500_0           0.29       0.78     0.16        2.72    0.55       2.19E-03  3.97E-02
H3K27ac_PGC2.extend.500_0              0.34       0.69     0.11        2.07    0.34       2.35E-03  3.97E-02
TSS_Hoffman.extend.500_0               0.03       0.24     0.07        6.81    1.93       2.99E-03  3.97E-02
H3K9ac_peaks_Trynka_0                  0.04       0.39     0.12        10.18   3.13       5.07E-03  5.37E-02
Conserved_LindbladToh_0                0.03       0.35     0.12        13.28   4.52       7.02E-03  6.20E-02
Repressed_Hoffman.extend.500_0         0.72       0.54     0.07        0.75    0.10       1.26E-02  9.52E-02
FetalDHS_Trynka_0                      0.08       0.54     0.20        6.33    2.39       2.34E-02  1.53E-01
Promoter_UCSC.extend.500_0             0.04       0.17     0.06        4.39    1.52       2.60E-02  1.53E-01
UTR_5_UCSC_0                           0.01       0.09     0.04        16.65   7.58       3.74E-02  1.98E-01
Promoter_UCSC_0                        0.03       0.19     0.08        6.17    2.68       5.11E-02  2.46E-01
H3K4me3_Trynka.extend.500_0            0.26       0.51     0.13        1.99    0.53       6.61E-02  2.92E-01
DGF_ENCODE_0                           0.14       0.50     0.22        3.61    1.59       9.62E-02  3.54E-01
Transcribed_Hoffman_0                  0.35       0.07     0.16        0.20    0.48       9.63E-02  3.54E-01
WeakEnhancer_Hoffman_0                 0.02       0.15     0.08        7.34    3.89       1.06E-01  3.54E-01
H3K4me1_Trynka.extend.500_0            0.61       0.82     0.13        1.34    0.21       1.07E-01  3.54E-01
Enhancer_Andersson.extend.500_0        0.02       0.11     0.06        5.72    3.13       1.32E-01  3.91E-01
Coding_UCSC.extend.500_0               0.06       -0.03    0.07        -0.49   1.03       1.41E-01  3.91E-01
DHS_Trynka.extend.500_0                0.50       0.79     0.20        1.58    0.40       1.50E-01  3.91E-01
PromoterFlanking_Hoffman.extend.500_0  0.03       0.14     0.07        4.25    2.22       1.50E-01  3.91E-01
DHS_peaks_Trynka_0                     0.11       0.45     0.24        4.04    2.13       1.55E-01  3.91E-01
WeakEnhancer_Hoffman.extend.500_0      0.09       -0.05    0.11        -0.58   1.21       1.82E-01  4.39E-01
Enhancer_Andersson_0                   0.00       0.07     0.05        16.50   12.07      1.96E-01  4.49E-01
UTR_5_UCSC.extend.500_0                0.03       0.10     0.06        3.55    1.99       2.04E-01  4.49E-01
SuperEnhancer_Hnisz_0                  0.17       0.23     0.05        1.37    0.29       2.15E-01  4.56E-01
CTCF_Hoffman_0                         0.02       0.14     0.10        5.99    4.28       2.44E-01  4.97E-01
H3K4me1_Trynka_0                       0.43       0.67     0.22        1.57    0.51       2.62E-01  5.14E-01
SuperEnhancer_Hnisz.extend.500_0       0.17       0.22     0.04        1.28    0.26       2.88E-01  5.44E-01
Repressed_Hoffman_0                    0.46       0.67     0.20        1.45    0.44       3.02E-01  5.44E-01
H3K4me1_peaks_Trynka_0                 0.17       -0.05    0.22        -0.28   1.27       3.08E-01  5.44E-01
TSS_Hoffman_0                          0.02       0.09     0.07        4.91    3.97       3.27E-01  5.59E-01
H3K27ac_Hnisz_0                        0.39       0.47     0.09        1.20    0.22       3.64E-01  6.03E-01
PromoterFlanking_Hoffman_0             0.01       0.06     0.06        6.99    6.94       3.87E-01  6.21E-01
Transcribed_Hoffman.extend.500_0       0.76       0.85     0.10        1.11    0.13       4.19E-01  6.42E-01
H3K27ac_Hnisz.extend.500_0             0.42       0.51     0.11        1.20    0.25       4.24E-01  6.42E-01
H3K9ac_Trynka.extend.500_0             0.23       0.31     0.12        1.35    0.50       4.89E-01  7.20E-01
DHS_Trynka_0                           0.17       0.34     0.25        2.00    1.51       5.09E-01  7.29E-01
TFBS_ENCODE.extend.500_0               0.34       0.44     0.18        1.29    0.51       5.67E-01  7.90E-01
Intron_UCSC_0                          0.39       0.35     0.07        0.90    0.18       5.83E-01  7.93E-01
Enhancer_Hoffman_0                     0.06       0.12     0.12        1.94    1.86       6.17E-01  8.17E-01
UTR_3_UCSC_0                           0.01       0.03     0.05        2.79    4.07       6.61E-01  8.29E-01
H3K9ac_Trynka_0                        0.13       0.18     0.13        1.45    1.03       6.67E-01  8.29E-01
Intron_UCSC.extend.500_0               0.40       0.37     0.06        0.94    0.14       6.73E-01  8.29E-01
TFBS_ENCODE_0                          0.13       0.22     0.21        1.64    1.61       6.91E-01  8.32E-01
H3K4me3_peaks_Trynka_0                 0.04       0.11     0.18        2.59    4.30       7.12E-01  8.39E-01
DGF_ENCODE.extend.500_0                0.54       0.50     0.17        0.91    0.31       7.82E-01  8.84E-01
CTCF_Hoffman.extend.500_0              0.07       0.10     0.12        1.40    1.68       8.09E-01  8.84E-01
Enhancer_Hoffman.extend.500_0          0.15       0.12     0.12        0.81    0.81       8.12E-01  8.84E-01
Coding_UCSC_0                          0.01       0.03     0.07        2.06    4.58       8.17E-01  8.84E-01
H3K27ac_PGC2_0                         0.27       0.29     0.15        1.06    0.57       9.13E-01  9.68E-01
H3K4me3_Trynka_0                       0.13       0.14     0.17        1.05    1.27       9.69E-01  9.73E-01
UTR_3_UCSC.extend.500_0                0.03       0.03     0.05        1.07    1.91       9.73E-01  9.73E-01

This table provides the partitioned heritability analysis results for the SQ-R (non-stratified) GWAS. Prop SNPs = proportion of SNPs, Prop h2 = proportion of heritability, Enrich = enrichment, SE = standard error, P = P-value.
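The FDR P column in Table 6 is consistent with a Benjamini-Hochberg step-up adjustment, sketched below as an illustration rather than the exact procedure used. Two assumptions are labelled in the code. First, the total number of tests is taken as 53 (the number of rows in Table 6 including the base category), which the thesis does not state. Second, only the six smallest enrichment P-values are passed in for brevity, which is valid here only because every omitted row has a larger rank-scaled value. The helper name `bh_adjust` is hypothetical.

```python
def bh_adjust(sorted_p, m_tests):
    """Benjamini-Hochberg adjusted P-values for the k smallest P-values.

    sorted_p: P-values in ascending order, occupying ranks 1..k.
    m_tests:  total number of tests in the family (m_tests >= k).
    """
    # Scale each P-value by m / rank.
    scaled = [p * m_tests / rank for rank, p in enumerate(sorted_p, start=1)]
    # Step-up: each adjusted value is the minimum of the scaled values
    # at its own rank and at all larger ranks, capped at 1.
    adjusted = []
    running_min = 1.0
    for value in reversed(scaled):
        running_min = min(running_min, value)
        adjusted.append(running_min)
    return adjusted[::-1]


# Six smallest enrichment P-values from Table 6.
top_p = [5.16e-4, 2.19e-3, 2.35e-3, 2.99e-3, 5.07e-3, 7.02e-3]

# Assumption: 53 tests in total (not stated in the thesis).
adjusted = bh_adjust(top_p, m_tests=53)
```

The tie across ranks 2 to 4 in the adjusted values arises from the step-up minimum, which is why three categories in Table 6 share the same FDR P.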

Figure 10: Tissue specific heritability

Tissue specific heritability estimates for the SQ-R as provided within FUMA. Tissue expression was constructed using GTEx. General tissue-specific heritability estimates (top), and specific tissue-specific heritability estimates (bottom). Height of the bar represents significance. None of the heritability estimates were significant after Bonferroni correction.

7.4 Discussion

Here we present the results of a genome-wide association study of systemizing. We identify three genome-wide significant loci associated with the SQ-R, two in the non-stratified GWAS and one in the males-only GWAS. Chromatin interaction analyses in neural tissues prioritize five genes for further functional follow-up. Three of these genes have high expression in brain tissues. One of these candidate genes is STXBP6. STXBP6 binds to STX1 in the presynaptic boutons to help inhibit Ca2+-mediated exocytosis of neurotransmitters (Constable et al., 2005). Several studies have identified genes involved in synapse formation and maintenance that are disproportionately mutated in autism (Bourgeron, 2015). Further, transcriptome studies of the post-mortem brain have identified downregulated genes that are enriched for synaptic transmission (Parikshak et al., 2016; Voineagu et al., 2011).

The idea that autistic individuals, on average, have a stronger interest in and aptitude for systems is not new. Several theories have sought to understand this link. For example the hyper-systemizing theory suggests that this could explain the ‘resistance to change’ seen in many autistic individuals in that to systemize one needs to vary one variable at a time, whilst holding all others constant, in order to observe how manipulating one variable changes the output (Baron-Cohen, 2006). In this way, resistance to change may appear as a ‘symptom’ but in fact may reflect a different learning style in which the autistic person is systematically seeking to understand a rule-governed domain. The prediction error theory may also be relevant to this hypothesis in that a strong systemizer may be more attracted to highly lawful, more predictable domains, like mathematics or computers, with the social world being the extreme opposite of this (Sinha et al., 2014). Note that versions of this theory may not be well-supported by evidence, since this theory suggests autistic people will have difficulty making predictions across the board, whilst the hyper-systemizing theory suggests this will only be true in domains that are not lawful. While a few studies have investigated the genetic correlates of social traits (Chapters 2 – 6), or repetitive behaviour (Cantor et al., 2017), no study, to our knowledge, has investigated the genetic correlates of non-social cognitive traits associated with autism.

Genetic correlation analyses identified, as hypothesized, a positive, replicable correlation with autism. The correlation was not trivial, and it was independent of the genetic contributors to educational attainment. To further understand the genetic correlates of the SQ-R, we tested the genetic correlation between the SQ-R and three social traits that are all genetically correlated with autism: social and communication difficulties in childhood, which is positively genetically correlated with autism, and friendship and family relationship satisfaction in adults, which are negatively genetically correlated with autism. We did not identify a significant correlation between the SQ-R and any of the three traits. These results suggest that there may be at least two different, independent domains genetically correlated with autism: a social domain and a non-social domain. Further research is needed to investigate if the genetic correlations between these domains and autism are merely pleiotropic or causal. However, our results lend themselves to the idea that multiple underlying traits, genetic and otherwise, contribute to risk for autism.

Approximately 11% of variance in systemizing was explained by SNPs, and this was similar between males and females. To our knowledge, there is no study examining heritability of the SQ-R in twins. Given the modest sex-difference observed in the phenotypic scores, we conducted sex-stratified GWAS. Genetic correlation analyses suggested a near identical genetic architecture between the sexes. This was also supported by a similar additive SNP heritability between the sexes. Taken together, these results suggest that there is, by and large, a similar genetic architecture between sexes. The observed phenotypic differences can then be attributed to differences in environment, or other biological factors (such as prenatal hormones) which we were unable to fully test in this study.

In conclusion, we conduct the first GWAS of systemizing and identify three genome-wide significant loci. Genetic correlation analyses identified a significant and replicable positive correlation with autism. This correlation is independent of educational attainment, and systemizing is not genetically correlated with social autistic phenotypes.

8. Discussion

At the beginning of the PhD we set out to create a comprehensive map of the genetic correlates of autism by investigating the genetics of traits related to autism. Our approach was informed by two observations. First, the results from our work on the candidate gene studies outlined in Chapter 2 clearly indicated that candidate gene studies in small samples were far from ideal. They lacked the statistical power to clearly identify genetic risk for autism, and would be even more underpowered for other complex phenotypes. Second, given these findings, we had to maximize sample sizes and investigate the genetic architecture in a hypothesis-naive manner, i.e. using a genome-wide approach rather than a candidate gene approach. Given this, we decided to investigate the genetics of phenotypes related to autism. As we were logistically and financially unprepared to conduct phenotyping and genotyping on a large cohort, we decided to find publicly available datasets with phenotypes of interest. Some of these phenotypes, notably the three phenotypes with 23andMe, are highly relevant to autism. Some of the other phenotypes, such as the social relationship satisfaction phenotypes, are less ideal. Ultimately, we had a trade-off between shallow phenotyping in large samples (UK Biobank), which increases statistical power, and deep phenotyping in smaller samples (akin to what has been conducted in the ALSPAC cohort).

The results from Chapter 2 make it clear that sample size is important for understanding the role of common genetic variants in autism and other complex conditions. From the mid-1980s until about the mid-2000s, the majority of the studies investigating the genetics of autism used a candidate gene approach. There were several reasons why candidate gene association studies were conducted, including financial and technological viability. However, given the ample evidence against this approach, we strongly recommend avoiding candidate gene studies for future investigations.

We make it clear that whilst we have not found statistical evidence for association with autism after meta-analysis in Chapter 2, this does not preclude the possibility that these genes do contribute to the risk for autism, albeit with small effect sizes. Our results merely suggest that the current evidence stemming from candidate gene association studies is not statistically reliable or sufficient for implicating these genes in autism. We have investigated only common variants, and it is likely that many of these genes may contribute to risk for autism through other means, such as having an excess of rare variants in autistic individuals compared to controls, or being regulated by genes implicated in autism such as CHD8 and FMR1 (Bernier et al., 2014; Pinto et al., 2014). Our results also suggest that curated gene lists from databases such as SFARI Gene must be cautiously used for gene enrichment analyses, as these databases integrate evidence from multiple different sources.

The issue of statistical power is also pertinent to Chapter 5. One of the issues with many of the GWAS on neuropsychiatric conditions and related phenotypes is the inability to conduct deep phenotyping. Deep phenotyping can obtain more precise estimates of the phenotype, reducing measurement error, but it is time-consuming and arduous. Further, for many of these phenotypes, there is considerable heterogeneity in effects owing to age, sex, and other factors. For example, it is well known that there are age-related differences in cognitive aptitude (Bouchard, 2013). In Chapter 4, we conducted a GWAS of cognitive empathy using a relatively shallow phenotyping approach without stratifying based on age. In Chapter 5, we conducted a GWAS of a related phenotype, theory of mind measured using the Emotional Triangles Task (which is more time-consuming than the Eyes Test used in Chapter 4), but restricted it to a specific age group. We were unable to significantly estimate SNP heritability, or identify significant SNPs, for the latter. These results suggest that, given current constraints on phenotyping, a shallow phenotyping approach over a larger sample size is statistically more powerful.

Chapters 3 to 7 present the results of GWAS for a few phenotypes related to autism. Together, the GWAS described above and other studies provide an early map of the pleiotropy/causal risk for autism stemming from common variants (Figure 1). Several interesting findings emerge from this. First, hierarchical clustering based on similarities places systemizing very close to the two autism GWAS. This is likely due to the fact that systemizing is not genetically correlated with most of the other phenotypes save the two measures of intelligence – educational attainment and cognitive aptitude – and schizophrenia. The genetic correlations between systemizing and these three phenotypes are smaller in magnitude when compared with the genetic correlation between autism and systemizing. It is also clear that intelligence (measured using both cognitive aptitude and educational attainment) has a central role in many of the phenotypes tested. Intelligence is genetically correlated with empathy, social relationship satisfaction, cognitive empathy, and systemizing. Cognitive empathy, theory of mind, and intelligence cluster together, suggesting that there is an underlying latent phenotype that contributes to all these phenotypes. This is not surprising, as there is considerable evidence that people with higher intelligence tend to perform better on tasks of theory of mind/cognitive empathy (Buitelaar et al., 1999). This could possibly explain why there is a non-significant genetic correlation between cognitive empathy and autism (the estimate of the genetic correlation is positive, though not statistically different from 0) when several studies have shown that autistic individuals have difficulties in cognitive empathy/theory of mind. To truly delineate the role of cognitive empathy in autism, we should investigate the contributions of cognitive empathy independent of intelligence, given the modest, positive genetic correlation between autism and intelligence. There is also a need to disentangle the role of intelligence in autism. There is a paradoxical relationship between intelligence and autism. Approximately 40% of autistic individuals have co-morbid intellectual disability (Lai et al., 2013). While de novo loss-of-function mutations and CNVs associated with autism also decrease intelligence, common genetic variation associated with autism contributes to higher intelligence. Can the positive genetic correlation between intelligence and autism be explained by pleiotropy with other phenotypes (for example, the negative genetic correlation with the EQ and social relationship satisfaction for both phenotypes, and the positive genetic correlation with the SQ-R for both phenotypes)? A recent study suggests that the genetic correlation between autism and educational attainment is due to pleiotropy and is not causal (O’Connor & Price, 2017). This is a question that needs further investigation. The correlation heatmap also suggests that social relationship satisfaction, subjective wellbeing, and self-reported empathy are correlated with each other, clustering together in the plot. This is probably because there is an underlying social interaction phenotype that contributes to all these traits.

Figure 1: Genetic correlation heatmap of the phenotypes investigated in this thesis and other relevant phenotypes

This figure provides a correlation heatmap of all the phenotypes investigated in this thesis and other relevant phenotypes. *indicates significant genetic correlations after Bonferroni correction. The intercept has not been constrained.

Our current analyses do not address the issue of causation vs pleiotropy. This is difficult to delineate both mathematically and biologically. Both autism and the phenotypes investigated are complex, neurodevelopmental traits. It is therefore difficult to see a clear chain of causality. All the phenotypes investigated are emergent properties of neural networks. It is possible that the genetic variants associated with these phenotypes orchestrate the development of neural circuits, which in turn contribute to the risk for neuropsychiatric conditions and the development of psychological phenotypes. This causal chain of events can be formally investigated using recently developed statistical methods (O’Connor & Price, 2017) in cohorts such as the UK Biobank and ENIGMA that have neuroimaging and genetic datasets.

Overall, for the GWAS, we are more confident about the results of the broader analyses. Genetic correlations, SNP heritabilities, and sex differences have a high probability of being replicated in independent samples. However, we are less confident about the genome-wide associations for several reasons. First, none of these cohorts are without biases. For example, in the 23andMe sample, the mean SQ-R score was significantly higher than the mean reported in the general population, possibly because people who are scientifically minded are more likely to participate in direct-to-consumer genetic testing programmes. Similarly, there is an attrition of people with psychiatric conditions in voluntary biobanks. This is demonstrated by the significantly lower proportion of people with schizophrenia and autism in the UK Biobank compared to the general population. Participants in both these cohorts are better educated and, on average, richer. Further, some of the phenotypes are also likely to be culturally sensitive: there are cultural differences in both empathy and cognitive empathy. These observations suggest that there are likely to be differences in results between cohorts. These differences are likely to be small for the global analyses such as genetic correlation. However, to gain confidence in the genome-wide significant SNP results, these results must be replicated in a statistically well-powered independent sample.

Where possible, we are embarking on replicating the results. We are collaborating with researchers from multiple institutions to collect additional data on cognitive empathy using the ‘Reading the Mind in the Eyes’ Test. Meta-analysis of the current data with new data will help strengthen the results of the current study. With some of the other phenotypes tested, replication is more challenging, and so we are hoping to conduct proxy-replication to provide convergent validity. One such phenotype is systemizing. There is some evidence that systemizing is phenotypically correlated with aptitude in Science, Technology, Engineering and Mathematics (STEM) (Baron-Cohen et al., 2003). We are conducting a GWAS of STEM occupations in the UK Biobank to provide convergent validity for the systemizing results.

It is important to discuss the issue of multiple testing correction. Where possible, we have used the more stringent Bonferroni correction. Notably, we have used Bonferroni correction for all the genetic correlation analyses except those in Chapter 6. These tests are not completely independent, so Bonferroni-corrected results are likely to be somewhat conservative. In Chapter 6, however, we used an FDR correction given the very high genetic correlation between the two main phenotypes tested: family relationship satisfaction and friendship satisfaction. We were of the opinion that Bonferroni correction would have been very conservative in this scenario. We note that the majority of the genetic correlations would be significant even after Bonferroni correction.

Another area where we had several discussions regarding the threshold of statistical significance is in studies where we performed sex-stratified GWAS in addition to the non-stratified GWAS. These tests are not completely independent of each other given the high genetic correlation between the sexes, which, with the exception of cognitive empathy, was not statistically different from rg = 1. Further, a few studies have used a genome-wide threshold of 5x10^-8 when sex-stratified analyses were conducted in addition to non-stratified analyses (Hammerschlag et al., 2017). For cognitive empathy, the only genome-wide significant SNP is significant even at a more stringent P-value threshold of 1.6x10^-8, which is the Bonferroni-corrected alpha after accounting for three tests.
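As a worked check of the stricter threshold, dividing the conventional genome-wide threshold by the three GWAS (non-stratified, males-only, and females-only), and treating them as independent for the purpose of the correction, gives:

```latex
\alpha_{\text{corrected}} = \frac{5 \times 10^{-8}}{3} \approx 1.67 \times 10^{-8}
```

This is the value that the text reports, truncated, as 1.6x10^-8.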

We plan to extend the current work on the genetics of autism and related traits in three studies.

In the first study, we will conduct a large-scale genome-wide association study (GWAS) meta-analysis of autistic traits using the short version of the Autism Spectrum Quotient (AQ-10) in multiple cohorts, including the UK Biobank. This builds on the success of several studies that have examined traits underlying psychiatric conditions (neuroticism, depressive symptoms, subjective wellbeing) (Nagel et al., 2017; Okbay, Baselmans, et al., 2016). We focus on autistic traits rather than autism so as to phenotype the general population and thus increase sample size and statistical power, and because autistic traits capture greater variation in the underlying latent trait than does a case-control design. The AQ-10 was developed by selecting the 10 items from the 50-item Autism Spectrum Quotient (AQ) that showed the greatest discrimination between autism and the general population (Allison, Auyeung, Baron-Cohen, Bolton, & Brayne, 2012). On the AQ-10, participants choose one of four options for each item (definitely agree, agree, disagree, and definitely disagree). The score ranges from 0 to 10. It can be completed in less than five minutes, making it a quick, easy-to-administer measure of autistic traits. In adults, a score above 6 had a sensitivity of 0.88, a specificity of 0.91, and a positive predictive value (PPV) of 0.85 for identifying autism (Allison et al., 2012). Further, the AQ-10 has questions relating to both the social and the non-social domains, which will help us to further parse the distinctions between social and non-social traits related to autism. Meta-analysis using both the PGC and the iPSYCH autism datasets will help identify significant loci, which can then be followed up with fine-mapping and functional analyses.
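To illustrate how evidence for a single SNP might be combined across cohorts, the sketch below implements a fixed-effects, inverse-variance-weighted meta-analysis. This is one common approach and is shown only as an assumption; the text does not specify which meta-analysis method will be used, and the effect sizes and standard errors are invented for illustration. It also assumes that the per-cohort effects are measured on the same scale.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis for one SNP.

    betas: per-cohort effect estimates (same scale and effect allele).
    ses:   per-cohort standard errors.
    Returns the pooled effect, its standard error, Z score, and two-sided P-value.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p


# Hypothetical summary statistics for one SNP in two cohorts.
beta, se, z, p = inverse_variance_meta(betas=[0.02, 0.03], ses=[0.01, 0.01])
print(f"beta={beta:.3f}, se={se:.4f}, z={z:.2f}, p={p:.1e}")
```

Cohorts with smaller standard errors (typically the larger cohorts) receive more weight, which is why adding large population cohorts is expected to increase power.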

Whilst the first study focusses on autistic traits, the second study will investigate the genetic correlates of intermediate phenotypes associated with autism, derived from imaging datasets in the UK Biobank. We hypothesize that a causal chain links genetics with the observed phenotype: altered gene networks alter neural development and neural network formation, which then increases the risk for autism. It is well known that structural changes in the brain contribute to cognitive processes associated with autism, and morphological and functional differences in brain organization in autism are well documented (Hazlett et al., 2017; Padmanabhan, Lynch, Schaer, & Menon, 2017). Neural network organization (structural or resting state) has proven to be a useful endophenotype, as it links changes at the microscopic level (genes, molecular and cellular pathways) with the mesoscopic (neural morphology) and the macroscopic (cognition, behaviour and clinical phenotypes) levels. One example of an altered neural network in autism is the default mode network (DMN), which has received substantial attention as it is thought to play a vital role in processing information about self and others, including aspects of higher cognition such as theory of mind, social communication, and emotional interaction, all of which are important for social interaction (Padmanabhan et al., 2017). There is considerable evidence that the DMN is the most strongly disrupted neural network in autism, owing in part to its importance in aspects of social interaction. We will therefore conduct a GWAS of imaging phenotypes related to autism in the UK Biobank.

In the third study, we will integrate multiple sources of common variants using a multiple polygenic risk score method. Using GWAS data for multiple phenotypes related to autism, we will first attempt to quantify each individual’s genetic predisposition to phenotypes related to autism (such as intellectual disability, epilepsy, and depression). This will be done by constructing polygenic scores in two richly phenotyped cohorts of autistic individuals: the Simons Simplex Collection and the Autism Genetic Resource Exchange. This will help us understand whether the predictive power of polygenic scores is substantially higher after including multiple related traits. Further, individual polygenic scores in autistic and non-autistic individuals can be used to cluster individuals into subgroups. Multiple polygenic risk scores should also be integrated with rare genetic variants and CNVs to identify how allelic variation across the frequency spectrum jointly contributes to risk for autism and other co-morbid conditions. For example, it is well established that both CNVs and de novo loss-of-function mutations are associated with reduced IQ, both in the general population and in autistic cohorts (Kosmicki et al., 2017; Samocha et al., 2014; Sanders et al., 2015). However, it is less clear whether, and to what extent, the presence of IQ-increasing alleles modifies the effect of these genetic variants.
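The idea of scoring one individual on several traits at once can be sketched as follows. A polygenic score is conventionally a weighted sum of effect-allele dosages, with weights taken from GWAS effect sizes. The trait names, weights, and genotype below are hypothetical, and the sketch leaves out the steps a real analysis would need (clumping, P-value thresholding, ancestry adjustment).

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def multi_trait_scores(dosages, weights_by_trait):
    """Compute one polygenic score per trait from the same genotype vector."""
    return {
        trait: polygenic_score(dosages, weights)
        for trait, weights in weights_by_trait.items()
    }


# Hypothetical per-SNP GWAS effect sizes for two traits over the same three SNPs.
weights_by_trait = {
    "trait_a": [0.10, -0.05, 0.02],
    "trait_b": [0.00, 0.03, 0.04],
}
# Hypothetical effect-allele dosages for one individual.
individual = [2, 1, 0]

scores = multi_trait_scores(individual, weights_by_trait)
print({trait: round(score, 3) for trait, score in scores.items()})
```

In a multiple polygenic score analysis, the per-trait scores would then be entered jointly as predictors of the outcome, so that the gain in predictive power over a single score can be assessed.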

In conclusion, we present the results of studies investigating the genetic correlates of autism and phenotypes related to autism. Meta-analysis of candidate gene association studies in autism does not identify any replicable association, highlighting that these studies are statistically underpowered. We identify novel genetic variants associated with these phenotypes, and identify genetic correlations with autism for most of them. These phenotypes are not uniquely genetically correlated with autism; many of them are also genetically correlated with other psychiatric conditions and intelligence. Our research provides a genetic map of phenotypes related to autism, and highlights potential avenues for future research.


References

Abu-Akel, A., & Shamay-Tsoory, S. (2011). Neuroanatomical and neurochemical bases of theory of mind. Neuropsychologia, 49(11), 2971–2984. http://doi.org/10.1016/j.neuropsychologia.2011.07.012
Allison, C., Auyeung, B., Baron-Cohen, S., Bolton, P. F., & Brayne, C. (2012). Toward brief “Red Flags” for autism screening: The Short Autism Spectrum Quotient and the Short Quantitative Checklist for Autism in toddlers in 1,000 cases and 3,000 controls [corrected]. Journal of the American Academy of Child and Adolescent Psychiatry, 51(2), 202–212.e7. http://doi.org/10.1016/j.jaac.2011.11.003
Allison, C., Baron-Cohen, S., Stone, M. H., & Muncer, S. J. (2015). Rasch modeling and confirmatory factor analysis of the systemizing quotient-revised (SQ-R) scale. The Spanish Journal of Psychology, 18, E16. http://doi.org/10.1017/sjp.2015.19
Allison, C., Baron-Cohen, S., Wheelwright, S. J., Stone, M. H., & Muncer, S. J. (2011). Psychometric analysis of the Empathy Quotient (EQ). Personality and Individual Differences, 51(7), 829–835. http://doi.org/10.1016/j.paid.2011.07.005
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.).
Ang, G. K., & Pridmore, S. (2009). Theory of Mind and Psychiatry: An Introduction. Australasian Psychiatry, 17(2), 117–122. http://doi.org/10.1080/10398560802375982
Anney, R. J. L., Klei, L. L., Pinto, D., Regan, R., Conroy, J., Magalhaes, T. R., … Bolte, S. (2010). A genome-wide scan for common alleles affecting risk for autism. Human Molecular Genetics, 19(20), 4072–82. http://doi.org/10.1093/hmg/ddq307
Ardlie, K. G., Deluca, D. S., Segre, A. V., Sullivan, T. J., Young, T. R., Gelfand, E. T., … The GTEx Consortium. (2015). The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science, 348(6235), 648–60. http://doi.org/10.1126/science.1262110
Asperger, H. (1944). “Autistic psychopathy” in childhood. In U. Frith (Ed.), Autism and Asperger syndrome (pp. 37–92). Cambridge: Cambridge University Press. http://doi.org/10.1017/CBO9780511526770.002
Auyeung, B., Baron-Cohen, S., Chapman, E., Knickmeyer, R., Taylor, K., & Hackett, G. (2006). Foetal testosterone and the child systemizing quotient. European Journal of Endocrinology, 155(suppl_1), S123–S130. http://doi.org/10.1530/eje.1.02260
Auyeung, B., Lombardo, M. V., & Baron-Cohen, S. (2013). Prenatal and postnatal hormone effects on the human brain and cognition. Pflügers Archiv: European Journal of Physiology, 465(5), 557–71. http://doi.org/10.1007/s00424-013-1268-2
Baker, C. A., Peterson, E., Pulos, S., & Kirkland, R. A. (2014). Eyes and IQ: A meta-analysis of the relationship between intelligence and “Reading the Mind in the Eyes.” Intelligence, 44(1), 78–92. http://doi.org/10.1016/j.intell.2014.03.001
Barbeira, A., Shah, K. P., Torres, J. M., Wheeler, H. E., Torstenson, E. S., Edwards, T., … Im, H. K. (2016). MetaXcan: Summary statistics based gene-Level association method infers accurate PrediXcan results. bioRxiv.
Baron-Cohen, S. (1991). Precursors to a theory of mind: Understanding attention in others. In A. Whiten (Ed.), Natural theories of mind: Evolution, development and simulation of everyday mindreading (pp. 233–251). Cambridge: Basil Blackwell.
Baron-Cohen, S. (2006). The hyper-systemizing, assortative mating theory of autism. Progress in Neuro-Psychopharmacology & Biological Psychiatry, 30(5), 865–72. http://doi.org/10.1016/j.pnpbp.2006.01.010
Baron-Cohen, S. (2009). Autism: The Empathizing-Systemizing (E-S) Theory. Annals of the New York Academy of Sciences, 1156(1), 68–80. http://doi.org/10.1111/j.1749-6632.2009.04467.x
Baron-Cohen, S. (2010). Empathizing, systemizing, and the extreme male brain theory of autism. Progress in Brain Research, 186, 167–75. http://doi.org/10.1016/B978-0-444-53630-3.00011-7
Baron-Cohen, S. (2011). Zero degrees of empathy: a new theory of human cruelty. London: Allen Lane.
Baron-Cohen, S., Baldwin, D. A., & Crowson, M. (1997). Do Children with Autism Use the Speaker’s Direction of Gaze Strategy to Crack the Code of Language? Child Development, 68(1), 48–57. http://doi.org/10.2307/1131924
Baron-Cohen, S., Bowen, D. C., Holt, R. J., Allison, C., Auyeung, B., Lombardo, M. V., … Lai, M.-C. (2015). The “Reading the Mind in the Eyes” test: complete absence of typical sex difference in ~400 men and women with autism. PloS One, 10(8), e0136521. http://doi.org/10.1371/journal.pone.0136521
Baron-Cohen, S., Cassidy, S., Auyeung, B., Allison, C., Achoukhi, M., Robertson, S., … Lai, M.-C. (2014). Attenuation of typical sex differences in 800 adults with autism vs. 3,900 controls. PloS One, 9(7), e102251. http://doi.org/10.1371/journal.pone.0102251
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21(1), 37–46.
Baron-Cohen, S., Richler, J., Bisarya, D., Gurunathan, N., & Wheelwright, S. J. (2003). The systemizing quotient: an investigation of adults with Asperger syndrome or high-functioning autism, and normal sex differences. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 358(1430), 361–74. http://doi.org/10.1098/rstb.2002.1206
Baron-Cohen, S., & Wheelwright, S. J. (2004). The Empathy Quotient: an investigation of adults with Asperger syndrome or high functioning autism, and normal sex differences. Journal of Autism and Developmental Disorders, 34(2), 163–75.
Baron-Cohen, S., Wheelwright, S. J., Burtenshaw, A., & Hobson, E. (2007). Mathematical talent is linked to autism. Human Nature, 18(2), 125–131. http://doi.org/10.1007/s12110-007-9014-0
Baron-Cohen, S., Wheelwright, S. J., Hill, J., Raste, Y., & Plumb, I. (2001). The “Reading the Mind in the Eyes” Test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 42(2), 241–51.
Baron-Cohen, S., Wheelwright, S. J., Skinner, R., Martin, J., & Clubley, E. (2001). The autism-spectrum quotient (AQ): evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31(1), 5–17.
Baron-Cohen, S., Wheelwright, S. J., Stott, C., Bolton, P. F., & Goodyer, I. M. (1997). Is there a link between engineering and autism? Autism, 1(1), 101–109. http://doi.org/10.1177/1362361397011010
Beadle, J. N., Paradiso, S., Salerno, A., & McCormick, L. M. (2013). Alexithymia, emotional empathy, and self-regulation in anorexia nervosa. Annals of Clinical Psychiatry: Official Journal of the American Academy of Clinical Psychiatrists, 25(2), 107–20.
Benyamin, B., Pourcain, B., Davis, O. S. P., Davies, G., Hansell, N. K., Brion, M.-J. A., … Visscher, P. M. (2014). Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Molecular Psychiatry, 19(2), 253–8. http://doi.org/10.1038/mp.2012.184

Berlim, M. T., McGirr, A., Beaulieu, M.-M., & Turecki, G. (2012). Theory of mind in subjects with major depressive disorder: is it influenced by repetitive transcranial magnetic stimulation? The World Journal of Biological Psychiatry: The Official Journal of the World Federation of Societies of Biological Psychiatry, 13(6), 474–9. http://doi.org/10.3109/15622975.2011.615861
Bernier, R., Golzio, C., Xiong, B., Stessman, H. A., Coe, B. P., Penn, O., … Eichler, E. E. (2014). Disruptive CHD8 mutations define a subtype of autism early in development. Cell, 158(2), 263–276. http://doi.org/10.1016/j.cell.2014.06.017
Bigdeli, T. B., Lee, D., Webb, B. T., Riley, B. P., Vladimirov, V. I., Fanous, A. H., … Bacanu, S.-A. (2016). A simple yet accurate correction for winner’s curse can predict signals discovered in much larger genome scans. Bioinformatics, 32(17), 2598–2603. http://doi.org/10.1093/bioinformatics/btw303
Bishop, D. (1998). Development of the Children’s Communication Checklist (CCC): a method for assessing qualitative aspects of communicative impairment in children. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 39(6), 879–91.
Boker, S., Neale, M., Maes, H., Wilde, M., Spiegel, M., Brick, T., … Fox, J. (2011). OpenMx: An open source extended structural equation modeling framework. Psychometrika, 76(2), 306–317. http://doi.org/10.1007/s11336-010-9200-6
Bölte, S., Westerwald, E., Holtmann, M., Freitag, C. M., & Poustka, F. (2011). Autistic traits and autism spectrum disorders: the clinical validity of two measures presuming a continuum of social communication skills. Journal of Autism and Developmental Disorders, 41(1), 66–72. http://doi.org/10.1007/s10803-010-1024-9
Bora, E., Gökçen, S., & Veznedaroglu, B. (2008). Empathic abilities in people with schizophrenia. Psychiatry Research, 160(1), 23–9. http://doi.org/10.1016/j.psychres.2007.05.017
Boraston, Z., Blakemore, S.-J., Chilvers, R., & Skuse, D. (2007). Impaired sadness recognition is linked to social interaction deficit in autism. Neuropsychologia, 45(7), 1501–1510. http://doi.org/10.1016/j.neuropsychologia.2006.11.010
Bos, E. H., Snippe, E., de Jonge, P., Jeronimus, B. F., Duguid, J., & Bohlmeijer, E. (2016). Preserving subjective wellbeing in the face of psychopathology: buffering effects of personal strengths and resources. PLOS ONE, 11(3), e0150867. http://doi.org/10.1371/journal.pone.0150867
Bouchard, T. J. (2013). The Wilson Effect: the increase in heritability of IQ with age. Twin Research and Human Genetics: The Official Journal of the International Society for Twin Studies, 16(5), 923–30. http://doi.org/10.1017/thg.2013.54
Bourgeron, T. (2015). From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nature Reviews Neuroscience, 16(9), 551–563. http://doi.org/10.1038/nrn3992
Bourgeron, T. (2016). Current knowledge on the genetics of autism and propositions for future research. Comptes Rendus Biologies, 339(7), 300–307. http://doi.org/10.1016/j.crvi.2016.05.004
Boyd, A., Golding, J., Macleod, J., Lawlor, D. A., Fraser, A., Henderson, J., … Davey-Smith, G. (2013). Cohort Profile: The “Children of the 90s”—the index offspring of the Avon Longitudinal Study of Parents and Children. International Journal of Epidemiology, 42(1), 111–127. http://doi.org/10.1093/ije/dys064
Bralten, J., van Hulzen, K. J., Martens, M. B., Galesloot, T. E., Arias Vasquez, A., Kiemeney, L. A. L. M., … Poelmans, G. (2017). Autism spectrum disorders and autistic traits share genetics and biology. Molecular Psychiatry. http://doi.org/10.1038/mp.2017.98
Brewer, R., Cook, R., Cardi, V., Treasure, J., & Bird, G. (2015). Emotion recognition deficits in eating disorders are explained by co-occurring alexithymia. Royal Society Open Science, 2(1), 140382. http://doi.org/10.1098/rsos.140382

Bring, J. (1994). How to standardize regression coefficients. The American Statistician, 48(3), 209. http://doi.org/10.2307/2684719
Browning, S. R., & Browning, B. L. (2007). Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. The American Journal of Human Genetics, 81(5), 1084–1097. http://doi.org/10.1086/521987
Buitelaar, J. K., van der Wees, M., Swaab-Barneveld, H., & van der Gaag, R. J. (1999). Verbal memory and Performance IQ predict theory of mind and emotion recognition ability in children with autistic spectrum disorders and in psychiatric control children. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 40(6), 869–81.
Bulik-Sullivan, B. K., Finucane, H. K., Anttila, V., Gusev, A., Day, F. R., Loh, P.-R., … Neale, B. M. (2015). An atlas of genetic correlations across human diseases and traits. Nature Genetics, 47(11), 1236–41. http://doi.org/10.1038/ng.3406
Bulik-Sullivan, B. K., Loh, P.-R., Finucane, H. K., Ripke, S., Yang, J., Patterson, N., … Neale, B. M. (2015). LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics, 47(3), 291–295. http://doi.org/10.1038/ng.3211
Bycroft, C., Freeman, C., Petkova, D., Band, G., Delaneau, O., Connell, J. O., … Welsh, S. (2017). Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv. http://doi.org/10.1101/166298
Cable, N., Bartley, M., Chandola, T., & Sacker, A. (2013). Friends are equally important to men and women, but family matters more for men’s well-being. Journal of Epidemiology and Community Health, 67(2), 166–171. http://doi.org/10.1136/jech-2012-201113
Calderoni, S., Fantozzi, P., Maestro, S., Brunori, E., Narzisi, A., Balboni, G., & Muratori, F. (2013). Selective cognitive empathy deficit in adolescents with restrictive anorexia nervosa. Neuropsychiatric Disease and Treatment, 9, 1583–9. http://doi.org/10.2147/NDT.S50214
Campanella, F., Shallice, T., Ius, T., Fabbro, F., & Skrap, M. (2014). Impact of brain tumour location on emotion and personality: a voxel-based lesion-symptom mapping study on mentalization processes. Brain: A Journal of Neurology, 137(Pt 9), 2532–45. http://doi.org/10.1093/brain/awu183
Cantor, R. M., Navarro, L., Won, H., Walker, R. L., Lowe, J. K., & Geschwind, D. H. (2017). ASD restricted and repetitive behaviors associated at 17q21.33: genes prioritized by expression in fetal brains. Molecular Psychiatry. http://doi.org/10.1038/mp.2017.114
Carter, C., & Evans, K. (1996). Inheritance of congenital pyloric stenosis. J Med Genet., 6(3), 233–54.
Chakrabarti, B., Dudbridge, F., Kent, L., Wheelwright, S. J., Hill-Cawthorne, G., Allison, C., … Baron-Cohen, S. (2009). Genes related to sex steroids, neural growth, and social-emotional behavior are associated with autistic traits, empathy, and Asperger syndrome. Autism Research: Official Journal of the International Society for Autism Research, 2(3), 157–77. http://doi.org/10.1002/aur.80
Chapman, E., Baron-Cohen, S., Auyeung, B., Knickmeyer, R., Taylor, K., & Hackett, G. (2006). Fetal testosterone and empathy: evidence from the Empathy Quotient (EQ) and the “Reading the Mind in the Eyes” test. Social Neuroscience, 1(2), 135–48. http://doi.org/10.1080/17470910600992239
Charlton, R. A., Barrick, T. R., Markus, H. S., & Morris, R. G. (2009). Theory of mind associations with other cognitive functions and brain imaging in normal aging. Psychology and Aging, 24(2), 338–48. http://doi.org/10.1037/a0015225
Chaste, P., Klei, L. L., Sanders, S. J., Hus, V., Murtha, M. T., Lowe, J. K., … Devlin, B. (2015). A genome-wide association study of autism using the Simons Simplex Collection: Does reducing phenotypic heterogeneity in autism increase genetic homogeneity? Biological Psychiatry, 77(9), 775–84. http://doi.org/10.1016/j.biopsych.2014.09.017
Chen, C.-Y., Lopes-Ramos, C. M., Kuijjer, M. L., Paulson, J. N., Sonawane, A. R., Fagny, M., … DeMeo, D. L. (2016). Sexual dimorphism in gene expression and regulatory networks across human tissues. bioRxiv.
Ching, M. S. L., Shen, Y., Tan, W.-H., Jeste, S. S., Morrow, E. M., Chen, X., … Children’s Hospital Boston Genotype Phenotype Study Group. (2010). Deletions of NRXN1 (neurexin-1) predispose to a wide spectrum of developmental disorders. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 153B(4), 937–47. http://doi.org/10.1002/ajmg.b.31063
Christov-Moore, L., Simpson, E. A., Coudé, G., Grigaityte, K., Iacoboni, M., & Ferrari, P. F. (2014). Empathy: gender effects in brain and behavior. Neuroscience and Biobehavioral Reviews, 46(Pt 4), 604–27. http://doi.org/10.1016/j.neubiorev.2014.09.001
Chung, B. H.-Y., Tao, V. Q., & Tso, W. W.-Y. (2014). Copy number variation and autism: New insights and clinical implications. Journal of the Formosan Medical Association, 113(7), 400–408. http://doi.org/10.1016/j.jfma.2013.01.005
Clarke, T.-K., Lupton, M. K., Fernandez-Pujals, A. M., Starr, J., Davies, G., Cox, S. R., … McIntosh, A. M. (2015). Common polygenic risk for autism spectrum disorder (ASD) is associated with cognitive ability in the general population. Molecular Psychiatry, 21(3), 419–25.
Colvert, E., Tick, B., McEwen, F., Stewart, C., Curran, S. R., Woodhouse, E., … Bolton, P. F. (2015). Heritability of autism spectrum disorder in a UK population-based twin sample. JAMA Psychiatry, 72(5), 415–23. http://doi.org/10.1001/jamapsychiatry.2014.3028
Connellan, J., Baron-Cohen, S., Wheelwright, S. J., Batki, A., & Ahluwalia, J. (2000). Sex differences in human neonatal social perception. Infant Behavior and Development, 23(1), 113–118. http://doi.org/10.1016/S0163-6383(00)00032-1
Constable, J. R. L., Graham, M. E., Morgan, A., & Burgoyne, R. D. (2005). Amisyn regulates exocytosis and fusion pore stability by both Syntaxin-dependent and Syntaxin-independent mechanisms. Journal of Biological Chemistry, 280(36), 31615–31623. http://doi.org/10.1074/jbc.M505858200
Constantino, J. N., Zhang, Y., Frazier, T. W., Abbacchi, A. M., & Law, P. (2010). Sibling recurrence and the genetic epidemiology of autism. American Journal of Psychiatry, 167(11), 1349–1356. http://doi.org/10.1176/appi.ajp.2010.09101470
Cross-Disorder Group of the Psychiatric Genomics Consortium. (2013). Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet, 381(9875), 1371–9. http://doi.org/10.1016/S0140-6736(12)62129-1
Cusi, A. M., Macqueen, G. M., & McKinnon, M. C. (2012). Patients with bipolar disorder show impaired performance on complex tests of social cognition. Psychiatry Research, 200(2–3), 258–64. http://doi.org/10.1016/j.psychres.2012.06.021
Davis, M. H., Luce, C., & Kraus, S. J. (1994). The heritability of characteristics associated with dispositional empathy. Journal of Personality, 62(3), 369–91.
Dawson, G., Meltzoff, A. N., Osterling, J., Rinaldi, J., & Brown, E. (1998). Children with autism fail to orient to naturally occurring social stimuli. Journal of Autism and Developmental Disorders, 28(6), 479–485. http://doi.org/10.1023/A:1026043926488
de Leeuw, C. A., Mooij, J. M., Heskes, T., & Posthuma, D. (2015). MAGMA: generalized gene-set analysis of GWAS data. PLoS Computational Biology, 11(4), 1–19. http://doi.org/10.1371/journal.pcbi.1004219
de Moor, M. H. M., Costa, P. T., Terracciano, A., Krueger, R. F., de Geus, E. J. C., Toshiko, T., … Boomsma, D. I. (2012). Meta-analysis of genome-wide association studies for personality. Molecular Psychiatry, 17(3), 337–49. http://doi.org/10.1038/mp.2010.128
de Moor, M. H. M., van den Berg, S. M., Verweij, K. J. H., Krueger, R. F., Luciano, M., Arias-Vasquez, A., … Boomsma, D. I. (2015). Meta-analysis of Genome-wide Association Studies for Neuroticism, and the Polygenic Association With Major Depressive Disorder. JAMA Psychiatry, 72(7), 642. http://doi.org/10.1001/jamapsychiatry.2015.0554
De Rubeis, S., He, X., Goldberg, A. P., Poultney, C. S., Samocha, K., Ercument Cicek, A., … UK10K Consortium. (2014). Synaptic, transcriptional and chromatin genes disrupted in autism. Nature, 515(7526), 209–15. http://doi.org/10.1038/nature13772
de Zeeuw, E. L., van Beijsterveldt, C. E. M., Hoekstra, R. A., Bartels, M., & Boomsma, D. I. (2017). The etiology of autistic traits in preschoolers: a population-based twin study. Journal of Child Psychology and Psychiatry, 58(8), 893–901. http://doi.org/10.1111/jcpp.12741
Decety, J. (2010). The neurodevelopment of empathy in humans. Developmental Neuroscience, 32(4), 257–67. http://doi.org/10.1159/000317771
Decety, J., Bartal, I. B.-A., Uzefovsky, F., & Knafo-Noam, A. (2015). Empathy as a driver of prosocial behaviour: highly conserved neurobehavioural mechanisms across species. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 371(1686), 20150077.
Decety, J., & Moriguchi, Y. (2007). The empathic brain and its dysfunction in psychiatric populations: implications for intervention across different clinical conditions. BioPsychoSocial Medicine, 1, 22. http://doi.org/10.1186/1751-0759-1-22
Delaneau, O., Marchini, J., & Zagury, J.-F. (2011). A linear complexity phasing method for thousands of genomes. Nature Methods, 9(2), 179–181. http://doi.org/10.1038/nmeth.1785
Demontis, D., Walters, R. K., Martin, J., Mattheisen, M., Als, T. D., Agerbo, E., … Neale, B. M. (2017). Discovery of the first genome-wide significant risk loci for ADHD. bioRxiv.
Derntl, B., Seidel, E.-M., Schneider, F., & Habel, U. (2012). How specific are emotional deficits? A comparison of empathic abilities in schizophrenia, bipolar and depressed patients. Schizophrenia Research, 142(1–3), 58–64. http://doi.org/10.1016/j.schres.2012.09.020
Di Napoli, A., Warrier, V., Baron-Cohen, S., & Chakrabarti, B. (2014). Genetic variation in the oxytocin (OXTR) gene is associated with Asperger Syndrome. Molecular Autism, 5(1), 48. http://doi.org/10.1186/2040-2392-5-48
Diener, E., Suh, E. M., Lucas, R. E., & Smith, H. L. (1999). Subjective Well-Being: Three decades of progress. Psychological Bulletin, 125(2), 276–302.
Dinsdale, N., Mokkonen, M., & Crespi, B. (2016). The “extreme female brain”: increased cognitive empathy as a dimension of psychopathology. Evolution and Human Behavior, 37(4), 323–336. http://doi.org/10.1016/j.evolhumbehav.2016.02.003
Do, C. B., Tung, J. Y., Dorfman, E., Kiefer, A. K., Drabant, E. M., Francke, U., … Eriksson, N. (2011). Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson’s disease. PLoS Genetics, 7(6), e1002141. http://doi.org/10.1371/journal.pgen.1002141
DSM-IV. (1994). Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition. Washington, DC: American Psychiatric Publishing, Inc.

Duncan, L. E., Yilmaz, Z., Gaspar, H., Walters, R. K., Goldstein, J. I., Anttila, V., … Bulik, C. M. (2017). Significant locus and metabolic genetic correlations revealed in genome-wide association study of anorexia nervosa. American Journal of Psychiatry, appi.ajp.2017.1. http://doi.org/10.1176/appi.ajp.2017.16121402 Durand, C. M., Betancur, C., Boeckers, T. M., Bockmann, J., Chaste, P., Fauchereau, F., … Bourgeron, T. (2007). Mutations in the gene encoding the synaptic scaffolding protein SHANK3 are associated with autism spectrum disorders. Nature Genetics, 39(1), 25–27. http://doi.org/10.1038/ng1933 Durdiaková, J., Warrier, V., Banerjee-Basu, S., Baron-Cohen, S., & Chakrabarti, B. (2014). STX1A and Asperger syndrome: a replication study. Molecular Autism, 5(1), 14. http://doi.org/10.1186/2040-2392-5-14 Durdiaková, J., Warrier, V., Baron-Cohen, S., & Chakrabarti, B. (2014). Single nucleotide polymorphism rs6716901 in SLC25A12 gene is associated with Asperger syndrome. Molecular Autism, 5(1), 25. http://doi.org/10.1186/2040-2392-5-25 Ellis, S. E., Panitch, R., West, A. B., & Arking, D. E. (2016). Transcriptome analysis of cortical tissue reveals shared sets of downregulated genes in autism and schizophrenia. Translational Psychiatry, 6(5), e817. http://doi.org/10.1038/tp.2016.87 Emde, R. N., Plomin, R., Robinson, J. A., Corley, R. P., DeFries, J. C., Fulker, D. W., … Zahn- Waxler, C. (1992). Temperament, emotion, and cognition at fourteen months: the MacArthur Longitudinal Twin Study. Child Development, 63(6), 1437–55. Eriksson, N., Tung, J. Y., Kiefer, A. K., Hinds, D. A., Francke, U., Mountain, J. L., & Do, C. B. (2012). Novel associations for hypothyroidism include known autoimmune risk loci. PloS One, 7(4), e34442. http://doi.org/10.1371/journal.pone.0034442 Euesden, J., Lewis, C. M., & O’Reilly, P. F. (2015). PRSice: Polygenic Risk Score software. Bioinformatics, 31(9), 1466–1468. http://doi.org/10.1093/bioinformatics/btu848 Fernández-Abascal, E. 
G., Cabello, R., Fernández-Berrocal, P., & Baron-Cohen, S. (2013). Test-retest reliability of the “Reading the Mind in the Eyes” test: a one-year follow-up study. Molecular Autism, 4(1), 33. http://doi.org/10.1186/2040-2392-4-33 Finucane, H. K., Bulik-Sullivan, B. K., Gusev, A., Trynka, G., Reshef, Y., Loh, P.-R., … Price, A. L. (2015). Partitioning heritability by functional annotation using genome-wide association summary statistics. Nature Genetics, 47(11), 1228–1235. http://doi.org/10.1038/ng.3404 Finucane, H. K., Reshef, Y., Anttila, V., Slowikowski, K., Gusev, A., Byrnes, A., … Price, A. (2017). Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. bioRxiv, 1–38. http://doi.org/https://doi.org/10.1101/103069 Folstein, S. E., & Rutter, M. (1977). Infantile autism: a genetic study of 21 twin pairs. Journal of Child Psychology and Psychiatry, 18(4), 297–321. http://doi.org/10.1111/j.1469- 7610.1977.tb00443.x Frans, E. M., Sandin, S., Reichenberg, A., Långström, N., Lichtenstein, P., McGrath, J. J., & Hultman, C. M. (2013). Autism risk across generations: a population-based study of advancing grandpaternal and paternal age. JAMA Psychiatry, 70(5), 516–21. http://doi.org/10.1001/jamapsychiatry.2013.1180 Frazier, T. W., Youngstrom, E. A., Hardan, A. Y., Georgiades, S., Constantino, J. N., & Eng, C. (2015). Quantitative autism symptom patterns recapitulate differential mechanisms of genetic transmission in single and multiple incidence families. Molecular Autism, 6(1), 58. http://doi.org/10.1186/s13229-015-0050-z Frith, C. D., & Frith, U. (2005). Theory of mind. Current Biology, 15(17), R644–R645.

209

http://doi.org/10.1016/j.cub.2005.08.041 Fromer, M., Roussos, P., Sieberts, S. K., Johnson, J. S., Kavanagh, D. H., Perumal, T. M., … Sklar, P. (2016). Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nature Neuroscience, 19(11), 1442–1453. http://doi.org/10.1038/nn.4399 Fuchsberger, C., Abecasis, G. R., & Hinds, D. A. (2015). minimac2: faster genotype imputation. Bioinformatics, 31(5), 782–784. http://doi.org/10.1093/bioinformatics/btu704 Gandal, M. J., Haney, J., Parikshak, N., Leppa, V., Horvath, S., & Geschwind, D. H. (2016). Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. bioRxiv. http://doi.org/10.1101/040022 Gaugler, T., Klei, L. L., Sanders, S. J., Bodea, C. A., Goldberg, A. P., Lee, A. B., … Buxbaum, J. D. (2014). Most genetic risk for autism resides with common variation. Nature Genetics, 46(8), 881–5. http://doi.org/10.1038/ng.3039 Ge, T., Chen, C.-Y., Neale, B. M., Sabuncu, M. R., Smoller, J. W., & Tirrell, L. S. (2017). Phenome- wide heritability analysis of the UK Biobank. PLOS Genetics, 13(4), e1006711. http://doi.org/10.1371/journal.pgen.1006711 Geschwind, D. H., & Flint, J. (2015). Genetics and genomics of psychiatric disease. Science, 349(6255), 1489–1494. http://doi.org/10.1126/science.aaa8954 Geschwind, D. H., & State, M. W. (2015). Gene hunting in autism spectrum disorder: on the path to precision medicine. The Lancet Neurology, 14(11), 1109–1120. http://doi.org/10.1016/S1474- 4422(15)00044-7 Geurts, H. M., Verte, S., Oosterlaan, J., Roeyers, H., Hartman, C. A., Mulder, E. J., … Sergeant, J. A. (2004). Can the Children’s Communication Checklist differentiate between children with autism, children with ADHD, and normal controls? Journal of Child Psychology and Psychiatry, 45(8), 1437–1453. http://doi.org/10.1111/j.1469-7610.2004.00850.x Goodman, R. (1997). The Strengths and Difficulties Questionnaire: A Research Note. 
Journal of Child Psychology and Psychiatry, 38(5), 581–586. http://doi.org/10.1111/j.1469- 7610.1997.tb01545.x Goodman, R., Ford, T., Simmons, H., Gatward, R., & Meltzer, H. (2000). Using the Strengths and Difficulties Questionnaire (SDQ) to screen for child psychiatric disorders in a community sample. The British Journal of Psychiatry, 177(6), 534–539. Gratten, J., Wray, N. R., Keller, M. C., & Visscher, P. M. (2014). Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nature Neuroscience, 17(6), 782–790. http://doi.org/10.1038/nn.3708 Gratten, J., Wray, N. R., Peyrot, W. J., McGrath, J. J., Visscher, P. M., & Goddard, M. E. (2016). Risk of psychiatric illness from advanced paternal age is not predominantly from de novo mutations. Nature Genetics, 48(7), 718–724. http://doi.org/10.1038/ng.3577 Greenberg, S., Rosenblum, K. L., McInnis, M. G., & Muzik, M. (2014). The role of social relationships in bipolar disorder: A review. Psychiatry Research, 219(2), 248–254. http://doi.org/10.1016/j.psychres.2014.05.047 Groen, Y., Fuermaier, A. B. M., Den Heijer, A. E., Tucha, O., & Althaus, M. (2015). The Empathy and Systemizing Quotient: The psychometric properties of the Dutch version and a review of the cross-cultural stability. Journal of Autism and Developmental Disorders, 45(9), 2848–64. http://doi.org/10.1007/s10803-015-2448-z Grønborg, T. K., Schendel, D. E., & Parner, E. T. (2013). Recurrence of autism spectrum disorders in full- and half-siblings and trends over time: a population-based cohort study. JAMA Pediatrics,

167(10), 947–53. http://doi.org/10.1001/jamapediatrics.2013.2259
Gupta, S., Ellis, S. E., Ashar, F. N., Moes, A., Bader, J. S., Zhan, J., … Arking, D. E. (2014). Transcriptome analysis reveals dysregulation of innate immune response genes and neuronal activity-dependent genes in autism. Nature Communications, 5, 5748. http://doi.org/10.1038/ncomms6748
Hambrook, D., Tchanturia, K., Schmidt, U., Russell, T., & Treasure, J. (2008). Empathy, systemizing, and autistic traits in anorexia nervosa: a pilot study. The British Journal of Clinical Psychology / the British Psychological Society, 47(Pt 3), 335–9. http://doi.org/10.1348/014466507X272475
Hammerschlag, A. R., Stringer, S., de Leeuw, C. A., Sniekers, S., Taskesen, E., Watanabe, K., … Posthuma, D. (2017). Genome-wide association analysis of insomnia complaints identifies risk genes and genetic overlap with psychiatric and metabolic traits. Nature Genetics. http://doi.org/10.1038/ng.3888
Harkness, K., Sabbagh, M., Jacobson, J., Chowdrey, N., & Chen, T. (2005). Enhanced accuracy of mental state decoding in dysphoric college students. Cognition & Emotion, 19(7), 999–1025. http://doi.org/10.1080/02699930541000110
Harrison, A., Sullivan, S., Tchanturia, K., & Treasure, J. Emotion recognition and regulation in anorexia nervosa. Clinical Psychology & Psychotherapy, 16(4), 348–56. http://doi.org/10.1002/cpp.628
Hatemi, P. K., Smith, K., Alford, J. R., Martin, N. G., & Hibbing, J. R. (2015). The genetic and environmental foundations of political, psychological, social, and economic behaviors: a panel study of twins and families. Twin Research and Human Genetics : The Official Journal of the International Society for Twin Studies, 18(3), 243–55. http://doi.org/10.1017/thg.2015.13
Hazlett, H. C., Gu, H., Munsell, B. C., Kim, S. H., Styner, M., Wolff, J. J., … Piven, J. (2017). Early brain development in infants at high risk for autism spectrum disorder. Nature, 542(7641), 348–351. http://doi.org/10.1038/nature21369
Henn, B. M., Hon, L. S., Macpherson, J. M., Eriksson, N., Saxonov, S., Pe’er, I., & Mountain, J. L. (2012). Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples. PloS One, 7(4), e34267. http://doi.org/10.1371/journal.pone.0034267
Hibar, D. P., Stein, J. L., Renteria, M. E., Arias-Vasquez, A., Desrivières, S., Jahanshad, N., … Ikram, M. A. (2015). Common genetic variants influence human subcortical brain structures. Nature, 520(7546), 224–9. http://doi.org/10.1038/nature14101
Holgado Tello, F. P., Delgado Egido, B., Carrasco Ortiz, M. A., & Del Barrio Gandara, M. V. (2013). Interpersonal Reactivity Index: analysis of invariance and gender differences in Spanish youths. Child Psychiatry & Human Development, 44(2), 320–333. http://doi.org/10.1007/s10578-012-0327-9
Horan, W. P., Reise, S. P., Kern, R. S., Lee, J., Penn, D. L., & Green, M. F. (2015). Structure and correlates of self-reported empathy in schizophrenia. Journal of Psychiatric Research, 66–67, 60–6. http://doi.org/10.1016/j.jpsychires.2015.04.016
Howie, B. N., Donnelly, P., & Marchini, J. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics, 5(6), e1000529. http://doi.org/10.1371/journal.pgen.1000529
Huang, C. H., & Santangelo, S. L. (2008). Autism and serotonin transporter gene polymorphisms: a systematic review and meta-analysis. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics : The Official Publication of the International Society of Psychiatric Genetics, 147B(6), 903–13. http://doi.org/10.1002/ajmg.b.30720
Hughes, C., Jaffee, S. R., Happé, F., Taylor, A., Caspi, A., & Moffitt, T. E. (2005). Origins of

Individual Differences in Theory of Mind: From Nature to Nurture? Child Development, 76(2), 356–370. http://doi.org/10.1111/j.1467-8624.2005.00850_a.x
Ibanez, A., Huepe, D., Gempp, R., Gutiérrez, V., Rivera-Rei, A., & Toledo, M. I. (2013). Empathy, sex and fluid intelligence as predictors of theory of mind. Personality and Individual Differences, 54(5), 616–621. http://doi.org/10.1016/j.paid.2012.11.022
Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., … Wang, P. (2010). Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. The American Journal of Psychiatry, 167(7), 748–51. http://doi.org/10.1176/appi.ajp.2010.09091379
International Molecular Genetic Study of Autism Consortium. (1998). A full genome screen for autism with evidence for linkage to a region on chromosome 7q. Human Molecular Genetics, 7(3), 571–8.
Iossifov, I., O’Roak, B. J., Sanders, S. J., Ronemus, M., Krumm, N., Levy, D., … Wigler, M. (2014). The contribution of de novo coding mutations to autism spectrum disorder. Nature, 515(7526), 216–221. http://doi.org/10.1038/nature13908
Irish Schizophrenia Genomics Consortium and the Wellcome Trust Case Control Consortium 2. (2012). Genome-wide association study implicates HLA-C*01:02 as a risk factor at the major histocompatibility complex locus in schizophrenia. Biological Psychiatry, 72(8), 620–8. http://doi.org/10.1016/j.biopsych.2012.05.035
Jamain, S., Quach, H., Betancur, C., Råstam, M., Colineaux, C., Gillberg, I. C., … Van Maldergem, L. (2003). Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4 are associated with autism. Nature Genetics, 34(1), 27–29. http://doi.org/10.1038/ng1136
Jokiranta-Olkoniemi, E., Cheslack-Postava, K., Sucksdorff, D., Suominen, A., Gyllenberg, D., Chudal, R., … Sourander, A. (2016). Risk of psychiatric and neurodevelopmental disorders among siblings of probands with Autism Spectrum Disorders. JAMA Psychiatry, 73(6), 622. http://doi.org/10.1001/jamapsychiatry.2016.0495
Jolliffe, T., & Baron-Cohen, S. (1997). Are people with autism and Asperger syndrome faster than normal on the Embedded Figures Test? Journal of Child Psychology and Psychiatry, and Allied Disciplines, 38(5), 527–34.
Jolliffe, T., & Baron-Cohen, S. (1999). The strange stories test: A replication with high-functioning adults with autism or Asperger syndrome. Journal of Autism and Developmental Disorders, 29(5), 395–406. http://doi.org/10.1023/A:1023082928366
Jones, W., & Klin, A. (2013). Attention to eyes is present but in decline in 2–6-month-old infants later diagnosed with autism. Nature, 504(7480), 427–431. http://doi.org/10.1038/nature12715
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child: Journal of Psychopathology, Psychotherapy, Mental Hygiene, and Guidance of the Child, 2, 217–50.
Kazeem, G. R., & Farrall, M. (2005). Integrating case-control and TDT studies. Annals of Human Genetics, 69(Pt 3), 329–35. http://doi.org/10.1046/j.1529-8817.2005.00156.x
Kemp, J., Berthel, M.-C., Dufour, A., Després, O., Henry, A., Namer, I. J., … Sellal, F. (2013). Caudate nucleus and social cognition: neuropsychological and SPECT evidence from a patient with focal caudate lesion. Cortex; a Journal Devoted to the Study of the Nervous System and Behavior, 49(2), 559–71. http://doi.org/10.1016/j.cortex.2012.01.004
Kirkland, R. A., Peterson, E., Baker, C. A., Miller, S., & Pulos, S. (2013). Meta-analysis reveals adult female superiority in “Reading the mind in the eyes test.” North American Journal of Psychology, 15(1), 121–146.

Klei, L. L., Sanders, S. J., Murtha, M. T., Hus, V., Lowe, J. K., Willsey, A. J., … Devlin, B. (2012). Common genetic variants, acting additively, are a major source of risk for autism. Molecular Autism, 3(1), 9. http://doi.org/10.1186/2040-2392-3-9
Knafo-Noam, A., & Uzefovsky, F. (2013). Variation in empathy: The interplay of genetic and environmental factors. In M. Legerstee, D. W. Haley, & M. H. Bornstein (Eds.), The infant mind: Origins of the social Brain (pp. 97–121). New York: The Guilford Press.
Knafo-Noam, A., Uzefovsky, F., Israel, S., Davidov, M., & Zahn-Waxler, C. (2015). The prosocial personality and its facets: genetic and environmental architecture of mother-reported behavior of 7-year-old twins. Frontiers in Psychology, 6, 112. http://doi.org/10.3389/fpsyg.2015.00112
Kong, A., Frigge, M. L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., … Stefansson, K. (2012). Rate of de novo mutations and the importance of father’s age to disease risk. Nature, 488(7412), 471–5. http://doi.org/10.1038/nature11396
Kosmicki, J. A., Samocha, K. E., Howrigan, D. P., Sanders, S. J., Slowikowski, K., Lek, M., … Daly, M. J. (2017). Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nature Genetics, 49(4), 504–510. http://doi.org/10.1038/ng.3789
Lai, M.-C., Lombardo, M. V., & Baron-Cohen, S. (2013). Autism. Lancet. http://doi.org/10.1016/S0140-6736(13)61539-1
Lam, B. Y. H., Raine, A., & Lee, T. M. C. (2014). The relationship between neurocognition and symptomatology in people with schizophrenia: social cognition as the mediator. BMC Psychiatry, 14(1), 138. http://doi.org/10.1186/1471-244X-14-138
Lawrence, E. J., Shaw, P., Baker, D., Baron-Cohen, S., & David, A. S. (2004). Measuring empathy: reliability and validity of the Empathy Quotient. Psychological Medicine, 34(5), 911–919. http://doi.org/10.1017/S0033291703001624
Lehmann, A., Bahçesular, K., Brockmann, E.-M., Biederbick, S.-E., Dziobek, I., Gallinat, J., & Montag, C. (2014). Subjective experience of emotions and emotional empathy in paranoid schizophrenia. Psychiatry Research, 220(3), 825–33. http://doi.org/10.1016/j.psychres.2014.09.009
Lek, M., Karczewski, K. J., Minikel, E. V., Samocha, K. E., Banks, E., Fennell, T., … Exome Aggregation Consortium. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616), 285–291. http://doi.org/10.1038/nature19057
Leppa, V. M., Kravitz, S. N., Martin, C. L., Andrieux, J., Le Caignec, C., Martin-Coignard, D., … Geschwind, D. H. (2016). Rare Inherited and De Novo CNVs Reveal Complex Contributions to ASD Risk in Multiplex Families. The American Journal of Human Genetics, 99(3), 540–554. http://doi.org/10.1016/j.ajhg.2016.06.036
Levinson, D. F., Duan, J., Oh, S., Wang, K., Sanders, A. R., Shi, J., … Gejman, P. V. (2011). Copy number variants in schizophrenia: confirmation of five previous findings and new evidence for 3q29 microdeletions and VIPR2 duplications. The American Journal of Psychiatry, 168(3), 302–16. http://doi.org/10.1176/appi.ajp.2010.10060876
Levy, D., Ronemus, M., Yamrom, B., Lee, Y., Leotta, A., Kendall, J., … Wigler, M. (2011). Rare De Novo and Transmitted Copy-Number Variation in Autistic Spectrum Disorders. Neuron, 70(5), 886–897. http://doi.org/10.1016/j.neuron.2011.05.015
Li, N. P., & Kanazawa, S. (2016). Country roads, take me home… to my friends: How intelligence, population density, and friendship affect modern happiness. British Journal of Psychology, 107(4), 675–697. http://doi.org/10.1111/bjop.12181
Ling, J., Burton, T. C., Salt, J. L., & Muncer, S. J. (2009). Psychometric analysis of the systemizing

quotient (SQ) scale. British Journal of Psychology, 100(3), 539–552. http://doi.org/10.1348/000712608X368261
Liu, L., Lei, J., Sanders, S. J., Willsey, A. J., Kou, Y., Cicek, A. E., … Roeder, K. (2014). DAWN: a framework to identify autism genes and subnetworks using gene expression and genetics. Molecular Autism, 5(1), 22. http://doi.org/10.1186/2040-2392-5-22
Lombardo, M. V., Lai, M.-C., Auyeung, B., Holt, R. J., Allison, C., Smith, P., … Ellie Wilson, C. (2016). Unsupervised data-driven stratification of mentalizing heterogeneity in autism. Scientific Reports, 6(1), 35333. http://doi.org/10.1038/srep35333
Lonsdale, J., Thomas, J., Salvatore, M., Phillips, R., Lo, E., Shad, S., … Moore, H. F. (2013). The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45(6), 580–585. http://doi.org/10.1038/ng.2653
LoParo, D., & Waldman, I. D. (2015). The oxytocin receptor gene (OXTR) is associated with autism spectrum disorder: a meta-analysis. Molecular Psychiatry, 20(5), 640–6. http://doi.org/10.1038/mp.2014.77
Lubke, G., Laurin, C., Amin, N., Hottenga, J.-J., Willemsen, G., van Grootheest, G., … Boomsma, D. I. (2014). Genome-wide analyses of borderline personality features. Molecular Psychiatry, 19(8), 923–929. http://doi.org/10.1038/mp.2013.109
Lykken, D., & Tellegen, A. (1996). Happiness is a stochastic phenomenon. Psychological Science, 7(3), 186–189. http://doi.org/10.1111/j.1467-9280.1996.tb00355.x
Ma, D., Salyakina, D., Jaworski, J. M., Konidari, I., Whitehead, P. L., Andersen, A. N., … Pericak-Vance, M. A. (2009). A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Annals of Human Genetics, 73(3), 263–273. http://doi.org/10.1111/j.1469-1809.2009.00523.x
Magalhães, E., Costa, P., & Costa, M. J. (2012). Empathy of medical students and personality: Evidence from the Five-Factor Model. Medical Teacher, 34(10), 807–812. http://doi.org/10.3109/0142159X.2012.702248
Maoz, H., Gvirts, H. Z., Sheffer, M., & Bloch, Y. (2017). Theory of Mind and empathy in children with ADHD. Journal of Attention Disorders, 108705471771076. http://doi.org/10.1177/1087054717710766
McCarthy, S., Das, S., Kretzschmar, W., Delaneau, O., Wood, A. R., Teumer, A., … Marchini, J. (2016). A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics, 48(10), 1279–1283. http://doi.org/10.1038/ng.3643
McClure, E. B. (2000). A meta-analytic review of sex differences in facial expression processing and their development in infants, children, and adolescents. Psychological Bulletin, 126(3), 424–453. http://doi.org/10.1037//0033-2909.126.3.424
McGrath, J. J., Petersen, L., Agerbo, E., Mors, O., Mortensen, P. B., & Pedersen, C. B. (2014). A comprehensive assessment of parental age and psychiatric disorders. JAMA Psychiatry, 71(3), 301–9. http://doi.org/10.1001/jamapsychiatry.2013.4081
McRae, J. F., Clayton, S., Fitzgerald, T. W., Kaplanis, J., Prigmore, E., Rajan, D., … Hurles, M. E. (2017). Prevalence and architecture of de novo mutations in developmental disorders. Nature, 542(7642), 433–438. http://doi.org/10.1038/nature21062
Melchers, M. C., Li, M., Haas, B. W., Reuter, M., Bischoff, L., & Montag, C. (2016). Similar personality patterns are associated with empathy in four different countries. Frontiers in Psychology, 7, 290. http://doi.org/10.3389/fpsyg.2016.00290
Melchers, M. C., Montag, C., Markett, S., & Reuter, M. (2015). Assessment of empathy via self-report and behavioural paradigms: data on convergent and discriminant validity. Cognitive Neuropsychiatry, 20(2), 157–71. http://doi.org/10.1080/13546805.2014.991781
Mestre, M. V., Samper, P., Frías, M. D., & Tur, A. M. (2009). Are women more empathetic than men? A longitudinal study in adolescence. The Spanish Journal of Psychology, 12(1), 76–83. http://doi.org/10.1017/S1138741600001499
Michaels, T. M., Horan, W. P., Ginger, E. J., Martinovich, Z., Pinkham, A. E., & Smith, M. J. (2014). Cognitive empathy contributes to poor social functioning in schizophrenia: Evidence from a new self-report measure of cognitive and affective empathy. Psychiatry Research, 220(3), 803–10. http://doi.org/10.1016/j.psychres.2014.08.054
Miller, S. A. (2009). Children’s understanding of second-order mental states. Psychological Bulletin, 135(5), 749–773. http://doi.org/10.1037/a0016854
Mitra, I., Tsang, K., Ladd-Acosta, C., Croen, L. A., Aldinger, K. A., Hendren, R. L., … Weiss, L. A. (2016). Pleiotropic mechanisms indicated for sex differences in autism. PLoS Genetics, 12(11), e1006425. http://doi.org/10.1371/journal.pgen.1006425
Morelli, S. A., Rameson, L. T., & Lieberman, M. D. (2014). The neural components of empathy: Predicting daily prosocial behavior. Social Cognitive and Affective Neuroscience, 9(1), 39–47. http://doi.org/10.1093/scan/nss088
Morris, R., Bramham, J., Smith, E., & Tchanturia, K. (2014). Empathy and social functioning in anorexia nervosa before and after recovery. Cognitive Neuropsychiatry, 19(1), 47–57. http://doi.org/10.1080/13546805.2013.794723
Munafò, M. R., Clark, T. G., Moore, L. R., Payne, E., Walton, R., & Flint, J. (2003). Genetic polymorphisms and personality in healthy adults: a systematic review and meta-analysis. Molecular Psychiatry, 8(5), 471–84. http://doi.org/10.1038/sj.mp.4001326
Mutter, B., Alcorn, M. B., & Welsh, M. (2006). Theory of mind and executive function: working-memory capacity and inhibitory control as predictors of false-belief task performance. Perceptual and Motor Skills, 102(3), 819–835. http://doi.org/10.2466/pms.102.3.819-835
Nagel, M., Jansen, P. R., Stringer, S., Watanabe, K., de Leeuw, C. A., Bryois, J., … Posthuma, D. (2017). GWAS Meta-Analysis of Neuroticism (N=449,484) Identifies Novel Genetic Loci and Pathways. bioRxiv.
Neale, B. M., Kou, Y., Liu, L., Ma’ayan, A., Samocha, K. E., Sabo, A., … Daly, M. J. (2012). Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature, 485(7397), 242–245. http://doi.org/10.1038/nature11011
Nettle, D. (2007). Empathizing and systemizing: what are they, and what do they contribute to our understanding of psychological sex differences? British Journal of Psychology (London, England : 1953), 98(Pt 2), 237–55. http://doi.org/10.1348/000712606X117612
Nieuwboer, H. A., Pool, R., Dolan, C. V., Boomsma, D. I., & Nivard, M. G. (2016). GWIS: Genome-Wide Inferred Statistics for functions of multiple phenotypes. American Journal of Human Genetics, 99(4), 917–927. http://doi.org/10.1016/j.ajhg.2016.07.020
Nyffeler, J., Walitza, S., Bobrowski, E., Gundelfinger, R., & Grünblatt, E. (2014). Association study in siblings and case-controls of serotonin- and oxytocin-related genes with high functioning autism. Journal of Molecular Psychiatry, 2(1), 1. http://doi.org/10.1186/2049-9256-2-1
O’Connor, L., & Price, A. L. (2017). Distinguishing genetic correlation from causation across 52 diseases and complex traits. bioRxiv, 205435. http://doi.org/10.1101/205435
O’Roak, B. J., & State, M. W. (2008). Autism genetics: strategies, challenges, and opportunities. Autism Research : Official Journal of the International Society for Autism Research, 1(1), 4–17.

http://doi.org/10.1002/aur.3
O’Roak, B. J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B. P., … Eichler, E. E. (2012). Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature, 485(7397), 246–250. http://doi.org/10.1038/nature10989
Okbay, A., Baselmans, B. M. L., De Neve, J.-E., Turley, P., Nivard, M. G., Fontana, M. A., … Cesarini, D. (2016). Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nature Genetics, 48(6), 624–633. http://doi.org/10.1038/ng.3552
Okbay, A., Beauchamp, J. P., Fontana, M. A., Lee, J. J., Pers, T. H., Rietveld, C. A., … Benjamin, D. J. (2016). Genome-wide association study identifies 74 loci associated with educational attainment. Nature, 533(7604), 539–542. http://doi.org/10.1038/nature17671
Onishi, K. H., & Baillargeon, R. (2005). Do 15-month-old infants understand false beliefs? Science (New York, N.Y.), 308(5719), 255–8. http://doi.org/10.1126/science.1107621
Otowa, T., Hek, K., Lee, M., Byrne, E. M., Mirza, S. S., Nivard, M. G., … Hettema, J. M. (2016). Meta-analysis of genome-wide association studies of anxiety disorders. Molecular Psychiatry. http://doi.org/10.1038/mp.2015.197
Padmanabhan, A., Lynch, C. J., Schaer, M., & Menon, V. (2017). The Default Mode Network in Autism. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2(6), 476–486.
Parikshak, N. N., Luo, R., Zhang, A., Won, H., Lowe, J. K., Chandran, V., … Geschwind, D. H. (2013). Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell, 155(5), 1008–21. http://doi.org/10.1016/j.cell.2013.10.031
Parikshak, N. N., Swarup, V., Belgard, T. G., Irimia, M., Ramaswami, G., Gandal, M. J., … Geschwind, D. H. (2016). Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature, 540(7633), 423–427. http://doi.org/10.1038/nature20612
Park, J., Ro, M., Pyun, J.-A., Nam, M., Bang, H. J., Yang, J. W., … Kwack, K. (2014). MTHFR 1298A>C is a risk factor for autism spectrum disorder in the Korean population. Psychiatry Research, 215(1), 258–9. http://doi.org/10.1016/j.psychres.2013.11.006
Paus, T., Keshavan, M. S., & Giedd, J. N. (2008). Why do many psychiatric disorders emerge during adolescence? Nature Reviews. Neuroscience, 9(12), 947–57. http://doi.org/10.1038/nrn2513
Pedersen, C. B., Bybjerg-Grauholm, J., Pedersen, M. G., Grove, J., Agerbo, E., Bækved-Hansen, M., … Mortensen, P. B. (2017). The iPSYCH2012 case–cohort sample: new directions for unravelling genetic and environmental architectures of severe mental disorders. Molecular Psychiatry. http://doi.org/10.1038/mp.2017.196
Peterson, E., & Miller, S. F. (2012). The Eyes Test as a measure of individual differences: How much of the variance reflects verbal IQ? Frontiers in Psychology, 3(JUL), 220. http://doi.org/10.3389/fpsyg.2012.00220
Peyrot, W. J., Boomsma, D. I., Penninx, B. W. J. H., & Wray, N. R. (2016). Disease and polygenic architecture: Avoid trio design and appropriately account for unscreened control subjects for common disease. American Journal of Human Genetics, 98(2), 382–391. http://doi.org/10.1016/j.ajhg.2015.12.017
Pickrell, J. K., Berisa, T., Liu, J. Z., Ségurel, L., Tung, J. Y., & Hinds, D. A. (2016). Detection and interpretation of shared genetic influences on 42 human traits. Nature Genetics, 48(7), 709–717. http://doi.org/10.1038/ng.3570
Pinto, D., Delaby, E., Merico, D., Barbosa, M., Merikangas, A., Klei, L. L., … Scherer, S. W. (2014). Convergence of genes and cellular pathways dysregulated in autism spectrum disorders.

American Journal of Human Genetics, 94(5), 677–94. http://doi.org/10.1016/j.ajhg.2014.03.018
Polimanti, R., & Gelernter, J. (2017). Widespread signatures of positive selection in common risk alleles associated to autism spectrum disorder. PLOS Genetics, 13(2), e1006618. http://doi.org/10.1371/journal.pgen.1006618
Popolo, R., Dimaggio, G., Luther, L., Vinci, G., Salvatore, G., & Lysaker, P. H. (2016). Theory of Mind in Schizophrenia. The Journal of Nervous and Mental Disease, 204(3), 240–243. http://doi.org/10.1097/NMD.0000000000000454
Power, R. A., Steinberg, S., Bjornsdottir, G., Rietveld, C. A., Abdellaoui, A., Nivard, M. M., … Stefansson, K. (2015). Polygenic risk scores for schizophrenia and bipolar disorder predict creativity. Nature Neuroscience, 18(7), 953–955. http://doi.org/10.1038/nn.4040
Pu, D., Shen, Y., & Wu, J. (2013). Association between MTHFR gene polymorphisms and the risk of autism spectrum disorders: a meta-analysis. Autism Research : Official Journal of the International Society for Autism Research, 6(5), 384–92. http://doi.org/10.1002/aur.1300
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., … Sham, P. C. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics, 81(3), 559–75. http://doi.org/10.1086/519795
Ramoz, N., Reichert, J. G., Smith, C. J., Silverman, J. M., Bespalova, I. N., Davis, K. L., & Buxbaum, J. D. (2004). Linkage and association of the mitochondrial Aspartate/Glutamate carrier SLC25A12 gene With autism. American Journal of Psychiatry, 161(4), 662–669. http://doi.org/10.1176/appi.ajp.161.4.662
Rietveld, C. A., Esko, T., Davies, G., Pers, T. H., Turley, P., Benyamin, B., … Koellinger, P. D. (2014). Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proceedings of the National Academy of Sciences, 111(38), 13790–13794. http://doi.org/10.1073/pnas.1404623111
Rietveld, C. A., Medland, S. E., Derringer, J., Yang, J., Esko, T., Martin, N. W., … Koellinger, P. D. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science (New York, N.Y.), 340(6139), 1467–71. http://doi.org/10.1126/science.1235488
Ripke, S., Neale, B. M., Corvin, A. P., Walters, J. T. R., Farh, K.-H., Holmans, P. A., … O’Donovan, M. C. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature, 511(7510), 421–7. http://doi.org/10.1038/nature13595
Ripke, S., Sanders, A. R., Kendler, K. S., Levinson, D. F., Sklar, P., Holmans, P. A., … Gejman, P. V. (2011). Genome-wide association study identifies five new schizophrenia loci. Nature Genetics, 43(10), 969–976. http://doi.org/10.1038/ng.940
Ripke, S., Wray, N. R., Lewis, C. M., Hamilton, S. P., Weissman, M. M., Breen, G., … Sullivan, P. F. (2013). A mega-analysis of genome-wide association studies for major depressive disorder. Molecular Psychiatry, 18(4), 497–511. http://doi.org/10.1038/mp.2012.21
Robinson, E. B., Lichtenstein, P., Anckarsäter, H., Happé, F., & Ronald, A. (2013). Examining and interpreting the female protective effect against autistic behavior. Proceedings of the National Academy of Sciences, 110(13), 5258–5262. http://doi.org/10.1073/pnas.1211070110
Robinson, E. B., Samocha, K. E., Kosmicki, J. A., McGrath, L., Neale, B. M., Perlis, R. H., & Daly, M. J. (2014). Autism spectrum disorder severity reflects the average contribution of de novo and familial influences. Proceedings of the National Academy of Sciences, 111(42), 15161–15165. http://doi.org/10.1073/pnas.1409204111
Robinson, E. B., St Pourcain, B., Anttila, V., Kosmicki, J. A., Bulik-Sullivan, B. K., Grove, J., … Daly, M. J. (2016). Genetic risk for autism spectrum disorders and neuropsychiatric variation in

the general population. Nature Genetics, 48(5), 552–5. http://doi.org/10.1038/ng.3529
Ronald, A., Happé, F., Hughes, C., & Plomin, R. (2005). Nice and nasty Theory of Mind in preschool children: nature and nurture. Social Development, 14(4), 664–684. http://doi.org/10.1111/j.1467-9507.2005.00323.x
Ronald, A., & Hoekstra, R. A. (2011). Autism spectrum disorders and autistic traits: A decade of new twin studies. American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, 156(3), 255–274. http://doi.org/10.1002/ajmg.b.31159
Ronald, A., Viding, E., Happé, F., & Plomin, R. (2006). Individual differences in theory of mind ability in middle childhood and links with verbal ability and autistic traits: a twin study. Social Neuroscience, 1(3–4), 412–25. http://doi.org/10.1080/17470910601068088
Rothstein, H. R., Borenstein, M., Hedges, L. V., & Higgins, J. P. T. (2013). Introduction to meta-analysis. Wiley.
Russell-Smith, S. N., Bayliss, D. M., Maybery, M. T., & Tomkinson, R. L. (2013). Are the autism and positive schizotypy spectra diametrically opposed in empathizing and systemizing? Journal of Autism and Developmental Disorders, 43(3), 695–706. http://doi.org/10.1007/s10803-012-1614-9
Ruzich, E., Allison, C., Chakrabarti, B., Smith, P., Musto, H., Ring, H., & Baron-Cohen, S. (2015). Sex and STEM occupation predict Autism-Spectrum Quotient (AQ) scores in half a million people. PloS One, 10(10), e0141229. http://doi.org/10.1371/journal.pone.0141229
Salyakina, D., Ma, D. Q., Jaworski, J. M., Konidari, I., Whitehead, P. L., Henson, R., … Pericak-Vance, M. A. (2010). Variants in several genomic regions associated with Asperger disorder. Autism Research : Official Journal of the International Society for Autism Research, 3(6), 303–10. http://doi.org/10.1002/aur.158
Samocha, K. E., Robinson, E. B., Sanders, S. J., Stevens, C., Sabo, A., McGrath, L. M., … Daly, M. J. (2014). A framework for the interpretation of de novo mutation in human disease. Nature Genetics, 46(9), 944–50. http://doi.org/10.1038/ng.3050
Sanders, S. J., He, X., Willsey, A. J., Ercan-Sencicek, A. G., Samocha, K. E., Cicek, A. E., … State, M. W. (2015). Insights into Autism Spectrum Disorder genomic architecture and biology from 71 risk loci. Neuron, 87(6), 1215–33. http://doi.org/10.1016/j.neuron.2015.09.016
Sandin, S., Lichtenstein, P., Kuja-Halkola, R., Larsson, H., Hultman, C. M., & Reichenberg, A. (2014). The familial risk of autism. JAMA : The Journal of the American Medical Association, 311(17), 1770–7. http://doi.org/10.1001/jama.2014.4144
Sandin, S., Schendel, D., Magnusson, P., Hultman, C. M., Surén, P., Susser, E., … Reichenberg, A. (2015). Autism risk associated with parental age and with increasing difference in age between the parents. Molecular Psychiatry, 21(5), 693–700. http://doi.org/10.1038/mp.2015.70
Schaaf, C. P., Boone, P. M., Sampath, S., Williams, C., Bader, P. I., Mueller, J. M., … Cheung, S. W. (2012). Phenotypic spectrum and genotype-phenotype correlations of NRXN1 exon deletions. European Journal of Human Genetics : EJHG, 20(12), 1240–7. http://doi.org/10.1038/ejhg.2012.95
Schmidt, R. J., Hansen, R. L., Hartiala, J., Allayee, H., Schmidt, L. C., Tancredi, D. J., … Hertz-Picciotto, I. (2011). Prenatal vitamins, one-carbon metabolism gene variants, and risk for autism. Epidemiology (Cambridge, Mass.), 22(4), 476–85. http://doi.org/10.1097/EDE.0b013e31821d0e30
Scourfield, J., Martin, N., Lewis, G., & McGuffin, P. (1999). Heritability of social cognitive skills in children and adolescents. The British Journal of Psychiatry, 175(6), 559–564. http://doi.org/10.1192/bjp.175.6.559

Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., … Wigler, M. (2007). Strong association of de novo copy number mutations with autism. Science (New York, N.Y.), 316(5823), 445–9. http://doi.org/10.1126/science.1138659
Seiradake, E., Coles, C. H., Perestenko, P. V., Harlos, K., McIlhinney, R. A. J., Aricescu, A. R., & Jones, E. Y. (2011). Structural basis for cell surface patterning through NetrinG-NGL interactions. The EMBO Journal, 30(21), 4479–4488. http://doi.org/10.1038/emboj.2011.346
Shi, L., Zhang, Z., Su, B., Thompson, P. M., & Thiel, G. (2016). Sex biased gene expression profiling of human brains at major developmental stages. Scientific Reports, 6(1), 21181. http://doi.org/10.1038/srep21181
Siegal, M., & Varley, R. (2002). Neural systems involved in “theory of mind.” Nature Reviews. Neuroscience, 3(6), 463–71. http://doi.org/10.1038/nrn844
Singh, T., Walters, J. T. R., Johnstone, M., Curtis, D., Suvisaari, J., Torniainen, M., … Barrett, J. C. (2016). Rare schizophrenia risk variants are enriched in genes shared with neurodevelopmental disorders. bioRxiv.
Sinha, P., Kjelgaard, M. M., Gandhi, T. K., Tsourides, K., Cardinaux, A. L., Pantazis, D., … Held, R. M. (2014). Autism as a disorder of prediction. Proceedings of the National Academy of Sciences of the United States of America, 111(42), 15220–5. http://doi.org/10.1073/pnas.1416797111
Sklar, P., Ripke, S., Scott, L. J., Andreassen, O. A., Cichon, S., Craddock, N., … Purcell, S. M. (2011). Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nature Genetics, 43(10), 977–983. http://doi.org/10.1038/ng.943
Skuse, D. H., Mandy, W. P. L., & Scourfield, J. (2005). Measuring autistic traits: heritability, reliability and validity of the Social and Communication Disorders Checklist. The British Journal of Psychiatry, 187(6), 568–572. http://doi.org/10.1192/bjp.187.6.568
Sniekers, S., Stringer, S., Watanabe, K., Jansen, P. R., Coleman, J. R. I., Krapohl, E., … Posthuma, D. (2017). Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence. Nature Genetics. http://doi.org/10.1038/ng.3869
Song, R.-R., Zou, L., Zhong, R., Zheng, X.-W., Zhu, B.-B., Chen, W., … Miao, X.-P. (2011). An integrated meta-analysis of two variants in HOXA1/HOXB1 and their effect on the risk of autism spectrum disorders. PloS One, 6(9), e25603. http://doi.org/10.1371/journal.pone.0025603
Song, Y. S., Lee, H.-J., Prosselkov, P., Itohara, S., & Kim, E. (2013). Trans-induced cis interaction in the tripartite NGL-1, netrin-G1 and LAR adhesion complex promotes development of excitatory synapses. Journal of Cell Science, 126(21), 4926–4938. http://doi.org/10.1242/jcs.129718
Spreng, R. N., McKinnon, M. C., Mar, R. A., & Levine, B. (2009). The Toronto Empathy Questionnaire: scale development and initial validation of a factor-analytic solution to multiple empathy measures. Journal of Personality Assessment, 91(1), 62–71. http://doi.org/10.1080/00223890802484381
St Pourcain, B., Robinson, E. B., Anttila, V., Sullivan, B. B., Maller, J., Golding, J., … Davey-Smith, G. (2017). ASD and schizophrenia show distinct developmental profiles in common genetic overlap with population-based social communication difficulties. Molecular Psychiatry. http://doi.org/10.1038/mp.2016.198
St Pourcain, B., Skuse, D. H., Mandy, W. P., Wang, K., Hakonarson, H., Timpson, N. J., … Smith, G. D. (2014). Variability in the common genetic architecture of social-communication spectrum phenotypes during childhood and adolescence. Molecular Autism, 5(1), 18. http://doi.org/10.1186/2040-2392-5-18
St Pourcain, B., Whitehouse, A. J. O., Ang, W. Q., Warrington, N. M., Glessner, J. T., Wang, K., …


Smith, G. (2013). Common variation contributes to the genetic architecture of social communication traits. Molecular Autism, 4(1), 34. http://doi.org/10.1186/2040-2392-4-34
Stevens, A. K., McNichol, J., & Magalhaes, L. (2009). Social relationships in schizophrenia: A review. Personality and Mental Health, 3(3), 203–216. http://doi.org/10.1002/pmh.82
Storchel, P. H., Thummler, J., Siegel, G., Aksoy-Aksel, A., Zampa, F., Sumer, S., & Schratt, G. (2015). A large-scale functional screen identifies Nova1 and Ncoa3 as regulators of neuronal miRNA function. The EMBO Journal, 34(17), 2237–2254. http://doi.org/10.15252/embj.201490643
Stubbe, J. H., Posthuma, D., Boomsma, D. I., & de Geus, E. J. C. (2005). Heritability of life satisfaction in adults: a twin-family study. Psychological Medicine, 35(11), 1581. http://doi.org/10.1017/S0033291705005374
Tang, G., Gudsnuk, K., Kuo, S. H., Cotrina, M. L., Rosoklija, G., Sosunov, A., … Sulzer, D. (2014). Loss of mTOR-Dependent Macroautophagy Causes Autistic-like Synaptic Pruning Deficits. Neuron, 83(5), 1131–1143. http://doi.org/10.1016/j.neuron.2014.07.040
Tapajóz P de Sampaio, F., Soneira, S., Aulicino, A., Martese, G., Iturry, M., & Allegri, R. F. (2013). Theory of mind and central coherence in eating disorders: two sides of the same coin? Psychiatry Research, 210(3), 1116–22. http://doi.org/10.1016/j.psychres.2013.08.051
Tapajóz Pereira de Sampaio, F., Soneira, S., Aulicino, A., & Allegri, R. F. (2013). Theory of mind in eating disorders and their relationship to clinical profile. European Eating Disorders Review : The Journal of the Eating Disorders Association, 21(6), 479–87. http://doi.org/10.1002/erv.2247
Taylor, S. (2013). Molecular genetics of obsessive-compulsive disorder: a comprehensive meta-analysis of genetic association studies. Molecular Psychiatry, 18(7), 799–805. http://doi.org/10.1038/mp.2012.76
Taylor, S. (2016). Disorder-specific genetic factors in obsessive-compulsive disorder: A comprehensive meta-analysis. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics : The Official Publication of the International Society of Psychiatric Genetics, 171B(3), 325–32. http://doi.org/10.1002/ajmg.b.32407
Teo, A. R., Choi, H. J., & Valenstein, M. (2013). Social relationships and depression: Ten-year follow-Up from a nationally representative study. PLoS ONE, 8(4), e62396. http://doi.org/10.1371/journal.pone.0062396
The Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium. (2017). Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Molecular Autism, 8(1), 21. http://doi.org/10.1186/s13229-017-0137-9
Thoma, P., Schmidt, T., Juckel, G., Norra, C., & Suchan, B. (2015). Nice or effective? Social problem solving strategies in patients with major depressive disorder. Psychiatry Research, 228(3), 835–42.
Tick, B., Bolton, P. F., Happé, F., Rutter, M., & Rijsdijk, F. (2016). Heritability of autism spectrum disorders: A meta-analysis of twin studies. Journal of Child Psychology and Psychiatry and Allied Disciplines, 57(5), 585–595. http://doi.org/10.1111/jcpp.12499
Tone, E. B., & Tully, E. C. (2014). Empathy as a “risky strength”: a multilevel examination of empathy and risk for internalizing disorders. Development and Psychopathology, 26(4 Pt 2), 1547–65. http://doi.org/10.1017/S0954579414001199
Tossell, K., Andreae, L. C., Cudmore, C., Lang, E., Muthukrishnan, U., Lumsden, A., … Irving, C. (2011). Lrrn1 is required for formation of the midbrain-hindbrain boundary and organiser through regulation of affinity differences between midbrain and hindbrain cells in chick.


Developmental Biology, 352(2), 341–52. http://doi.org/10.1016/j.ydbio.2011.02.002
Tung, J. Y., Do, C. B., Hinds, D. A., Kiefer, A. K., Macpherson, J. M., Chowdry, A. B., … Eriksson, N. (2011). Efficient replication of over 180 genetic associations with self-reported medical data. PloS One, 6(8), e23473. http://doi.org/10.1371/journal.pone.0023473
Turley, P., Walters, R. K., Maghzian, O., Okbay, A., Lee, J. J., Fontana, M. A., … Benjamin, D. J. (2017). MTAG: Multi-Trait Analysis of GWAS. bioRxiv, 1–31. http://doi.org/118810
Uzefovsky, F., Shalev, I., Israel, S., Edelman, S., Raz, Y., Perach-Barzilay, N., … Ebstein, R. P. (2014). The dopamine D4 receptor gene shows a gender-sensitive association with cognitive empathy: evidence from two independent samples. Emotion (Washington, D.C.), 14(4), 712–21. http://doi.org/10.1037/a0036555
van den Berg, S. M., de Moor, M. H. M., Verweij, K. J. H., Krueger, R. F., Luciano, M., Arias-Vasquez, A., … Boomsma, D. I. (2016). Meta-analysis of genome-wide association studies for extraversion: findings from the genetics of personality consortium. Behavior Genetics, 46(2), 170–82. http://doi.org/10.1007/s10519-015-9735-5
Vellante, M., Baron-Cohen, S., Melis, M., Marrone, M., Petretto, D. R., Masala, C., & Preti, A. (2013). The “Reading the Mind in the Eyes” test: systematic review of psychometric properties and a validation study in Italy. Cognitive Neuropsychiatry, 18(4), 326–54. http://doi.org/10.1080/13546805.2012.721728
Voineagu, I., Wang, X., Johnston, P., Lowe, J. K., Tian, Y., Horvath, S., … Geschwind, D. H. (2011). Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature, 474(7351), 380–384. http://doi.org/10.1038/nature10110
Wakabayashi, A., Baron-Cohen, S., Uchiyama, T., Yoshida, Y., Kuroda, M., & Wheelwright, S. (2007). Empathizing and systemizing in adults with and without autism spectrum conditions: Cross-cultural stability. Journal of Autism and Developmental Disorders, 37(10), 1823–1832. http://doi.org/10.1007/s10803-006-0316-6
Wang, K., Gaitsch, H., Poon, H., Cox, N. J., & Rzhetsky, A. (2017). Classification of common human diseases derived from shared genetic and environmental determinants. Nature Genetics, 49(9), 1319–1325. http://doi.org/10.1038/ng.3931
Wang, K., Zhang, H., Ma, D., Bucan, M., Glessner, J. T., Abrahams, B. S., … Hakonarson, H. (2009). Common genetic variants on 5p14.1 associate with autism spectrum disorders. Nature, 459(7246), 528–33. http://doi.org/10.1038/nature07999
Wang, Z., Hong, Y., Zou, L., Zhong, R., Zhu, B.-B., Shen, N., … Miao, X. (2014). Reelin gene variants and risk of autism spectrum disorders: an integrated meta-analysis. American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics : The Official Publication of the International Society of Psychiatric Genetics, 165B(2), 192–200. http://doi.org/10.1002/ajmg.b.32222
Warrier, V., Baron-Cohen, S., & Chakrabarti, B. (2013). Genetic variation in GABRB3 is associated with Asperger syndrome and multiple endophenotypes relevant to autism. Molecular Autism, 4(1), 48. http://doi.org/10.1186/2040-2392-4-48
Warrier, V., Chakrabarti, B., Murphy, L., Chan, A., Craig, I. W., Mallya, U., … Baron-Cohen, S. (2015). A Pooled Genome-Wide Association Study of Asperger Syndrome. PloS One, 10(7), e0131202. http://doi.org/10.1371/journal.pone.0131202
Warrier, V., Chee, V., Smith, P., Chakrabarti, B., & Baron-Cohen, S. (2015). A comprehensive meta-analysis of common genetic variants in autism spectrum conditions. Molecular Autism, 6(1), 49. http://doi.org/10.1186/s13229-015-0041-0
Warrier, V., Grasby, K., Uzefovsky, F., Toro, R., Smith, P., Chakrabarti, B., … Baron-Cohen, S.


(2017). Genome-wide meta-analysis of cognitive empathy: heritability, and correlates with sex, neuropsychiatric conditions and cognition. Molecular Psychiatry.
Warrier, V., Toro, R., Chakrabarti, B., Litterman, N., Hinds, D. A., Bourgeron, T., & Baron-Cohen, S. (2016). Genome-wide analyses of empathy and systemizing: heritability and correlates with sex, education, and psychiatric risk. bioRxiv.
Watanabe, K., Taskesen, E., van Bochoven, A., & Posthuma, D. (2017). FUMA: Functional mapping and annotation of genetic associations. bioRxiv.
Wei, X., Yu, J. W., Shattuck, P., McCracken, M., & Blackorby, J. (2013). Science, technology, engineering, and mathematics (STEM) participation among college students with an autism spectrum disorder. Journal of Autism and Developmental Disorders, 43(7), 1539–46. http://doi.org/10.1007/s10803-012-1700-z
Weightman, M. J., Air, T. M., & Baune, B. T. (2014). A Review of the Role of Social Cognition in Major Depressive Disorder. Frontiers in Psychiatry, 5, 179. http://doi.org/10.3389/fpsyt.2014.00179
Weiner, D. J., Wigdor, E. M., Ripke, S., Walters, R. K., Kosmicki, J. A., Grove, J., … Robinson, E. B. (2017). Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders. Nature Genetics. http://doi.org/10.1038/ng.3863
Weiss, L. A., Arking, D. E., Daly, M. J., Chakravarti, A., Brune, C. W., West, K., … Peltonen, L. (2009). A genome-wide linkage and association scan reveals novel loci for autism. Nature, 461(7265), 802–808. http://doi.org/10.1038/nature08490
Wellman, H. M., Cross, D., & Watson, J. (2001). Meta-analysis of Theory-of-Mind development: The truth about false belief. Child Development, 72(3), 655–684. http://doi.org/10.1111/1467-8624.00304
Werling, D. M., & Geschwind, D. H. (2015). Recurrence rates provide evidence for sex-differential, familial genetic liability for autism spectrum disorders in multiplex families and twins. Molecular Autism, 6(1), 27. http://doi.org/10.1186/s13229-015-0004-5
Werling, D. M., Parikshak, N. N., & Geschwind, D. H. (2016). Gene expression in human brain implicates sexually dimorphic pathways in autism spectrum disorders. Nature Communications, 7, 10717. http://doi.org/10.1038/ncomms10717
Wheelwright, S. J., Auyeung, B., Allison, C., & Baron-Cohen, S. (2010). Defining the broader, medium and narrow autism phenotype among parents using the Autism Spectrum Quotient (AQ). Molecular Autism, 1, 10. http://doi.org/10.1186/2040-2392-1-10
Wheelwright, S. J., & Baron-Cohen, S. (2001). The Link Between Autism and Skills such as Engineering, Maths, Physics and Computing: A Reply to Jarrold and Routh, Autism,1998,2 (3):281-9. Autism, 5(2), 223–227. http://doi.org/10.1177/1362361301005002010
Wheelwright, S. J., Baron-Cohen, S., Goldenfeld, N., Delaney, J., Fine, D., Smith, R., … Wakabayashi, A. (2006). Predicting Autism Spectrum Quotient (AQ) from the Systemizing Quotient-Revised (SQ-R) and Empathy Quotient (EQ). Brain Research, 1079(1), 47–56. http://doi.org/10.1016/j.brainres.2006.01.012
Whitaker, K. J., Vértes, P. E., Romero-Garcia, R., Váša, F., Moutoussis, M., Prabhu, G., … Bullmore, E. T. (2016). Adolescence is associated with transcriptionally patterned consolidation of the hubs of the human brain connectome. Proceedings of the National Academy of Sciences, 2–7. http://doi.org/10.1073/PNAS.1601745113
Willer, C. J., Li, Y., & Abecasis, G. R. (2010). METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics (Oxford, England), 26(17), 2190–1.


http://doi.org/10.1093/bioinformatics/btq340
Willsey, A. J., Sanders, S. J., Li, M., Dong, S., Tebbenkamp, A. T., Muhle, R. A., … State, M. W. (2013). Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell, 155(5), 997–1007. http://doi.org/10.1016/j.cell.2013.10.020
World Health Organization. (1992). The ICD-10 classification of mental and behavioural disorders: clinical descriptions and diagnostic guidelines. Geneva.
Xia, K., Guo, H., Hu, Z., Xun, G., Zuo, L., Peng, Y., … Zhang, F. (2014). Common genetic variants on 1p13.2 associate with risk of autism. Mol Psychiatry, 19(11), 1212–1219. http://doi.org/10.1038/mp.2013.146
Yang, J., Lee, S. H., Goddard, M. E., & Visscher, P. M. (2011). GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics, 88(1), 76–82. http://doi.org/10.1016/j.ajhg.2010.11.011
Yuen, R. K. C., Merico, D., Bookman, M., L Howe, J., Thiruvahindrapuram, B., Patel, R. V, … Scherer, S. W. (2017). Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nature Neuroscience, 20(4), 602–611. http://doi.org/10.1038/nn.4524
Yuen, R. K. C., Thiruvahindrapuram, B., Merico, D., Walker, S., Tammimies, K., Hoang, N., … Scherer, S. W. (2015). Whole-genome sequencing of quartet families with autism spectrum disorder. Nature Medicine, 21(2), 185–91. http://doi.org/10.1038/nm.3792
Zarrei, M., MacDonald, J. R., Merico, D., & Scherer, S. W. (2015). A copy number variation map of the human genome. Nature Reviews Genetics, 16(3), 172–183. http://doi.org/10.1038/nrg3871
Zhang, Y., Chen, K., Sloan, S. A., Bennett, M. L., Scholze, A. R., O’Keeffe, S., … Wu, J. Q. (2014). An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. Journal of Neuroscience, 34(36), 11929–11947. http://doi.org/10.1523/JNEUROSCI.1860-14.2014
Zheng, J., Erzurumluoglu, A. M., Elsworth, B. L., Kemp, J. P., Howe, L., Haycock, P. C., … Neale, B. M. (2016). LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics, 33(2), 272–279. http://doi.org/10.1093/bioinformatics/btw613
Zobel, I., Werden, D., Linster, H., Dykierek, P., Drieling, T., Berger, M., & Schramm, E. (2010). Theory of mind deficits in chronically depressed patients. Depression and Anxiety, 27(9), 821–828. http://doi.org/10.1002/da.20713
Zoghbi, H. Y., & Bear, M. F. (2012). Synaptic dysfunction in neurodevelopmental disorders associated with autism and intellectual disabilities. Cold Spring Harbor Perspectives in Biology, 4(3). http://doi.org/10.1101/cshperspect.a009886


Appendix 1: Studies included and study characteristics (Chapter 2)

Gene | Study | Ancestry | Type of study | Sample size | Diagnosis criteria (ADOS, ADOS-G, ADI-R, CARS, DSM-IV, DSM III-R) | Hardy-Weinberg Equilibrium | Notes
MTHFR | Park et al., 2014 [1] | Korean | Case Control | 251 cases, 425 controls | X | Yes |
MTHFR | Liu et al., 2011 [2] | Caucasian | Case Control | 512 cases, 384 controls | X X X | Yes | Includes a proband from 205 simplex families and a random proband from 307 multiplex families
MTHFR | Guo et al., 2012 [3] | Chinese (Han) | Case Control | 186 cases, 186 controls | | Yes |
MTHFR | dos Santos et al., 2010 [4] | European derived (Brazil) | Case Control | 151 cases, 100 controls | X | Yes |
MTHFR | James et al., 2006 [5] | Caucasian (97%) | Case Control | 356 cases, 205 controls | X X X | Yes |
MTHFR | Pasca et al., 2009 [6] | Caucasian (Romanian) | Case Control | 39 cases, 80 controls | X | Yes |
MTHFR | Mohammad et al., 2009 [7] | Indian | Case Control | 138 cases, 138 controls | X | Yes |
MTHFR | Divyakolu et al., 2013 [8] | Indian | Case Control | 50 cases, 50 controls | | HWE not given, manually checked (Yes) |
MTHFR | Boris et al., 2004 [9] | Caucasian | Case Control | 168 cases, 5389 controls | X | Yes |


Schmidt et Mixed Case Control 429 cases, 278 X X Yes al., 201110 controls EN2 Gharani et Caucasian Family based 167 families X Yes al., 200411 (AGRE sample) Yang et al., Chinese (Han) Case Control 193 cases, 309 X Yes 201012 controls Sen et al., Indian Family based 128 families of ASD X Yes 201013 children comprising of 105 trios and 23 duos Yang et al., Chinese (Han) Case Control 184 cases, 634 X Yes Controls made 200814 controls of two groups, both the groups were combined in the analysis Prandini et Italian Family based 227 families X al., 200815

Benayed et Caucasian Family based 518 families X Yes al., 200516 (AGRE sample) Warrier et Caucasian Case Control 118 cases, 412 X Yes al., 2014 controls Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., 200917 controls Zhong et al., Caucasian Family based 204 families X X Yes 200318 (AGRE sample) GRIK2 Jamain et European and Family based 107 trios X X Yes al., 200219 American Dutta et al., Indian Family based 101 probands, 180 X X X Yes 200720 parents Shuang et Chinese (Han) Family based 174 families X al., 200421 Kim et al., Korean Family based 126 trios X X X Yes 200722 COMT Limprasert Thai Family based 188 cases, 250 X Yes Only Case- et al.,201423 controls Control data used


James et al., Mixed Case Control 360 cases and 205 X X X Yes 20065 controls Guo et al., Chinese Han Case Control 186 cases, 186 X X Yes 201324 controls Karam et al., Egyptian Case Control 80 cases, 100 controls X X Yes 201325 Yirmiya et N.A. Family based 35 families X X X Yes This study used al., 200126 haplotype relative risk and as a result, the data was treated as a case-control study

TPH2 Coon et al., Mixed Case Control 88 cases, 95 controls X X HWE 200527 (Caucasian) not given, manuall y checked (Yes) Ramoz et Mixed Family based 352 families X Yes al., 200628 Singh et al., Indian Case Control 136 cases, 165 X Yes 201329 controls MACROD Curran et al., Mixed Case Control 1170 cases, 35307 X X Yes 2 201130 controls Anney et al., Mixed Family based 1158 families X X Yes 201031 DRD3 Krom et al., Dutch Case Control 254 cases, 404 X Yes 200932 controls Toma et al., Spanish Case Control 326 cases, 350 X Yes 201333 controls HTR2A Veenstra- Family based 115 trios X NA VanderWeel e et al., 200234


Guhathakurt Indian Family based 97 trios X Yes Only Family a et al., and Case based data used 200935 Control Hranilovic et Croation Case Control 103 cases, 214 X Yes al., 201036 controls Cho et al., Korean Family based 26 trios X Yes 200737 Smith et al., Mixed family based 158 trios X X N.A. 201438 Nyffeler et Caucasian Case Control 76 cases 99 controls X X Yes al., 201439 STX1A Durdiaková Caucasian Family based 479 cases, 650 X Yes et al., 201440 controls Nakamura et Japanese Family based 378 individuals X DSM Yes al., 201141 -IV- TR Nakamura et Caucasian Family based 249 trios X X Yes al., 200842 Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., 200917 controls BDNF Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., 200917 controls Cheng et al., Chinese Case Control 174 cases, 349 X X X Yes 200943 controls Nishimura et AGRE Family based 104 trios X Yes al., 200744 ITGB3 Singh et al., Indian Case Control 139 cases, 165 X DSM Yes 201329 controls -IV- TR Cochrane et Irish Family based 177 trios X X Yes al., 201045 Coutinho et Portugese Family based 186 trios X X X NA al., 200746 CNTNAP2 Sampath et Mixed Family based 2051 families X X Yes al., 201347 Toma et al., Spanish Case Control 322 cases, 524 X Yes 201348 controls


Li et al., Chinese Family based 322 individuals X NA 201049 RELN Sharma et South African Case Control 136 cases, 208 X Yes al., 201350 controls Fu et al., Chinese (Han) Case Control 205 cases, 210 dsm- Yes 201351 controls iv-tr He et al., Chinese (Han) Family based, 232 cases, 283 X NA 201152 Case Control controls Dutta et al., Indian Family based, 102 cases, 283 X X Yes 200853 Case Control controls Li et al., Chinese (Han) Case Control 213 cases, 160 X Yes 200854 controls Bonora et Mixed Family based 342 cases, 194 X X NA al., 200355 controls Serajee et Mixed Family based 174 cases, 349 X Yes al., 200656 controls Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., 200917 controls Warrier et Caucasian Case Control 118 cases, 412 Yes al., 2014 controls Persico et American/Italian Family based 95 cases, 186 controls X Yes al., 200157 and Case Control Krebs et al., Mixed Family based 167 families X X NA 200258 Zhang et al., Canada Case Control 126 cases, 347 X X Yes 200259 controls Li et al., Mixed Family based 107 families X X NA 200460 Dutta et al., Indian Family 55 cases, 80 controls X X Yes 200761 based/Case Control SLC25A12 Ramoz et Egyptian Family based 2000 (710,1280) X X NA al., 200462 Segurado et Irish Family based 158 trios X NA al., 200563


Blasi et al., Caucasian Family 531 individuals (261, X NA 200664 based/Case 174) Control Chien et al., Chinese (Han) Case Control 465 cases, 450 X X Yes 201065 controls Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., 200917 controls Correia et Italian Case Control NA X X Yes al., 200666 Palmieri et Caucasian Family based 197 families NA al., 201067 Ramoz et AJMGB Family based 334 families X X NA al., 200868

Durdiakova Caucasian Case Control 117 cases, 412 X Yes et al., 201469 controls Prandini et Caucasian Family based 227 families X al., 201215 PON1 Pasca et al., Romanians Case Control 50 cases, 85 controls X Yes 201070 D'Amelio et American Case 177 cases, 180 X X X Yes Only Case al., 200571 caucasian/Italian Control/Family controls (Italians), Control data s based 107 cases, 376 used controls (Americans) ASMT Melke et al., Caucasian Case Control 278 cases, 255 X X Yes 200872 controls Toma et al., Finnish, Italian Case Control 127 cases, 100 X X X Yes 200773 and European controls (Finnish), 69 (IMGSAC) cases, 90 controls (Italian), 194 cases, 192 controls (European - IMGSAC) Wang et al., Chinese Case Control 398 cases, 437 X X ABC Yes 201374 controls ADA Hettinger et NA Case Control 125 cases, 167 X X Yes al., 200875 controls


Bottini et al., Italian Case Control 118 cases, 126 X Yes 200176 controls Persico et Italian Case Control 91 cases, 152 controls X Yes al., 200077 and Family based SHANK3 Sykes et al., NA (IMGSAC Family based 308 families X Yes Case- 200978 cohort) and case- pseudocontrol pseudocontrol data was used for analysis Shao et al., Chinese Case Control 212 cases, 636 X Yes 201479 controls Waga et al., Japanese Case Control 128 cases, 228 X HWE 201180 controls not given, manuall y checked (Yes) MAOA Verma et al., Indian Case Control 194 cases, 227 X X Yes 201481 controls Salem et al., Egyptian Case Control 53 cases, 30 controls X Yes 201382 Tassone et NA Case Control 189 cases, 167 X X Yes al., 201183 controls NF1 Marui et al., Japanese Case Control 74 cases, 122 controls X Yes 200484 Mbarek et NA Case Control 85 cases, 90 controls X Yes al., 199985 Plank et al., Caucasian & Case Control 204 cases, 200 X Yes 200186 African controls MET Campbell et Italian Family based 702 cases, 189 X Yes Only Case al., 200687 and Case controls Control data Control used Jackson et South Carolina Case Control 174 cases, 369 X X X Yes al., 200988 & Italian controls (South Carolina), 65 cases, 126 controls (Italian)


Sousa et al., Caucasian & TDT 1621 caucasian, 84 X Yes 200989 Italian italian trios Campbell et Mixed, largely Case Control 629 cases, 312 X Yes al., 200890 Caucasian controls Thanseem et Japanese Family based 378 families X X Yes al., 201091 Zhou et al., Chinese Case Control 405 cases, 594 X X Yes 201192 controls GLO1 Wu et al., Chinese Case Control 272 cases, 310 X X Yes 200893 controls Junaid et al., Multi Case Control 71 cases, 49 controls X Yes 200494 Kovač et al., Slovenian Case Control 143 cases, 150 X Yes 201495 controls Sacco et al., Italian, Case Control 371 cases, 171 X X X Yes 200796 Caucasian- controls American OXTR Liu et al., Japanese Case Control 282 cases, 440 X Yes 201097 controls Jacob et al., Caucasian Family based 57 trios X X X Yes 200798 Tansey et Caucasian Family based 458 families X NA al., 201099 Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., controls 200917 Nyffeler et Caucasian Case Control 76 cases 99 controls X X Yes al., 201439 DiNapoli et Caucasian Case Control 118 cases, 412 X Yes al., 2014100 controls OMG Vourc'h P et Caucasian Case Control 65 cases, 101 controls X Yes al., 2003101 Martin et al., US, Canada, Family based 431 families X X X Yes 2007102 Italian HOXA1 Chakrabarti Caucasian Case Control 174 cases, 349 X Yes et al., 200917 controls Devlin et al., Mixed Family based 231 families X X X NA 2002103


Collins et Mixed Case Controls 204 cases, 159 X Yes We used Case al., 2003104 and Family controls in total; 187 (Caucas Control for the based families ian), No caucasian (Africa population and n Family based for Americ the African- an) american population Conciatori et Italian and Case Control 127 cases, 174 X No Only Family al., 2004105 Caucasian and Family controls based data used based Sen et al., Indian (Northern Case Control 80 cases, 149 controls X X Yes 2007106 and Eastern) Gallagher et Irish Family based 78 families X X N.A. al., 2004107 Romano et Italian Family based 85 cases, 132 controls X Yes Only Case al., 2003108 and Case Control data Control used Talebizadeh Mixed Case Control 35 cases, 35 controls X X Yes et al., 2002109 Li et al., NA Family based 110 multiplex X X Yes 2002110 Ingram et Caucasian Family based 50 families X NA al.,2000 111 and Case Control SLC6A4 Ramoz et AGRE Family based 352 families X NA al., 2006112

Devlin et et NIH Family based 390 families X X Yes al., 2005113 Kim et al., Caucasian Family based 115 trios X X X Yes 2002114 Cho et al., Korean Family based 126 trios X Yes 200737 Klauck et Caucasian (One Family based 65 trios X X X NA al., 1997115 family: Asian)


Cook et al., Caucasian, Family based 86 families X X NA 1997116 African- American, Hispanic- American, Asian-American Conroy et Irish Family based 84 trios X X Yes al., 2004117 Maestrini et Caucasian Family based 90 families X NA al., 1999118 Persico et Italian/American Family based 54 trios, 44 trios X Yes al., 2000119 Tordjman et Caucasian Family based 71 trios NA al., 2001120 Yirmiya et Isreal Family based 34 families X X X NA al., 200126 Betancur et Caucasian Family based 53 families with 43 X X NA al., 2002121 (Austria, trios Belgium, France, Italy, Norway, Sweden and United states) Coutinho et Portugese Family based 196 families X X X NA al., 200646 Mulder et Dutch Family based 125 trios NA al., 2005122 Koishi et al., Japanese Family based 104 trios X Yes 2006123 Guhathakurt Indian Family based 93 families X X Yes a et al., 2006124 Wu et al., Chinese Family based 175 trios X X Yes 2005125 Yoo et al., Korean Family based 151 trios Yes 2009126


Appendix 2: Studies excluded (Chapter 2)

Study | Reason for exclusion | Article name
Alarcon et al. 2008 [127] | Sample overlaps with Sampath et al. 2013 | Linkage, association, and gene-expression analyses identify CNTNAP2 as an autism-susceptibility gene.
Anderson et al. 2008 [128] | Insufficient data | Examination of association to autism of common genetic variation in genes related to dopamine.
Cheng et al. 2009 [129] | Article inaccessible | Polyacrylamide gel-based microarray: a novel method applied to the association study between the polymorphisms of BDNF gene and autism.
Egawa et al. 2012 [130] | Minor allele frequency is 0 | A detailed association analysis between the tryptophan hydroxylase 2 (TPH2) gene and autism spectrum disorders in a Japanese population.
Gaita et al. 2010 [131] | Sample overlaps with D'amelio 2005 | Decreased serum arylesterase activity in autism spectrum disorders
Hutcheson et al., 2004 [132] | Insufficient data | Examination of NRCAM, LRRN3, KIAA0716, and LAMB1 as autism candidate genes
Kelemenova et al. 2010 [133] | Insufficient data | Polymorphisms of candidate genes in Slovak autistic patients.
Mei et al. 2007 [134] | Covariates used in analysis | Multifactor dimensionality reduction-phenomics: a novel method to capture genetic heterogeneity with use of phenotypic variables.
Petit et al. 1995 [135] | Insufficient data | Association study with two markers of a human homeogene in infantile autism.
Rabionet et al. 2006 [136] | Insufficient data | Lack of association between autism and SLC25A12.
Rehnstrom et al. 2007 [137] | Insufficient data | No association between common variants in glyoxalase 1 and autism spectrum disorders
Serajee et al. 2004 [138] | Sample overlaps with D'amelio 2005 | Polymorphisms in xenobiotic metabolism genes and autism
Veatch et al. 2014 [139] | Sample overlaps with Toma 2007, Melke 2008 and Wang 2013. Further, the study specifically tests individuals with sleep issues. | Genetic Variation in Melatonin Pathway Enzymes in Children with Autism Spectrum Disorder and Comorbid Sleep Onset Delay
Weiss et al. 2006 [140] | Tests for interaction | ITGB3 shows genetic and expression interaction with SLC6A4.
Xu et al. 2013 [141] | Article inaccessible | Genetic polymorphisms of SNP loci in the 5' and 3' region of TPH2 gene in Northern Chinese Han population
McCauley et al., 2003 [142] | Sample overlaps with Ramoz et al., 2006 | Linkage and association analysis at the serotonin transporter (SLC6A4) locus in a rigid-compulsive subset of autism
Yu et al. 2004 | Article inaccessible and not traceable | Association study between HOXA1 A218G polymorphism and autism.
Arking et al., 2008 [143] | Sample overlaps with Sampath et al. 2013 | A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism.

The studies listed in the table otherwise satisfy the inclusion criteria described in the Methods section of Chapter 2. Several other studies were excluded because they did not meet all of those criteria; they are not listed in the table above.


References for the studies included in Appendices 1 and 2

1 Park J, Ro M, Pyun J-A, Nam M, Bang HJ, Yang JW et al. MTHFR 1298A>C is a risk factor for autism spectrum disorder in the Korean population. Psychiatry Res 2014; 215: 258–9.
2 Liu X, Solehdin F, Cohen IL, Gonzalez MG, Jenkins EC, Lewis MES et al. Population- and family-based studies associate the MTHFR gene with idiopathic autism in simplex families. J Autism Dev Disord 2011; 41: 938–44.
3 Guo T, Chen H, Liu B, Ji W, Yang C. Methylenetetrahydrofolate reductase polymorphisms C677T and risk of autism in the Chinese Han population. Genet Test Mol Biomarkers 2012; 16: 968–73.
4 Dos Santos PAC, Longo D, Brandalize APC, Schüler-Faccini L. MTHFR C677T is not a risk factor for autism spectrum disorders in South Brazil. Psychiatr Genet 2010; 20: 187–9.
5 James SJ, Melnyk S, Jernigan S, Cleves MA, Halsted CH, Wong DH et al. Metabolic endophenotype and related genotypes are associated with oxidative stress in children with autism. Am J Med Genet B Neuropsychiatr Genet 2006; 141B: 947–56.
6 Paşca SP, Dronca E, Kaucsár T, Craciun EC, Endreffy E, Ferencz BK et al. One carbon metabolism disturbances and the C677T MTHFR gene polymorphism in children with autism spectrum disorders. J Cell Mol Med 2009; 13: 4229–38.
7 Mohammad NS, Jain JMN, Chintakindi KP, Singh RP, Naik U, Akella RRD. Aberrations in folate metabolic pathway and altered susceptibility to autism. Psychiatr Genet 2009; 19: 171–6.
8 Divyakolu S, Tejaswini Y, Thomas W, Thumoju S, Sreekanth VR, Vasavi M et al. Evaluation of C677T Polymorphism of the Methylenetetrahydrofolate Reductase (MTHFR) Gene in various Neurological Disorders. J Neurol Disord 2013; 2: 142.
9 Boris M, Goldblatt A, Galanko J, James J. Association of MTHFR Gene Variants with Autism. J. Am. Physicians Surg. 2004; : 106–108.
10 Schmidt RJ, Hansen RL, Hartiala J, Allayee H, Schmidt LC, Tancredi DJ et al. Prenatal vitamins, one-carbon metabolism gene variants, and risk for autism. Epidemiology 2011; 22: 476–85.
11 Gharani N, Benayed R, Mancuso V, Brzustowicz LM, Millonig JH. Association of the homeobox transcription factor, ENGRAILED 2, 3, with autism spectrum disorder. Mol Psychiatry 2004; 9: 474–84.
12 Yang P, Shu B-C, Hallmayer JF, Lung F-W. Intronic single nucleotide polymorphisms of engrailed homeobox 2 modulate the disease vulnerability of autism in a Han Chinese population. Neuropsychobiology 2010; 62: 104–15.


13 Sen B, Singh AS, Sinha S, Chatterjee A, Ahmed S, Ghosh S et al. Family-based studies indicate association of Engrailed 2 gene with autism in an Indian population. Genes Brain Behav 2010; 9: 248–55.
14 Wang L, Jia M, Yue W, Tang F, Qu M, Ruan Y et al. Association of the ENGRAILED 2 (EN2) gene with autism in Chinese Han population. Am J Med Genet B Neuropsychiatr Genet 2008; 147B: 434–8.
15 Prandini P, Pasquali A, Malerba G, Marostica A, Zusi C, Xumerle L et al. The association of rs4307059 and rs35678 markers with autism spectrum disorders is replicated in Italian families. Psychiatr Genet 2012; 22: 177–81.
16 Benayed R, Gharani N, Rossman I, Mancuso V, Lazar G, Kamdar S et al. Support for the homeobox transcription factor gene ENGRAILED 2 as an autism spectrum disorder susceptibility locus. Am J Hum Genet 2005; 77: 851–68.
17 Chakrabarti B, Dudbridge F, Kent L, Wheelwright S, Hill-Cawthorne G, Allison C et al. Genes related to sex steroids, neural growth, and social-emotional behavior are associated with autistic traits, empathy, and Asperger syndrome. Autism Res 2009; 2: 157–77.
18 Zhong H, Serajee FJ, Nabi R, Huq AHMM. No association between the EN2 gene and autistic disorder. J Med Genet 2003; 40: e4.
19 Jamain S, Betancur C, Quach H, Philippe A, Fellous M, Giros B et al. Linkage and association of the glutamate receptor 6 gene with autism. Mol Psychiatry 2002; 7: 302–10.
20 Dutta S, Das S, Guhathakurta S, Sen B, Sinha S, Chatterjee A et al. Glutamate receptor 6 gene (GluR6 or GRIK2) polymorphisms in the Indian population: a genetic association study on autism spectrum disorder. Cell Mol Neurobiol 2007; 27: 1035–47.
21 Shuang M, Liu J, Jia MX, Yang JZ, Wu SP, Gong XH et al. Family-based association study between autism and glutamate receptor 6 gene in Chinese Han trios. Am J Med Genet B Neuropsychiatr Genet 2004; 131B: 48–50.
22 Kim SA, Kim JH, Park M, Cho IH, Yoo HJ. Family-based association study between GRIK2 polymorphisms and autism spectrum disorders in the Korean trios. Neurosci Res 2007; 58: 332–5.
23 Limprasert P, Maisrikhaw W, Sripo T, Wirojanan J, Hansakunachai T, Roongpraiwan R et al. No association of Val158Met variant in the COMT gene with autism spectrum disorder in Thai children. Psychiatr Genet 2014; 24: 230–1.
24 Guo T, Wang W, Liu B, Chen H, Yang C. Catechol-O-methyltransferase Val158Met polymorphism and risk of autism spectrum disorders. J Int Med Res 2013; 41: 725–34.
25 Karam RA, Rezk NA, Abdelrahman HM, Hassan TH, Mohammad D, Hashim HM et al. Catechol-O-methyltransferase Val158Met polymorphism and hyperactivity symptoms in Egyptian children with autism spectrum disorder. Res Dev Disabil 2013; 34: 2092–7.
26 Yirmiya N, Pilowsky T, Nemanov L, Arbelle S, Feinsilver T, Fried I et al. Evidence for an association with the serotonin transporter promoter region polymorphism and autism. Am J Med Genet 2001; 105: 381–6.


27 Coon H, Dunn D, Lainhart J, Miller J, Hamil C, Battaglia A et al. Possible association between autism and variants in the brain-expressed tryptophan hydroxylase gene (TPH2). Am J Med Genet B Neuropsychiatr Genet 2005; 135B: 42–6.
28 Ramoz N, Cai G, Reichert JG, Corwin TE, Kryzak LA, Smith CJ et al. Family-based association study of TPH1 and TPH2 polymorphisms in autism. Am J Med Genet B Neuropsychiatr Genet 2006; 141B: 861–7.
29 Singh AS, Chandra R, Guhathakurta S, Sinha S, Chatterjee A, Ahmed S et al. Genetic association and gene-gene interaction analyses suggest likely involvement of ITGB3 and TPH2 with autism spectrum disorder (ASD) in the Indian population. Prog Neuropsychopharmacol Biol Psychiatry 2013; 45: 131–43.
30 Curran S, Bolton P, Rozsnyai K, Chiocchetti A, Klauck SM, Duketis E et al. No association between a common single nucleotide polymorphism, rs4141463, in the MACROD2 gene and autism spectrum disorder. Am J Med Genet B Neuropsychiatr Genet 2011; 156B: 633–9.
31 Anney R, Klei L, Pinto D, Regan R, Conroy J, Magalhaes TR et al. A genome-wide scan for common alleles affecting risk for autism. Hum Mol Genet 2010; 19: 4072–82.
32 De Krom M, Staal WG, Ophoff RA, Hendriks J, Buitelaar J, Franke B et al. A common variant in DRD3 receptor is associated with autism spectrum disorder. Biol Psychiatry 2009; 65: 625–30.
33 Toma C, Hervás A, Balmaña N, Salgado M, Maristany M, Vilella E et al. Neurotransmitter systems and neurotrophic factors in autism: association study of 37 genes suggests involvement of DDC. World J Biol Psychiatry 2013; 14: 516–27.
34 Veenstra-VanderWeele J, Kim S-J, Lord C, Courchesne R, Akshoomoff N, Leventhal BL et al. Transmission disequilibrium studies of the serotonin 5-HT2A receptor gene (HTR2A) in autism. Am J Med Genet 2002; 114: 277–83.
35 Guhathakurta S, Singh AS, Sinha S, Chatterjee A, Ahmed S, Ghosh S et al. Analysis of serotonin receptor 2A gene (HTR2A): association study with autism spectrum disorder in the Indian population and investigation of the gene expression in peripheral blood leukocytes. Neurochem Int 2009; 55: 754–9.
36 Hranilovic D, Blazevic S, Babic M, Smurinic M, Bujas-Petkovic Z, Jernej B. 5-HT2A receptor gene polymorphisms in Croatian subjects with autistic disorder. Psychiatry Res 2010; 178: 556–8.
37 Cho IH, Yoo HJ, Park M, Lee YS, Kim SA. Family-based association study of 5-HTTLPR and the 5-HT2A receptor gene polymorphisms with autism spectrum disorder in Korean trios. Brain Res 2007; 1139: 34–41.
38 Smith RM, Banks W, Hansen E, Sadee W, Herman GE. Family-based clinical associations and functional characterization of the serotonin 2A receptor gene (HTR2A) in autism spectrum disorder. Autism Res 2014; 7: 459–67.


39 Nyffeler J, Walitza S, Bobrowski E, Gundelfinger R, Grünblatt E. Association study in siblings and case-controls of serotonin- and oxytocin-related genes with high functioning autism. J Mol Psychiatry 2014; 2: 1.
40 Durdiaková J, Warrier V, Banerjee-Basu S, Baron-Cohen S, Chakrabarti B. STX1A and Asperger syndrome: a replication study. Mol Autism 2014; 5: 14.
41 Nakamura K, Iwata Y, Anitha A, Miyachi T, Toyota T, Yamada S et al. Replication study of Japanese cohorts supports the role of STX1A in autism susceptibility. Prog Neuropsychopharmacol Biol Psychiatry 2011; 35: 454–8.
42 Nakamura K, Anitha A, Yamada K, Tsujii M, Iwayama Y, Hattori E et al. Genetic and expression analyses reveal elevated expression of syntaxin 1A (STX1A) in high functioning autism. Int J Neuropsychopharmacol 2008; 11: 1073–84.
43 Cheng L, Ge Q, Xiao P, Sun B, Ke X, Bai Y et al. Association study between BDNF gene polymorphisms and autism by three-dimensional gel-based microarray. Int J Mol Sci 2009; 10: 2487–500.
44 Nishimura K, Nakamura K, Anitha A, Yamada K, Tsujii M, Iwayama Y et al. Genetic analyses of the brain-derived neurotrophic factor (BDNF) gene in autism. Biochem Biophys Res Commun 2007; 356: 200–6.
45 Cochrane LE, Tansey KE, Gill M, Gallagher L, Anney RJL. Lack of association between markers in the ITGA3, ITGAV, ITGA6 and ITGB3 and autism in an Irish sample. Autism Res 2010; 3: 342–4.
46 Coutinho AM, Sousa I, Martins M, Correia C, Morgadinho T, Bento C et al. Evidence for epistasis between SLC6A4 and ITGB3 in autism etiology and in the determination of platelet serotonin levels. Hum Genet 2007; 121: 243–56.
47 Sampath S, Bhat S, Gupta S, O’Connor A, West AB, Arking DE et al. Defining the contribution of CNTNAP2 to autism susceptibility. PLoS One 2013; 8: e77906.
48 Toma C, Hervás A, Torrico B, Balmaña N, Salgado M, Maristany M et al. Analysis of two language-related genes in autism: a case-control association study of FOXP2 and CNTNAP2. Psychiatr Genet 2013; 23: 82–5.
49 Li X, Hu Z, He Y, Xiong Z, Long Z, Peng Y et al. Association analysis of CNTNAP2 polymorphisms with autism in the Chinese Han population. Psychiatr Genet 2010; 20: 113–7.
50 Sharma JR, Arieff Z, Gameeldien H, Davids M, Kaur M, van der Merwe L. Association analysis of two single-nucleotide polymorphisms of the RELN gene with autism in the South African population. Genet Test Mol Biomarkers 2013; 17: 93–8.
51 Fu X, Mei Z, Sun L. Association between the g.296596G > A genetic variant of RELN gene and susceptibility to autism in a Chinese Han population. Genet Mol Biol 2013; 36: 486–9.
52 He Y, Xun G, Xia K, Hu Z, Lv L, Deng Z et al. No significant association between RELN polymorphism and autism in case-control and family-based association study in Chinese Han population. Psychiatry Res 2011; 187: 462–4.


53 Dutta S, Sinha S, Ghosh S, Chatterjee A, Ahmed S, Usha R. Genetic analysis of reelin gene (RELN) SNPs: no association with autism spectrum disorder in the Indian population. Neurosci Lett 2008; 441: 56–60.
54 Li H, Li Y, Shao J, Li R, Qin Y, Xie C et al. The association analysis of RELN and GRM8 genes with autistic spectrum disorder in Chinese Han population. Am J Med Genet B Neuropsychiatr Genet 2008; 147B: 194–200.
55 Bonora E, Beyer KS, Lamb JA, Parr JR, Klauck SM, Benner A et al. Analysis of reelin as a candidate gene for autism. Mol Psychiatry 2003; 8: 885–92.
56 Serajee FJ, Zhong H, Mahbubul Huq AHM. Association of Reelin gene polymorphisms with autism. Genomics 2006; 87: 75–83.
57 Persico AM, D’Agruma L, Maiorano N, Totaro A, Militerni R, Bravaccio C et al. Reelin gene alleles and haplotypes as a factor predisposing to autistic disorder. Mol Psychiatry 2001; 6: 150–9.
58 Krebs MO, Betancur C, Leroy S, Bourdel MC, Gillberg C, Leboyer M. Absence of association between a polymorphic GGC repeat in the 5’ untranslated region of the reelin gene and autism. Mol Psychiatry 2002; 7: 801–4.
59 Zhang H, Liu X, Zhang C, Mundo E, Macciardi F, Grayson DR et al. Reelin gene alleles and susceptibility to autism spectrum disorders. Mol Psychiatry 2002; 7: 1012–7.
60 Li J, Nguyen L, Gleason C, Lotspeich L, Spiker D, Risch N et al. Lack of evidence for an association between WNT2 and RELN polymorphisms and autism. Am J Med Genet B Neuropsychiatr Genet 2004; 126B: 51–7.
61 Dutta S, Guhathakurta S, Sinha S, Chatterjee A, Ahmed S, Ghosh S et al. Reelin gene polymorphisms in the Indian population: a possible paternal 5’UTR-CGG-repeat-allele effect on autism. Am J Med Genet B Neuropsychiatr Genet 2007; 144B: 106–12.
62 Ramoz N. Linkage and Association of the Mitochondrial Aspartate/Glutamate Carrier SLC25A12 Gene With Autism. Am J Psychiatry 2004; 161: 662–669.
63 Segurado R, Conroy J, Meally E, Fitzgerald M, Gill M, Gallagher L. Confirmation of association between autism and the mitochondrial aspartate/glutamate carrier SLC25A12 gene on chromosome 2q31. Am J Psychiatry 2005; 162: 2182–4.
64 Blasi F, Bacchelli E, Carone S, Toma C, Monaco AP, Bailey AJ et al. SLC25A12 and CMYA3 gene variants are not associated with autism in the IMGSAC multiplex family sample. Eur J Hum Genet 2006; 14: 123–6.
65 Chien W-H, Wu Y-Y, Gau SS-F, Huang Y-S, Soong W-T, Chiu Y-N et al. Association study of the SLC25A12 gene and autism in Han Chinese in Taiwan. Prog Neuropsychopharmacol Biol Psychiatry 2010; 34: 189–92.
66 Correia C, Coutinho AM, Diogo L, Grazina M, Marques C, Miguel T et al. Brief report: High frequency of biochemical markers for mitochondrial dysfunction in autism: no association with the mitochondrial aspartate/glutamate carrier SLC25A12 gene. J Autism Dev Disord 2006; 36: 1137–40.


67 Palmieri L, Papaleo V, Porcelli V, Scarcia P, Gaita L, Sacco R et al. Altered calcium homeostasis in autism-spectrum disorders: evidence from biochemical and genetic studies of the mitochondrial aspartate/glutamate carrier AGC1. Mol Psychiatry 2010; 15: 38–52.
68 Ramoz N, Cai G, Reichert JG, Silverman JM, Buxbaum JD. An analysis of candidate autism loci on chromosome 2q24-q33: evidence for association to the STK39 gene. Am J Med Genet B Neuropsychiatr Genet 2008; 147B: 1152–8.
69 Durdiaková J, Warrier V, Baron-Cohen S, Chakrabarti B. Single nucleotide polymorphism rs6716901 in SLC25A12 gene is associated with Asperger syndrome. Mol Autism 2014; 5: 25.
70 Paşca SP, Dronca E, Nemeş B, Kaucsár T, Endreffy E, Iftene F et al. Paraoxonase 1 activities and polymorphisms in autism spectrum disorders. J Cell Mol Med 2010; 14: 600–7.
71 D’Amelio M, Ricci I, Sacco R, Liu X, D’Agruma L, Muscarella LA et al. Paraoxonase gene variants are associated with autism in North America, but not in Italy: possible regional specificity in gene-environment interactions. Mol Psychiatry 2005; 10: 1006–16.
72 Melke J, Goubran Botros H, Chaste P, Betancur C, Nygren G, Anckarsäter H et al. Abnormal melatonin synthesis in autism spectrum disorders. Mol Psychiatry 2008; 13: 90–8.
73 Toma C, Rossi M, Sousa I, Blasi F, Bacchelli E, Alen R et al. Is ASMT a susceptibility gene for autism spectrum disorders? A replication study in European populations. Mol Psychiatry 2007; 12: 977–9.
74 Wang L, Li J, Ruan Y, Lu T, Liu C, Jia M et al. Sequencing ASMT identifies rare mutations in Chinese Han patients with autism. PLoS One 2013; 8: e53727.
75 Hettinger JA, Liu X, Holden JJA. The G22A polymorphism of the ADA gene and susceptibility to autism spectrum disorders. J Autism Dev Disord 2008; 38: 14–9.
76 Bottini N, De Luca D, Saccucci P, Fiumara A, Elia M, Porfirio MC et al. Autism: evidence of association with adenosine deaminase genetic polymorphism. Neurogenetics 2001; 3: 111–3.
77 Persico AM, Militerni R, Bravaccio C, Schneider C, Melmed R, Trillo S et al. Adenosine deaminase alleles and autistic disorder: case-control and family-based association studies. Am J Med Genet 2000; 96: 784–90.
78 Sykes NH, Toma C, Wilson N, Volpi E V, Sousa I, Pagnamenta AT et al. Copy number variation and association analysis of SHANK3 as a candidate gene for autism in the IMGSAC collection. Eur J Hum Genet 2009; 17: 1347–53.
79 Shao S, Xu S, Yang J, Zhang T, He Z, Sun Z et al. A commonly carried genetic variant, rs9616915, in SHANK3 gene is associated with a reduced risk of autism spectrum disorder: replication in a Chinese population. Mol Biol Rep 2014; 41: 1591–5.
80 Waga C, Okamoto N, Ondo Y, Fukumura-Kato R, Goto Y-I, Kohsaka S et al. Novel variants of the SHANK3 gene in Japanese autistic patients with severe delayed speech development. Psychiatr Genet 2011; 21: 208–11.


81 Verma D, Chakraborti B, Karmakar A, Bandyopadhyay T, Singh AS, Sinha S et al. Sexual dimorphic effect in the genetic association of monoamine oxidase A (MAOA) markers with autism spectrum disorder. Prog Neuropsychopharmacol Biol Psychiatry 2014; 50: 11–20.
82 Salem AM, Ismail S, Zarouk WA, Abdul Baky O, Sayed AA, Abd El-Hamid S et al. Genetic variants of neurotransmitter-related genes and miRNAs in Egyptian autistic patients. ScientificWorldJournal 2013; 2013: 670621.
83 Tassone F, Qi L, Zhang W, Hansen RL, Pessah IN, Hertz-Picciotto I. MAOA, DBH, and SLC6A4 variants in CHARGE: a case-control study of autism spectrum disorders. Autism Res 2011; 4: 250–61.
84 Marui T, Hashimoto O, Nanba E, Kato C, Tochigi M, Umekage T et al. Association between the neurofibromatosis-1 (NF1) locus and autism in the Japanese population. Am J Med Genet B Neuropsychiatr Genet 2004; 131B: 43–7.
85 Mbarek O, Marouillat S, Martineau J, Barthélémy C, Müh JP, Andres C. Association study of the NF1 gene and autistic disorder. Am J Med Genet 1999; 88: 729–32.
86 Plank SM, Copeland-Yates SA, Sossey-Alaoui K, Bell JM, Schroer RJ, Skinner C et al. Lack of association of the (AAAT)6 allele of the GXAlu tetranucleotide repeat in intron 27b of the NF1 gene with autism. Am J Med Genet 2001; 105: 404–5.
87 Campbell DB, Sutcliffe JS, Ebert PJ, Militerni R, Bravaccio C, Trillo S et al. A genetic variant that disrupts MET transcription is associated with autism. Proc Natl Acad Sci U S A 2006; 103: 16834–9.
88 Jackson PB, Boccuto L, Skinner C, Collins JS, Neri G, Gurrieri F et al. Further evidence that the rs1858830 C variant in the promoter region of the MET gene is associated with autistic disorder. Autism Res 2009; 2: 232–6.
89 Sousa I, Clark TG, Toma C, Kobayashi K, Choma M, Holt R et al. MET and autism susceptibility: family and case-control studies. Eur J Hum Genet 2009; 17: 749–58.
90 Campbell DB, Li C, Sutcliffe JS, Persico AM, Levitt P. Genetic evidence implicating multiple genes in the MET receptor tyrosine kinase pathway in autism spectrum disorder. Autism Res 2008; 1: 159–68.
91 Thanseem I, Nakamura K, Miyachi T, Toyota T, Yamada S, Tsujii M et al. Further evidence for the role of MET in autism susceptibility. Neurosci Res 2010; 68: 137–41.
92 Zhou X, Xu Y, Wang J, Zhou H, Liu X, Ayub Q et al. Replication of the association of a MET variant with autism in a Chinese Han population. PLoS One 2011; 6: e27428.
93 Wu Y-Y, Chien W-H, Huang Y-S, Gau SS-F, Chen C-H. Lack of evidence to support the glyoxalase 1 gene (GLO1) as a risk gene of autism in Han Chinese patients from Taiwan. Prog Neuropsychopharmacol Biol Psychiatry 2008; 32: 1740–4.
94 Junaid MA, Kowal D, Barua M, Pullarkat PS, Sklower Brooks S, Pullarkat RK. Proteomic studies identified a single nucleotide polymorphism in glyoxalase I as autism susceptibility factor. Am J Med Genet A 2004; 131: 11–7.


95 Kovač J, Podkrajšek KT, Lukšič MM, Battelino T. Weak association of glyoxalase 1 (GLO1) variants with autism spectrum disorder. Eur Child Adolesc Psychiatry 2014. doi:10.1007/s00787-014-0537-8.
96 Sacco R, Papaleo V, Hager J, Rousseau F, Moessner R, Militerni R et al. Case-control and family-based association studies of candidate genes in autistic disorder and its endophenotypes: TPH2 and GLO1. BMC Med Genet 2007; 8: 11.
97 Liu X, Kawamura Y, Shimada T, Otowa T, Koishi S, Sugiyama T et al. Association of the oxytocin receptor (OXTR) gene polymorphisms with autism spectrum disorder (ASD) in the Japanese population. J Hum Genet 2010; 55: 137–41.
98 Jacob S, Brune CW, Carter CS, Leventhal BL, Lord C, Cook EH. Association of the oxytocin receptor gene (OXTR) in Caucasian children and adolescents with autism. Neurosci Lett 2007; 417: 6–9.
99 Tansey KE, Brookes KJ, Hill MJ, Cochrane LE, Gill M, Skuse D et al. Oxytocin receptor (OXTR) does not play a major role in the aetiology of autism: genetic and molecular studies. Neurosci Lett 2010; 474: 163–7.
100 Di Napoli A, Warrier V, Baron-Cohen S, Chakrabarti B. Genetic variation in the oxytocin receptor (OXTR) gene is associated with Asperger Syndrome. Mol Autism 2014; 5: 48.
101 Vourc’h P, Martin I, Marouillat S, Adrien JL, Barthélémy C, Moraine C et al. Molecular analysis of the oligodendrocyte myelin glycoprotein gene in autistic disorder. Neurosci Lett 2003; 338: 115–8.
102 Martin I, Gauthier J, D’Amelio M, Védrine S, Vourc’h P, Rouleau GA et al. Transmission disequilibrium study of an oligodendrocyte and myelin glycoprotein gene allele in 431 families with an autistic proband. Neurosci Res 2007; 59: 426–30.
103 Devlin B, Bennett P, Cook EH, Dawson G, Gonen D, Grigorenko EL et al. No evidence for linkage of liability to autism to HOXA1 in a sample from the CPEA network. Am J Med Genet 2002; 114: 667–72.
104 Collins JS, Schroer RJ, Bird J, Michaelis RC. The HOXA1 A218G Polymorphism and Autism: Lack of Association in White and Black Patients from the South Carolina Autism Project. J Autism Dev Disord 2003; 33: 343–348.
105 Conciatori M, Stodgell CJ, Hyman SL, O’Bara M, Militerni R, Bravaccio C et al. Association between the HOXA1 A218G polymorphism and increased head circumference in patients with autism. Biol Psychiatry 2004; 55: 413–9.
106 Sen B, Sinha S, Ahmed S, Ghosh S, Gangopadhyay PK, Usha R. Lack of association of HOXA1 and HOXB1 variants with autism in the Indian population. Psychiatr Genet 2007; 17: 1.
107 Gallagher L, Hawi Z, Kearney G, Fitzgerald M, Gill M. No association between allelic variants of HOXA1/HOXB1 and autism. Am J Med Genet B Neuropsychiatr Genet 2004; 124B: 64–7.


108 Romano V, Calì F, Mirisola M, Gambino G, D’Anna R, Di Rosa P et al. Lack of association of HOXA1 and HOXB1 mutations and autism in Sicilian (Italian) patients. Mol Psychiatry 2003; 8: 716–7.
109 Talebizadeh Z, Bittel DC, Miles JH, Takahashi N, Wang CH, Kibiryeva N et al. No association between HOXA1 and HOXB1 genes and autism spectrum disorders (ASD). J Med Genet 2002; 39: e70.
110 Li J, Tabor HK, Nguyen L, Gleason C, Lotspeich LJ, Spiker D et al. Lack of association between HoxA1 and HoxB1 gene variants and autism in 110 multiplex families. Am J Med Genet 2002; 114: 24–30.
111 Ingram JL, Stodgell CJ, Hyman SL, Figlewicz DA, Weitkamp LR, Rodier PM. Discovery of allelic variants of HOXA1 and HOXB1: genetic susceptibility to autism spectrum disorders. Teratology 2000; 62: 393–405.
112 Ramoz N, Reichert JG, Corwin TE, Smith CJ, Silverman JM, Hollander E et al. Lack of evidence for association of the serotonin transporter gene SLC6A4 with autism. Biol Psychiatry 2006; 60: 186–91.
113 Devlin B, Cook EH, Coon H, Dawson G, Grigorenko EL, McMahon W et al. Autism and the serotonin transporter: the long and short of it. Mol Psychiatry 2005; 10: 1110–6.
114 Kim S-J, Cox N, Courchesne R, Lord C, Corsello C, Akshoomoff N et al. Transmission disequilibrium mapping at the serotonin transporter gene (SLC6A4) region in autistic disorder. Mol Psychiatry 2002; 7: 278–88.
115 Klauck SM, Poustka F, Benner A, Lesch KP, Poustka A. Serotonin transporter (5-HTT) gene variants associated with autism? Hum Mol Genet 1997; 6: 2233–8.
116 Cook EH, Courchesne R, Lord C, Cox NJ, Yan S, Lincoln A et al. Evidence of linkage between the serotonin transporter and autistic disorder. Mol Psychiatry 1997; 2: 247–50.
117 Conroy J, Meally E, Kearney G, Fitzgerald M, Gill M, Gallagher L. Serotonin transporter gene and autism: a haplotype analysis in an Irish autistic population. Mol Psychiatry 2004; 9: 587–93.
118 Maestrini E, Lai C, Marlow A, Matthews N, Wallace S, Bailey A et al. Serotonin transporter (5-HTT) and gamma-aminobutyric acid receptor subunit beta3 (GABRB3) gene polymorphisms are not associated with autism in the IMGSA families. The International Molecular Genetic Study of Autism Consortium. Am J Med Genet 1999; 88: 492–6.
119 Persico AM, Militerni R, Bravaccio C, Schneider C, Melmed R, Conciatori M et al. Lack of association between serotonin transporter gene promoter variants and autistic disorder in two ethnically distinct samples. Am J Med Genet 2000; 96: 123–7.
120 Tordjman S, Gutknecht L, Carlier M, Spitz E, Antoine C, Slama F et al. Role of the serotonin transporter gene in the behavioral expression of autism. Mol Psychiatry 2001; 6: 434–9.


121 Betancur C, Corbex M, Spielewoy C, Philippe A, Laplanche JL, Launay JM et al. Serotonin transporter gene polymorphisms and hyperserotonemia in autistic disorder. Mol Psychiatry 2002; 7: 67–71.
122 Mulder EJ, Anderson GM, Kema IP, Brugman AM, Ketelaars CEJ, de Bildt A et al. Serotonin transporter intron 2 polymorphism associated with rigid-compulsive behaviors in Dutch individuals with pervasive developmental disorder. Am J Med Genet B Neuropsychiatr Genet 2005; 133B: 93–6.
123 Koishi S, Yamamoto K, Matsumoto H, Koishi S, Enseki Y, Oya A et al. Serotonin transporter gene promoter polymorphism and autism: a family-based genetic association study in Japanese population. Brain Dev 2006; 28: 257–60.
124 Guhathakurta S, Sinha S, Ghosh S, Chatterjee A, Ahmed S, Gangopadhyay PK et al. Population-based association study and contrasting linkage disequilibrium pattern reveal genetic association of SLC6A4 with autism in the Indian population from West Bengal. Brain Res 2008; 1240: 12–21.
125 Wu S, Guo Y, Jia M, Ruan Y, Shuang M, Liu J et al. Lack of evidence for association between the serotonin transporter gene (SLC6A4) polymorphisms and autism in the Chinese trios. Neurosci Lett 2005; 381: 1–5.
126 Yoo HJ, Cho IH, Park M, Yang SY, Kim SA. No Association Study of SLC6A4 Polymorphisms with Korean Autism Spectrum Disorder. Korean J Biol Psychiatry 2009; 16: 121–126.
127 Alarcón M, Abrahams BS, Stone JL, Duvall JA, Perederiy J V, Bomar JM et al. Linkage, association, and gene-expression analyses identify CNTNAP2 as an autism-susceptibility gene. Am J Hum Genet 2008; 82: 150–9.
128 Anderson BM, Schnetz-Boutaud N, Bartlett J, Wright HH, Abramson RK, Cuccaro ML et al. Examination of association to autism of common genetic variation in genes related to dopamine. Autism Res 2008; 1: 364–9.
129 Cheng L, Ge Q, Sun B, Yu P, Ke X, Lu Z. Polyacrylamide gel-based microarray: a novel method applied to the association study between the polymorphisms of BDNF gene and autism. J Biomed Nanotechnol 2009; 5: 542–50.
130 Egawa J, Watanabe Y, Nunokawa A, Endo T, Kaneko N, Tamura R et al. A detailed association analysis between the tryptophan hydroxylase 2 (TPH2) gene and autism spectrum disorders in a Japanese population. Psychiatry Res 2012; 196: 320–2.
131 Gaita L, Manzi B, Sacco R, Lintas C, Altieri L, Lombardi F et al. Decreased serum arylesterase activity in autism spectrum disorders. Psychiatry Res 2010; 180: 105–13.
132 Hutcheson HB, Olson LM, Bradford Y, Folstein SE, Santangelo SL, Sutcliffe JS et al. Examination of NRCAM, LRRN3, KIAA0716, and LAMB1 as autism candidate genes. BMC Med Genet 2004; 5: 12.
133 Kelemenova S, Schmidtova E, Ficek A, Celec P, Kubranska A, Ostatnikova D. Polymorphisms of candidate genes in Slovak autistic patients. Psychiatr Genet 2010; 20: 137–9.


134 Mei H, Cuccaro ML, Martin ER. Multifactor dimensionality reduction-phenomics: a novel method to capture genetic heterogeneity with use of phenotypic variables. Am J Hum Genet 2007; 81: 1251–61.
135 Petit E, Hérault J, Martineau J, Perrot A, Barthélémy C, Hameury L et al. Association study with two markers of a human homeogene in infantile autism. J Med Genet 1995; 32: 269–74.
136 Rabionet R, McCauley JL, Jaworski JM, Ashley-Koch AE, Martin ER, Sutcliffe JS et al. Lack of association between autism and SLC25A12. Am J Psychiatry 2006; 163: 929–31.
137 Rehnström K, Ylisaukko-Oja T, Vanhala R, von Wendt L, Peltonen L, Hovatta I. No association between common variants in glyoxalase 1 and autism spectrum disorders. Am J Med Genet B Neuropsychiatr Genet 2008; 147B: 124–7.
138 Serajee FJ, Nabi R, Zhong H, Huq M. Polymorphisms in xenobiotic metabolism genes and autism. J Child Neurol 2004; 19: 413–7.
139 Veatch OJ, Pendergast JS, Allen MJ, Leu RM, Johnson CH, Elsea SH et al. Genetic Variation in Melatonin Pathway Enzymes in Children with Autism Spectrum Disorder and Comorbid Sleep Onset Delay. J Autism Dev Disord 2014. doi:10.1007/s10803-014-2197-4.
140 Weiss LA, Ober C, Cook EH. ITGB3 shows genetic and expression interaction with SLC6A4. Hum Genet 2006; 120: 93–100.
141 Xu X-M, Ding M, Pang H, Xing J-X, Xuan J-F, Wang B-J. [Genetic polymorphisms of SNP loci in the 5’ and 3’ region of TPH2 gene in Northern Chinese Han population]. Fa Yi Xue Za Zhi 2013; 29: 21–4.
142 McCauley JL, Olson LM, Dowd M, Amin T, Steele A, Blakely RD et al. Linkage and association analysis at the serotonin transporter (SLC6A4) locus in a rigid-compulsive subset of autism. Am J Med Genet B Neuropsychiatr Genet 2004; 127B: 104–12.
143 Arking DE, Cutler DJ, Brune CW, Teslovich TM, West K, Ikeda M et al. A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism. Am J Hum Genet 2008; 82: 160–4.


Appendix Table 3: Genes with sex-specific expression (Chapter 3)

Gene ID  Log FC  AveExpr  T  P  FDR_P
RP4-610C12.3  1.62  4.94  17.82  9.80E-60  2.87E-55
RP4-610C12.4  1.39  5.37  14.79  9.54E-44  1.40E-39
FRG1B  0.75  9.18  13.67  3.14E-38  3.07E-34
NOX5  0.82  5.07  10.97  3.96E-26  2.90E-22
MIR4458HG  0.34  8.54  9.44  4.25E-20  2.48E-16
PRSS30P  -0.72  5.31  -8.68  2.27E-17  1.11E-13
EIF2S3L  -0.37  6.37  -8.61  4.24E-17  1.77E-13
TSPEAR-AS1  -0.79  2.95  -8.18  1.14E-15  4.16E-12
RP11-725P16.2  0.55  5.07  8.02  3.93E-15  1.28E-11
SPESP1  0.62  5.64  7.93  7.80E-15  2.28E-11
FAM201B  0.51  5.79  7.79  2.10E-14  5.58E-11
RP11-575H3.1  -0.82  4.23  -7.49  1.90E-13  4.63E-10
PTPN20B  0.78  4.13  7.44  2.67E-13  6.00E-10
C21orf90  -0.72  3.23  -7.20  1.41E-12  2.94E-09
FLG  0.44  6.49  7.05  4.10E-12  7.99E-09
RP11-809H16.3  0.55  1.29  6.64  5.74E-11  1.01E-07
EIF1AXP1  -0.31  7.26  -6.64  5.85E-11  1.01E-07
FLG-AS1  0.38  6.90  6.61  7.12E-11  1.16E-07
RP11-809H16.5  0.60  1.97  6.53  1.17E-10  1.80E-07
RGPD2  -0.74  4.28  -6.51  1.32E-10  1.93E-07
AC016683.6  0.80  9.03  6.41  2.53E-10  3.53E-07
RP11-333A23.4  0.80  2.08  6.33  4.04E-10  5.37E-07
ZNF44  -0.20  8.52  -6.17  1.08E-09  1.37E-06
C17orf51  0.27  10.97  6.07  1.96E-09  2.39E-06
RP11-775D22.2  -0.60  1.59  -6.06  2.08E-09  2.43E-06
TLR3  -0.48  6.27  -5.96  3.79E-09  4.26E-06
DEFA3  -0.96  3.43  -5.94  4.43E-09  4.80E-06
FAM66E  0.42  3.53  5.87  6.31E-09  6.59E-06
CREG1  -0.20  11.19  -5.85  7.22E-09  7.28E-06
PKD1P1  0.31  9.94  5.84  7.73E-09  7.54E-06
DDX43  0.66  3.93  5.80  9.85E-09  9.29E-06
DRD5P2  0.72  2.97  5.75  1.31E-08  1.19E-05
IL15  -0.40  5.62  -5.74  1.34E-08  1.19E-05
B4GALNT4  0.28  12.63  5.67  2.01E-08  1.73E-05
HLA-A  -0.38  12.61  -5.66  2.09E-08  1.75E-05
RP11-14N7.2  0.30  7.28  5.62  2.61E-08  2.07E-05
DDX60  -0.37  8.18  -5.62  2.61E-08  2.07E-05
CES1  -0.61  4.81  -5.56  3.69E-08  2.84E-05
IGHGP  -0.54  1.19  -5.50  5.04E-08  3.69E-05
AC004988.1  -0.39  0.39  -5.51  5.02E-08  3.69E-05
BTN3A1  -0.29  9.59  -5.49  5.49E-08  3.92E-05
LDHAL6A  0.56  2.89  5.47  5.97E-08  4.06E-05
APOBEC3G  -0.43  6.55  -5.47  5.93E-08  4.06E-05
FCRL3  -0.61  2.07  -5.46  6.59E-08  4.38E-05
SAMD9L  -0.41  8.75  -5.44  7.25E-08  4.71E-05
LINC00277  0.48  4.76  5.43  7.75E-08  4.93E-05
HAGH  0.15  11.68  5.41  8.30E-08  5.16E-05
ZNF696  0.15  8.30  5.38  9.71E-08  5.83E-05
ZNF433  -0.21  7.07  -5.38  9.96E-08  5.83E-05
HMGN2  -0.22  12.08  -5.38  9.94E-08  5.83E-05
MEMO1  0.23  6.33  5.36  1.09E-07  6.26E-05
COLEC11  -0.48  5.33  -5.36  1.13E-07  6.35E-05
FRA10AC1  -0.18  11.18  -5.35  1.18E-07  6.54E-05
AC138969.4  0.27  10.42  5.34  1.24E-07  6.71E-05
EIF3D  -0.12  11.56  -5.33  1.29E-07  6.86E-05
AC009133.15  0.24  9.04  5.32  1.36E-07  7.10E-05
MX2  -0.45  7.49  -5.30  1.53E-07  7.87E-05
CCL4  -0.77  4.30  -5.25  1.95E-07  9.69E-05
GSTO2  0.27  9.00  5.25  1.94E-07  9.69E-05
TSSC2  -0.41  4.28  -5.22  2.34E-07  1.12E-04
RETN  -0.61  1.39  -5.22  2.33E-07  1.12E-04
CNGA1  -0.57  3.46  -5.20  2.52E-07  1.17E-04
LRRC4B  0.22  12.88  5.20  2.50E-07  1.17E-04
BTN3A3  -0.22  9.02  -5.17  2.94E-07  1.34E-04
AC113189.5  0.18  7.25  5.17  2.97E-07  1.34E-04
DVL1  0.19  11.70  5.17  3.02E-07  1.34E-04
ZNRF2P1  0.40  3.16  5.16  3.07E-07  1.34E-04
RP11-3N2.13  -0.29  5.58  -5.16  3.13E-07  1.35E-04
RSAD2  -0.41  8.18  -5.14  3.53E-07  1.50E-04
OXER1  -0.32  5.14  -5.11  4.01E-07  1.67E-04
ENDOG  0.27  7.51  5.09  4.50E-07  1.83E-04
AC006050.2  -0.46  0.85  -5.09  4.46E-07  1.83E-04
CXCL3  -0.67  4.31  -5.07  5.00E-07  1.98E-04
IFIH1  -0.33  8.22  -5.07  4.96E-07  1.98E-04
RP11-108M9.4  -0.26  6.26  -5.05  5.41E-07  2.10E-04
C6orf25  -0.45  3.15  -5.05  5.48E-07  2.10E-04
TP53  -0.29  8.50  -5.05  5.52E-07  2.10E-04
RP11-108M9.5  -0.26  6.09  -5.04  5.80E-07  2.17E-04
OAS1  -0.44  7.85  -5.04  5.85E-07  2.17E-04
GPCPD1  -0.27  11.35  -5.03  6.00E-07  2.17E-04
BIRC3  -0.35  7.59  -5.03  6.01E-07  2.17E-04
HMGN1  -0.17  11.67  -5.03  6.22E-07  2.20E-04
STARD10  0.21  10.23  5.02  6.31E-07  2.20E-04
RP13-492C18.2  -0.44  0.96  -5.03  6.25E-07  2.20E-04
RP11-420G6.4  -0.47  5.03  -5.00  6.97E-07  2.40E-04
RP4-597N16.4  0.38  4.49  5.00  7.25E-07  2.47E-04
ODC1  -0.21  10.99  -4.98  7.82E-07  2.63E-04
OASL  -0.56  5.04  -4.97  8.16E-07  2.69E-04
FCRL6  -0.49  3.00  -4.97  8.17E-07  2.69E-04
ZDHHC8  0.13  11.42  4.96  8.63E-07  2.80E-04
TRAT1  -0.42  0.92  -4.96  8.74E-07  2.81E-04
CETP  -0.49  4.23  -4.94  9.70E-07  3.08E-04
PLGRKT  -0.19  7.97  -4.93  1.01E-06  3.16E-04
RP11-262H14.5  0.38  4.88  4.92  1.04E-06  3.23E-04
ZNF627  -0.13  8.58  -4.91  1.09E-06  3.35E-04
CCRL2  -0.43  5.26  -4.91  1.11E-06  3.37E-04
HCG4P5  -0.36  9.11  -4.90  1.14E-06  3.44E-04
HLA-B  -0.39  12.74  -4.89  1.21E-06  3.55E-04
TTC27  -0.16  7.97  -4.89  1.21E-06  3.55E-04
TSPEAR  -0.46  3.26  -4.89  1.22E-06  3.55E-04

This table lists the 100 genes most significantly differentially expressed by sex in the brain. Expression data were obtained from GTEx. Genes with logFC > 0 are male-specific expressed genes. Genes with logFC < 0 are female-specific expressed genes.
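
As a minimal sketch of the sign convention stated in the note to Appendix Table 3, the following Python snippet labels a few rows copied from the table by the sign of their logFC. The helper name `expression_bias` and the tuple layout are illustrative assumptions and are not part of the original analysis.

```python
# Illustrative only: label genes by the sign of logFC, following the table note.
# The rows are copied from Appendix Table 3; the helper name is hypothetical.


def expression_bias(log_fc: float) -> str:
    """Return the direction of sex-specific expression for a logFC value."""
    if log_fc > 0:
        return "male"
    if log_fc < 0:
        return "female"
    return "none"


# (gene, logFC, FDR-corrected P) for four rows of the table
rows = [
    ("FRG1B", 0.75, 3.07e-34),
    ("NOX5", 0.82, 2.90e-22),
    ("PRSS30P", -0.72, 1.11e-13),
    ("TLR3", -0.48, 4.26e-06),
]

labels = {gene: expression_bias(log_fc) for gene, log_fc, _ in rows}

for gene, label in labels.items():
    print(f"{gene}: {label}")
```

The zero case is handled separately only so that the function is total; no gene in the table has a logFC of exactly zero.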


Appendix Table 4: Gene based analysis of the non-stratified EQ GWAS (Chapter 3)

Gene Chr Start Stop P Symbol
ENSG00000137872 15 47466298 48076420 9.14E-07 SEMA6D
ENSG00000138829 5 127583601 128004878 1.03E-06 FBN2
ENSG00000142494 17 19388698 19492347 3.20E-06 SLC47A1
ENSG00000160741 1 153910145 153941101 5.95E-06 CRTC2
ENSG00000186468 5 81559177 81584396 8.85E-06 RPS23
ENSG00000198837 1 153891977 153929172 9.13E-06 DENND4B
ENSG00000143614 1 153767201 153905451 1.63E-05 GATAD2B
ENSG00000143552 1 153955161 154137592 1.96E-05 NUP210L
ENSG00000133265 19 55763599 55801749 1.99E-05 HSPBP1
ENSG00000104856 19 45494688 45551452 2.04E-05 RELB
ENSG00000205464 5 81565281 81692796 2.90E-05 ATP6AP1L
ENSG00000151917 6 56809773 56902140 3.72E-05 BEND6
ENSG00000152348 5 81257844 81582676 3.78E-05 ATG10
ENSG00000143545 1 153944127 153968834 3.86E-05 RAB13
ENSG00000143570 1 153921575 153950188 4.37E-05 SLC39A1
ENSG00000157483 15 59417113 59675099 5.47E-05 MYO1E
ENSG00000204669 9 74656292 74697733 5.92E-05 C9orf57
ENSG00000143578 1 153930010 153956839 6.24E-05 CREB3L4
ENSG00000123612 2 158373279 158495517 7.05E-05 ACVR1C
ENSG00000228716 5 79912047 79960802 7.21E-05 DHFR
ENSG00000143543 1 153936745 153960164 9.54E-05 JTB
ENSG00000149294 11 112821997 113159158 9.64E-05 NCAM1
ENSG00000271043 5 79935819 79956855 0.000104 MTRNR2L2
ENSG00000183520 1 38464930 38500496 0.000114 UTP11L
ENSG00000118473 1 66989066 67223982 0.000119 SGIP1
ENSG00000183034 17 72910370 72940007 0.000144 OTOP2
ENSG00000104870 19 50000073 50039590 0.000152 FCGRT
ENSG00000142609 1 1843396 1945276 0.000173 C1orf222
ENSG00000008130 1 1672671 1721896 0.000177 NADK
ENSG00000182938 17 72921814 72956087 0.000182 OTOP3
ENSG00000137804 15 41614892 41683248 0.000182 NUSAP1
ENSG00000104859 19 45532298 45584214 0.000187 CLASRP
ENSG00000197110 19 39724246 39745646 0.00019 IFNL3
ENSG00000106070 7 50647760 50871159 0.000224 GRB10
ENSG00000053254 14 89581215 90095493 0.000227 FOXN3
ENSG00000137806 15 41669551 41704717 0.000228 NDUFAF1
ENSG00000129152 11 17731115 17753678 0.00024 MYOD1
ENSG00000124205 20 57865482 57911047 0.000279 EDN3
ENSG00000030110 6 33530329 33558019 0.000283 BAK1
ENSG00000169885 1 1836266 1858735 0.000288 CALML6
ENSG00000104147 15 41591466 41634819 0.000289 OIP5
ENSG00000137815 15 41690606 41785761 0.000294 RTF1
ENSG00000181409 17 79081095 79149877 0.000301 AATK
ENSG00000173769 3 44273378 44383590 0.000308 TOPAZ1
ENSG00000204913 17 38087727 38111000 0.000321 LRRC3C
ENSG00000178821 1 1839029 1860712 0.000321 TMEM52
ENSG00000179152 3 44369611 44460943 0.00034 TCAIM
ENSG00000177885 17 73304157 73411790 0.000344 GRB2
ENSG00000067365 16 8705540 8750081 0.000345 METTL22
ENSG00000120685 13 39574003 39622252 0.000346 PROSER1
ENSG00000143549 1 154117784 154177124 0.000357 TPM3
ENSG00000107897 10 27474146 27541059 0.000366 ACBD5
ENSG00000177954 1 153953235 153974626 0.000375 RPS27
ENSG00000170426 12 57306938 57338189 0.000384 SDR9C7
ENSG00000154719 21 26947968 26989829 0.000388 MRPL39
ENSG00000164690 7 155582680 155614967 0.000393 SHH
ENSG00000196712 17 29411945 29719134 0.000419 NF1
ENSG00000113318 5 79940467 80182279 0.000433 MSH3
ENSG00000205592 12 40777197 40974632 0.000462 MUC19
ENSG00000161533 17 73927588 73985515 0.000495 ACOX1
ENSG00000188906 12 40580546 40773087 0.000535 LRRK2
ENSG00000157349 16 70313566 70379186 0.000547 DDX19B
ENSG00000116809 1 16258364 16312627 0.000548 ZBTB17
ENSG00000167377 16 71471500 71506998 0.000591 ZNF23
ENSG00000126861 17 29589031 29634557 0.0006 OMG
ENSG00000162086 16 3345406 3378852 0.0006 ZNF75A
ENSG00000104835 19 39395906 39450495 0.000615 SARS2
ENSG00000107554 10 101625334 101779676 0.000621 DNMBP
ENSG00000186350 9 137198944 137342431 0.000621 RXRA
ENSG00000161904 6 33728979 33766913 0.000622 LEMD2
ENSG00000267467 19 45435495 45462820 0.000637 APOC4
ENSG00000224916 19 45435495 45462822 0.000637 APOC4-APOC2
ENSG00000065526 1 16164359 16276955 0.000659 SPEN
ENSG00000115183 2 159815146 160099170 0.000709 TANC1
ENSG00000196549 3 154731913 154911497 0.000715 MME
ENSG00000168268 3 52548386 52579070 0.000719 NT5DC2
ENSG00000234906 19 45439243 45462822 0.000727 APOC2
ENSG00000104825 19 39380340 39409533 0.00073 NFKBIB
ENSG00000142552 19 50020875 50060219 0.000809 RCN3
ENSG00000204128 2 231892205 231924434 0.000823 C2orf72
ENSG00000167914 17 38109226 38144019 0.00083 GSDMA
ENSG00000088543 3 50585462 50618458 0.000836 C3orf18
ENSG00000168273 3 52558029 52623253 0.000842 SMIM4
ENSG00000112893 5 109015067 109215326 0.000903 MAN2A1
ENSG00000150722 2 182808968 183006125 0.000907 PPP1R1C
ENSG00000065618 10 105781044 105855760 0.000912 COL17A1
ENSG00000183386 1 38452442 38481278 0.000926 FHL3
ENSG00000163092 2 167734997 168126263 0.000945 XIRP2
ENSG00000214026 11 1958508 2015752 0.000964 MRPL23
ENSG00000204580 6 30834198 30877933 0.000988 DDR1
ENSG00000090554 19 49967464 49999488 0.000999 FLT3LG
ENSG00000104853 19 45447842 45506599 0.001006 CLPTM1
ENSG00000028203 12 95601522 95706566 0.001011 VEZT
ENSG00000182224 17 7751064 7775600 0.001014 CYB5D1
ENSG00000114735 3 50596583 50632366 0.001025 HEMK1
ENSG00000141098 16 67698434 67763324 0.001029 GFOD2
ENSG00000065613 10 105716959 105798991 0.001042 SLK
ENSG00000114737 3 50633921 50659262 0.001063 CISH
ENSG00000205220 16 67958405 67980990 0.001065 PSMB10
ENSG00000268500 19 52105344 52160142 0.00108 SIGLEC5

This table provides the 100 most significant results of the MAGMA gene-based P-value analyses (Chapter 3). Genes that remain significant after Bonferroni correction have been italicized.
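The Bonferroni criterion mentioned in the caption can be illustrated with a minimal Python sketch. It assumes a family-wise alpha of 0.05 and uses toy gene names, toy P-values, and a placeholder number of tests; the number of genes actually tested in the MAGMA analysis is not stated in this appendix, so none of these values should be read as the thesis's own.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def passes_bonferroni(p_values: Dict[str, float], n_tests: int,
                      alpha: float = 0.05) -> List[str]:
    """Return the labels whose P-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [label for label, p in p_values.items() if p < threshold]


# Toy inputs (hypothetical; not taken from the table above).
toy_p_values = {"GENE_A": 1e-7, "GENE_B": 1e-3, "GENE_C": 4e-6}
# Placeholder count of tests; substitute the real number of genes analysed.
significant = passes_bonferroni(toy_p_values, n_tests=10_000)
print(significant)
```

Because the threshold scales inversely with the number of tests, the same P-value can pass or fail depending on how many genes were included in the analysis.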

Appendix Table 5: Pathway based analysis for the non-stratified EQ (Chapter 3)

N Genes Beta SE P Full name
13 0.84 0.24 1.94E-04 GO_bp:go_negative_regulation_of_neurotransmitter_transport
21 0.70 0.20 2.26E-04 GO_bp:go_embryonic_placenta_morphogenesis
240 0.18 0.05 3.21E-04 GO_bp:go_positive_regulation_of_cell_cycle_process
83 0.29 0.09 4.48E-04 GO_bp:go_long_chain_fatty_acid_metabolic_process
48 0.39 0.12 5.04E-04 GO_bp:go_regulation_of_neurotransmitter_secretion
9 0.89 0.27 5.14E-04 GO_bp:go_negative_regulation_of_neurotransmitter_secretion
73 0.32 0.10 7.57E-04 GO_bp:go_neural_crest_cell_differentiation
19 0.61 0.20 1.11E-03 GO_bp:go_cellular_response_to_reactive_nitrogen_species
13 0.80 0.26 1.11E-03 GO_bp:go_positive_regulation_of_oligodendrocyte_differentiation
541 0.11 0.04 1.79E-03 GO_bp:go_regulation_of_cell_cycle_process
60 0.31 0.11 1.87E-03 GO_bp:go_regulation_of_neurotransmitter_transport
106 0.23 0.08 1.97E-03 GO_bp:go_regulation_of_cell_cycle_arrest
316 0.13 0.05 2.40E-03 GO_bp:go_regulation_of_cell_cycle_phase_transition
66 0.25 0.09 2.91E-03 GO_bp:go_positive_regulation_of_cell_cycle_phase_transition
87 0.25 0.09 2.92E-03 GO_bp:go_regulation_of_cell_matrix_adhesion
23 0.48 0.17 2.97E-03 GO_bp:go_negative_regulation_of_oxidoreductase_activity
121 0.21 0.08 3.09E-03 GO_bp:go_monocarboxylic_acid_transport
136 0.20 0.07 3.10E-03 GO_bp:go_mitotic_cell_cycle_checkpoint
23 0.43 0.16 3.56E-03 GO_bp:go_negative_regulation_of_viral_transcription
36 0.41 0.16 3.76E-03 GO_bp:go_bile_acid_metabolic_process
26 0.46 0.18 4.36E-03 GO_bp:go_ventricular_system_development
49 0.33 0.13 4.52E-03 GO_bp:go_neural_crest_cell_migration
15 0.59 0.23 4.75E-03 GO_bp:go_negative_regulation_of_neurological_system_process
18 0.52 0.20 5.07E-03 GO_bp:go_cellular_component_maintenance
22 0.45 0.18 5.07E-03 GO_bp:go_regulation_of_odontogenesis
21 0.51 0.20 5.12E-03 GO_bp:go_negative_regulation_of_peptidyl_serine_phosphorylation
24 0.49 0.19 5.36E-03 GO_bp:go_positive_regulation_of_skeletal_muscle_tissue_development
95 0.21 0.08 5.45E-03 GO_bp:go_monocarboxylic_acid_catabolic_process
12 0.68 0.27 5.51E-03 GO_bp:go_regulation_of_protein_polyubiquitination
61 0.28 0.11 5.55E-03 GO_bp:go_forebrain_generation_of_neurons
21 0.46 0.18 5.63E-03 GO_bp:go_ruffle_organization
48 0.30 0.12 5.66E-03 GO_bp:go_arachidonic_acid_metabolic_process
125 0.18 0.07 6.22E-03 GO_bp:go_regulation_of_establishment_of_protein_localization_to_mitochondrion
73 0.24 0.10 6.69E-03 GO_bp:go_regulation_of_neural_precursor_cell_proliferation
17 0.48 0.19 6.80E-03 GO_bp:go_cyclic_nucleotide_catabolic_process
28 0.37 0.15 6.95E-03 GO_bp:go_regulation_of_neuroblast_proliferation
14 0.54 0.22 7.08E-03 GO_bp:go_preassembly_of_gpi_anchor_in_er_membrane
97 0.21 0.09 7.50E-03 GO_bp:go_negative_regulation_of_cell_cycle_g1_s_phase_transition
193 0.15 0.06 7.59E-03 GO_bp:go_negative_regulation_of_mitotic_cell_cycle
18 0.42 0.17 7.66E-03 GO_bp:go_positive_regulation_of_cell_cycle_g2_m_phase_transition
84 0.23 0.09 7.85E-03 GO_bp:go_positive_regulation_of_cell_cycle_arrest
27 0.43 0.18 8.07E-03 GO_bp:go_membrane_biogenesis
46 0.31 0.13 8.22E-03 GO_bp:go_demethylation
144 0.17 0.07 8.70E-03 GO_bp:go_negative_regulation_of_cell_cycle_phase_transition
99 0.20 0.08 8.85E-03 GO_bp:go_mitotic_dna_integrity_checkpoint
41 0.31 0.13 9.23E-03 GO_bp:go_rna_secondary_structure_unwinding
34 0.34 0.15 9.48E-03 GO_bp:go_renal_water_homeostasis
57 0.26 0.11 9.81E-03 GO_bp:go_negative_regulation_of_synaptic_transmission
63 0.23 0.10 9.81E-03 GO_bp:go_positive_regulation_of_response_to_dna_damage_stimulus
71 0.22 0.10 9.82E-03 GO_bp:go_extracellular_matrix_disassembly
54 0.29 0.12 9.90E-03 GO_bp:go_positive_regulation_of_muscle_tissue_development
9 0.73 0.32 1.00E-02 GO_bp:go_renal_filtration
25 0.39 0.17 1.02E-02 GO_bp:go_apoptotic_nuclear_changes
23 0.45 0.19 1.03E-02 GO_bp:go_protein_dealkylation
23 0.45 0.19 1.03E-02 GO_bp:go_protein_demethylation
11 0.57 0.25 1.05E-02 GO_bp:go_mitotic_sister_chromatid_cohesion
147 0.15 0.07 1.12E-02 GO_bp:go_cellular_lipid_catabolic_process
13 0.51 0.23 1.13E-02 GO_bp:go_protein_import_into_mitochondrial_matrix
18 0.44 0.19 1.15E-02 GO_bp:go_positive_regulation_of_dna_recombination
95 0.19 0.08 1.15E-02 GO_bp:go_regulation_of_protein_targeting_to_mitochondrion
39 0.31 0.14 1.16E-02 GO_bp:go_negative_regulation_of_reactive_oxygen_species_metabolic_process
320 0.11 0.05 1.16E-02 GO_bp:go_positive_regulation_of_cell_cycle
48 0.29 0.13 1.17E-02 GO_bp:go_response_to_gamma_radiation
42 0.30 0.14 1.25E-02 GO_bp:go_negative_regulation_of_jak_stat_cascade
42 0.30 0.14 1.25E-02 GO_bp:go_negative_regulation_of_stat_cascade
10 0.54 0.24 1.26E-02 GO_bp:go_creatine_metabolic_process
12 0.61 0.27 1.27E-02 GO_bp:go_alpha_linolenic_acid_metabolic_process
31 0.34 0.15 1.27E-02 GO_bp:go_bile_acid_and_bile_salt_transport
15 0.46 0.21 1.28E-02 GO_bp:go_negative_regulation_of_smooth_muscle_contraction
56 0.25 0.11 1.29E-02 GO_bp:go_regulation_of_cell_cycle_g2_m_phase_transition
66 0.24 0.11 1.31E-02 GO_bp:go_positive_regulation_of_axonogenesis
22 0.39 0.17 1.35E-02 GO_bp:go_negative_regulation_of_muscle_contraction
28 0.33 0.15 1.37E-02 GO_bp:go_very_long_chain_fatty_acid_metabolic_process
914 0.06 0.03 1.38E-02 GO_bp:go_organic_acid_metabolic_process
181 0.14 0.06 1.40E-02 GO_bp:go_cellular_response_to_oxidative_stress
40 0.28 0.13 1.41E-02 GO_bp:go_cellular_component_disassembly_involved_in_execution_phase_of_apoptosis
133 0.15 0.07 1.41E-02 GO_bp:go_cellular_response_to_radiation
89 0.20 0.09 1.44E-02 GO_bp:go_canonical_wnt_signaling_pathway
456 0.09 0.04 1.45E-02 GO_bp:go_regulation_of_mitotic_cell_cycle
58 0.24 0.11 1.45E-02 GO_bp:go_multicellular_organismal_water_homeostasis
34 0.29 0.14 1.47E-02 GO_bp:go_transcription_from_rna_polymerase_i_promoter
371 0.10 0.05 1.47E-02 GO_bp:go_organic_anion_transport
143 0.15 0.07 1.51E-02 GO_bp:go_regulation_of_cell_cycle_g1_s_phase_transition
250 0.12 0.05 1.51E-02 GO_bp:go_organic_acid_transport
59 0.25 0.12 1.52E-02 GO_bp:go_positive_regulation_of_osteoblast_differentiation
189 0.14 0.06 1.53E-02 GO_bp:go_cell_cycle_checkpoint
23 0.38 0.17 1.54E-02 GO_bp:go_regulation_of_isotype_switching
13 0.50 0.23 1.56E-02 GO_bp:go_proteasome_assembly
31 0.32 0.15 1.57E-02 GO_bp:go_peroxisome_organization
20 0.39 0.18 1.57E-02 GO_bp:go_organic_cation_transport
22 0.39 0.18 1.58E-02 GO_bp:go_modulation_of_transcription_in_other_organism_involved_in_symbiotic_interaction
240 0.12 0.05 1.59E-02 GO_bp:go_lipid_catabolic_process
480 0.09 0.04 1.60E-02 GO_bp:go_anion_transport
199 0.12 0.06 1.60E-02 GO_bp:go_carboxylic_acid_catabolic_process
199 0.12 0.06 1.60E-02 GO_bp:go_organic_acid_catabolic_process
14 0.44 0.21 1.62E-02 GO_bp:go_dna_damage_response_signal_transduction_resulting_in_transcription
185 0.14 0.06 1.62E-02 GO_bp:go_stem_cell_differentiation
32 0.29 0.14 1.64E-02 GO_bp:go_regulation_of_protein_deacetylation
42 0.28 0.13 1.65E-02 GO_bp:go_long_chain_fatty_acid_transport
82 0.21 0.10 1.66E-02 GO_bp:go_positive_regulation_of_ossification
28 0.33 0.15 1.70E-02 GO_bp:go_negative_regulation_of_cell_matrix_adhesion

This table provides the results of the pathway enrichment for the Gene Ontology Biological Pathways (Chapter 3). Only the 100 most significant results are provided.

Appendix Table 6: Gene based analysis of the non-stratified Eyes Test (Chapter 4)

Symbol Z P Number of SNPs Tissue FDR P

CHAF1B 4.63 3.64E-06 37 anterior_cingulate 0.09
GTDC2 4.58 4.62E-06 8 cortex 0.09
DECR2 -4.57 4.80E-06 20 cerebellar_hemisphere 0.09
AC007040.11 4.38 1.20E-05 21 frontal_cortex 0.11
XYLT1 4.35 1.35E-05 2 hypothalamus 0.11
GTF2H5 4.35 1.38E-05 54 cortex 0.11
DVL2 4.34 1.43E-05 20 Putamen 0.11
SULT1A2 4.34 1.45E-05 18 anterior_cingulate 0.11
TRIM73 -4.29 1.82E-05 19 cerebellar_hemisphere 0.12
NDUFA3 4.22 2.45E-05 8 cerebellum 0.13
EEF1A2 -4.22 2.47E-05 23 hippocampus 0.13
SLC2A12 -4.19 2.76E-05 12 cerebellum 0.13
SULT1A2 4.18 2.90E-05 29 Putamen 0.13
TUFM 4.15 3.37E-05 12 caudate_basal_ganglia 0.14
CCDC169 -4.11 3.92E-05 79 Putamen 0.14
SULT1A2 4.11 3.93E-05 14 cortex 0.14
STAT6 4.08 4.41E-05 27 frontal_cortex 0.15
SULT1A2 4.05 5.21E-05 14 frontal_cortex 0.17
NDUFA3 4.02 5.72E-05 16 cerebellar_hemisphere 0.18
BTD 3.99 6.68E-05 4 anterior_cingulate 0.18
TRIM73 -3.98 6.84E-05 13 cerebellum 0.18
CUL3 -3.98 6.90E-05 25 caudate_basal_ganglia 0.18
TST 3.98 6.99E-05 40 hypothalamus 0.18
TRIM73 -3.95 7.75E-05 4 nucleus_accumbens 0.19
CXCL13 -3.94 8.19E-05 17 cerebellar_hemisphere 0.19
TCEA2 -3.93 8.47E-05 13 cerebellum 0.19
APOBEC3C -3.92 8.75E-05 24 cortex 0.19
SSTR4 -3.92 8.79E-05 12 cerebellum 0.19
LRPPRC 3.91 9.22E-05 15 cortex 0.19
LRPPRC 3.88 1.04E-04 17 caudate_basal_ganglia 0.20
TMC1 3.88 1.05E-04 33 caudate_basal_ganglia 0.20
TMEM180 3.87 1.08E-04 2 nucleus_accumbens 0.20
TRIM73 -3.85 1.20E-04 5 cortex 0.20
TMEM180 3.84 1.22E-04 13 hypothalamus 0.20
TMEM180 3.84 1.24E-04 21 cerebellum 0.20
ACTR1A 3.82 1.32E-04 1 caudate_basal_ganglia 0.20
HAAO 3.82 1.35E-04 23 caudate_basal_ganglia 0.20
CCDC77 3.81 1.38E-04 8 anterior_cingulate 0.20
COPZ2 3.80 1.45E-04 4 caudate_basal_ganglia 0.20
SULT1A2 3.80 1.45E-04 20 caudate_basal_ganglia 0.20
BET1 3.79 1.48E-04 4 Putamen 0.20
NCLN 3.79 1.49E-04 12 Putamen 0.20
LRPPRC 3.79 1.51E-04 28 cerebellar_hemisphere 0.20
TMEM101 3.79 1.53E-04 23 hippocampus 0.20
TTC27 -3.78 1.56E-04 37 caudate_basal_ganglia 0.20
CCDC77 3.76 1.68E-04 21 frontal_cortex 0.21
SPPL2C -3.76 1.72E-04 1 nucleus_accumbens 0.21
DNAJC15 -3.75 1.76E-04 18 cerebellar_hemisphere 0.21
ERP27 3.75 1.78E-04 24 nucleus_accumbens 0.21
DNAJC15 -3.72 2.00E-04 32 cerebellum 0.23
SULT1A1 -3.71 2.05E-04 18 caudate_basal_ganglia 0.23
KCNJ4 3.71 2.06E-04 3 anterior_cingulate 0.23
NPIPB6 3.71 2.07E-04 3 cerebellar_hemisphere 0.23
MN1 3.70 2.19E-04 8 Putamen 0.23
EPO 3.69 2.21E-04 1 nucleus_accumbens 0.23
ZNF555 3.69 2.23E-04 27 hypothalamus 0.23
CCDC77 3.69 2.25E-04 11 caudate_basal_ganglia 0.23
TRIM73 -3.68 2.32E-04 7 Putamen 0.23
HMHA1 -3.67 2.42E-04 15 caudate_basal_ganglia 0.23
SULT1A1 -3.67 2.42E-04 16 frontal_cortex 0.23
GALNTL6 3.66 2.49E-04 34 frontal_cortex 0.23
SULT1A2 3.66 2.50E-04 22 cerebellum 0.23
LRPPRC 3.66 2.50E-04 6 anterior_cingulate 0.23
TUFM 3.65 2.61E-04 23 Putamen 0.23
FAM204A -3.65 2.61E-04 6 hypothalamus 0.23
TFAP2A -3.65 2.62E-04 1 caudate_basal_ganglia 0.23
TUFM 3.65 2.67E-04 24 nucleus_accumbens 0.23
GTDC2 3.64 2.76E-04 5 nucleus_accumbens 0.23
CABP5 3.64 2.77E-04 9 cerebellum 0.23
SGSM1 -3.63 2.80E-04 18 cerebellar_hemisphere 0.23
SUOX 3.63 2.82E-04 33 cerebellum 0.23
ANTXR2 -3.62 2.95E-04 8 Putamen 0.24
MFN2 3.61 3.01E-04 4 hypothalamus 0.24
NUPR1 3.61 3.10E-04 20 cerebellar_hemisphere 0.25
LRPPRC 3.60 3.19E-04 13 hippocampus 0.25
TMEM119 -3.60 3.21E-04 2 caudate_basal_ganglia 0.25
GATSL2 -3.60 3.23E-04 7 cortex 0.25
SH2B1 -3.60 3.24E-04 17 cerebellum 0.25
PPP1CA -3.59 3.35E-04 7 cerebellum 0.25
RAMP3 3.58 3.46E-04 10 caudate_basal_ganglia 0.25
SULT1A1 -3.58 3.48E-04 33 cortex 0.25
TMEM25 3.55 3.87E-04 33 frontal_cortex 0.28
POM121C -3.55 3.90E-04 10 Putamen 0.28
NDUFA3 3.54 3.95E-04 4 cortex 0.28
PEX1 -3.54 4.05E-04 1 Putamen 0.28
SYF2 3.53 4.13E-04 1 cortex 0.28
NDUFA3 3.52 4.31E-04 4 caudate_basal_ganglia 0.29
SULT1A1 -3.51 4.45E-04 19 cerebellum 0.29
FLT4 -3.51 4.52E-04 14 caudate_basal_ganglia 0.29
HPX 3.51 4.52E-04 21 anterior_cingulate 0.29
DNAJC15 -3.50 4.58E-04 28 anterior_cingulate 0.29
DNAJC15 -3.50 4.63E-04 36 hippocampus 0.29
SEMA5B 3.50 4.67E-04 15 hypothalamus 0.29
ATP5J2 -3.50 4.71E-04 3 anterior_cingulate 0.29
LRPPRC 3.50 4.74E-04 8 nucleus_accumbens 0.29
DSCR3 3.49 4.76E-04 22 Putamen 0.29
HS3ST3A1 3.49 4.78E-04 17 cerebellum 0.29
TAB1 -3.49 4.79E-04 12 cerebellum 0.29
DNAJC15 -3.49 4.85E-04 30 frontal_cortex 0.29

This table provides the MetaXcan gene-based results for the non-stratified Eyes Test (Chapter 4). Only the 100 most significant results are provided.

Appendix Table 7: Gene based association for the Eyes Test - males (Chapter 4)

Ensembl ID Symbol Z P Variance explained Number of SNPs

ENSG00000171014 OR4D5 -4.25 2.12E-05 0.00 1
ENSG00000272047 GTF2H5 4.18 2.86E-05 0.20 54
ENSG00000198750 GATSL2 -3.97 7.17E-05 0.04 7
ENSG00000144647 GTDC2 3.76 1.73E-04 0.08 8
ENSG00000187145 MRPS21 3.45 5.52E-04 0.03 8
ENSG00000113838 TBCCD1 3.42 6.37E-04 0.09 33
ENSG00000172830 SSH3 3.29 1.00E-03 0.16 41
ENSG00000138095 LRPPRC 3.25 1.17E-03 0.12 15
ENSG00000183833 MAATS1 -3.22 1.27E-03 0.03 14
ENSG00000075975 MKRN2 -3.21 1.31E-03 0.03 9
ENSG00000088280 ASAP3 3.19 1.44E-03 0.17 52
ENSG00000143537 ADAM15 -3.12 1.78E-03 0.03 11
ENSG00000182261 NLRP10 3.08 2.08E-03 0.07 34
ENSG00000117614 SYF2 3.04 2.38E-03 0.00 1
ENSG00000137691 C11orf70 -3.01 2.58E-03 0.00 1
ENSG00000112335 SNX3 2.97 2.96E-03 0.00 7
ENSG00000214688 C10orf105 2.96 3.09E-03 0.01 4
ENSG00000142185 TRPM2 2.96 3.10E-03 0.01 5
ENSG00000108379 WNT3 2.95 3.16E-03 0.25 45
ENSG00000151322 NPAS3 2.94 3.28E-03 0.00 2
ENSG00000075292 ZNF638 -2.93 3.38E-03 0.02 8
ENSG00000164308 ERAP2 2.92 3.49E-03 0.40 29
ENSG00000095321 CRAT 2.92 3.53E-03 0.03 8
ENSG00000167363 FN3K 2.91 3.59E-03 0.00 3
ENSG00000159128 IFNGR2 -2.91 3.66E-03 0.41 76
ENSG00000104946 TBC1D17 2.90 3.70E-03 0.00 1
ENSG00000119547 ONECUT2 -2.90 3.71E-03 0.04 20
ENSG00000185215 TNFAIP2 2.90 3.78E-03 0.02 16
ENSG00000121104 FAM117A 2.90 3.79E-03 0.04 7
ENSG00000239332 AC016722.2 2.89 3.83E-03 0.00 4
ENSG00000103479 RBL2 -2.89 3.86E-03 0.23 28
ENSG00000075568 TMEM131 -2.89 3.91E-03 4.68 41
ENSG00000101474 APMAP 2.88 3.96E-03 0.29 67
ENSG00000228157 AC007952.5 -2.87 4.14E-03 0.00 1
ENSG00000156374 PCGF6 -2.86 4.26E-03 0.01 15
ENSG00000172487 OR8J1 -2.86 4.28E-03 0.00 1
ENSG00000112232 KHDRBS2 2.86 4.29E-03 0.15 33
ENSG00000168924 LETM1 2.85 4.44E-03 0.00 2
ENSG00000151465 CDC123 -2.84 4.53E-03 0.06 31
ENSG00000165480 SKA3 -2.83 4.58E-03 0.00 2
ENSG00000178809 TRIM73 -2.82 4.84E-03 0.02 5
ENSG00000213937 CLDN9 -2.81 4.88E-03 0.05 3
ENSG00000184454 NCMAP 2.80 5.14E-03 0.03 6
ENSG00000082684 SEMA5B 2.79 5.27E-03 0.26 27
ENSG00000108091 CCDC6 -2.79 5.32E-03 0.12 48
ENSG00000058085 LAMC2 2.78 5.40E-03 0.05 9
ENSG00000042493 CAPG 2.78 5.46E-03 0.02 12
ENSG00000215262 KCNU1 -2.78 5.47E-03 0.08 17
ENSG00000154975 CA10 2.77 5.53E-03 0.13 24
ENSG00000127564 PKMYT1 -2.77 5.61E-03 0.07 8
ENSG00000021574 SPAST 2.74 6.18E-03 0.17 49
ENSG00000136872 ALDOB -2.73 6.36E-03 0.02 7
ENSG00000197647 ZNF433 2.72 6.47E-03 0.00 1
ENSG00000161652 IZUMO2 -2.72 6.53E-03 0.03 22
ENSG00000085552 IGSF9 2.72 6.57E-03 0.06 22
ENSG00000090971 NAT14 2.69 7.13E-03 0.07 33
ENSG00000147481 SNTG1 -2.68 7.38E-03 0.06 28
ENSG00000212901 KRTAP3-1 2.68 7.40E-03 0.01 7
ENSG00000116885 OSCP1 2.68 7.44E-03 0.22 46
ENSG00000257230 RP11-272B17.2 2.68 7.47E-03 0.00 1
ENSG00000234511 C5orf58 -2.67 7.51E-03 0.03 23
ENSG00000141522 ARHGDIA 2.66 7.74E-03 0.01 21
ENSG00000095319 NUP188 -2.66 7.78E-03 0.02 8
ENSG00000140374 ETFA -2.65 7.94E-03 0.05 8
ENSG00000133789 SWAP70 2.65 8.00E-03 0.06 22
ENSG00000198346 ZNF813 2.64 8.23E-03 0.06 28
ENSG00000171160 MORN4 -2.64 8.24E-03 0.42 51
ENSG00000167900 TK1 2.64 8.31E-03 0.00 2
ENSG00000213918 DNASE1 -2.64 8.37E-03 0.03 11
ENSG00000111237 VPS29 2.63 8.57E-03 0.04 12
ENSG00000136425 CIB2 2.63 8.60E-03 0.13 14
ENSG00000084444 KIAA1467 -2.63 8.63E-03 0.12 42
ENSG00000181396 OGFOD3 -2.62 8.68E-03 0.01 3
ENSG00000119973 PRLHR -2.62 8.75E-03 0.02 12
ENSG00000225899 FRG2B 2.62 8.76E-03 0.02 12
ENSG00000181291 TMEM132E -2.62 8.79E-03 0.05 16
ENSG00000197322 C17orf102 2.62 8.82E-03 0.03 13
ENSG00000238083 LRRC37A2 2.60 9.22E-03 0.40 48
ENSG00000170915 PAQR8 2.60 9.22E-03 0.03 8
ENSG00000198104 OR2T6 -2.60 9.33E-03 0.03 10
ENSG00000119681 LTBP2 -2.60 9.37E-03 0.01 8
ENSG00000138231 DBR1 2.59 9.68E-03 0.00 2
ENSG00000135213 POM121C -2.59 9.72E-03 0.05 5
ENSG00000121289 CEP89 2.58 9.74E-03 0.06 26
ENSG00000159166 LAD1 -2.58 9.75E-03 0.02 2
ENSG00000198862 LTN1 -2.58 9.83E-03 0.01 4
ENSG00000121895 TMEM156 -2.58 9.85E-03 0.10 38
ENSG00000269302 AC110771.1 -2.57 1.01E-02 0.00 2
ENSG00000112249 ASCC3 -2.57 1.02E-02 0.20 68
ENSG00000170776 AKAP13 2.57 1.02E-02 0.10 18
ENSG00000171045 TSNARE1 -2.56 1.06E-02 0.13 27
ENSG00000120647 CCDC77 2.54 1.12E-02 0.10 19
ENSG00000186868 MAPT -2.54 1.12E-02 0.06 21
ENSG00000169919 GUSB -2.54 1.12E-02 0.08 12
ENSG00000091640 SPAG7 -2.52 1.18E-02 0.08 6
ENSG00000196365 LONP1 2.51 1.20E-02 0.15 15
ENSG00000198150 AC135178.1 2.51 1.20E-02 0.04 12
ENSG00000171872 KLF17 2.51 1.20E-02 0.03 11
ENSG00000122367 LDB3 2.51 1.21E-02 0.02 6

This table provides the MetaXcan gene-based results for the males-only Eyes Test (Chapter 4). Only the 100 most significant results are provided.

Appendix Table 8: Gene based association for the Eyes Test - females (Chapter 4)

Ensembl ID Symbol Z P Variance explained Number of SNPs
ENSG00000197165 SULT1A2 4.25 2.10E-05 0.03 14
ENSG00000162241 SLC25A45 4.19 2.83E-05 0.10 29
ENSG00000189099 PRSS48 4.12 3.87E-05 0.04 10
ENSG00000213658 LAT 3.97 7.31E-05 0.04 11
ENSG00000172500 FIBP 3.89 9.86E-05 0.08 22
ENSG00000244509 APOBEC3C -3.83 1.26E-04 0.08 24
ENSG00000176953 NFATC2IP 3.81 1.39E-04 0.01 9
ENSG00000222033 AC007405.2 -3.75 1.74E-04 0.13 42
ENSG00000167562 ZNF701 3.75 1.79E-04 0.00 1
ENSG00000196502 SULT1A1 -3.74 1.86E-04 0.07 33
ENSG00000117305 HMGCL 3.59 3.34E-04 0.06 15
ENSG00000123610 TNFAIP6 -3.56 3.75E-04 0.01 1
ENSG00000126106 TMEM53 -3.55 3.92E-04 0.11 25
ENSG00000189167 ZAR1L -3.53 4.14E-04 0.19 78
ENSG00000143815 LBR 3.48 4.98E-04 0.05 30
ENSG00000131015 ULBP2 3.25 1.15E-03 0.05 12
ENSG00000100380 ST13 3.24 1.18E-03 0.03 18
ENSG00000172803 SNX32 3.23 1.25E-03 0.24 20
ENSG00000162595 DIRAS3 3.16 1.58E-03 0.01 4
ENSG00000089902 RCOR1 3.13 1.72E-03 0.17 51
ENSG00000160999 SH2B2 3.11 1.85E-03 0.07 16
ENSG00000163395 IGFN1 -3.02 2.55E-03 0.03 4
ENSG00000146054 TRIM7 3.00 2.67E-03 0.02 11
ENSG00000142733 MAP3K6 -2.99 2.81E-03 0.01 3
ENSG00000158321 AUTS2 2.98 2.86E-03 0.00 3
ENSG00000151422 FER 2.98 2.90E-03 0.01 4
ENSG00000103051 COG4 2.96 3.08E-03 0.06 10
ENSG00000018510 AGPS 2.96 3.12E-03 0.04 31
ENSG00000214929 SPATA31D1 -2.94 3.24E-03 0.01 1
ENSG00000204511 MCCD1 2.94 3.27E-03 0.01 1
ENSG00000254997 KRTAP5-9 2.94 3.29E-03 0.14 19
ENSG00000106397 PLOD3 2.93 3.38E-03 0.07 37
ENSG00000175066 GK5 -2.92 3.48E-03 0.27 44
ENSG00000175215 CTDSP2 2.91 3.62E-03 0.05 10
ENSG00000073067 CYP2W1 -2.90 3.69E-03 0.04 14
ENSG00000142197 DOPEY2 -2.89 3.82E-03 0.05 8
ENSG00000203722 RAET1G 2.87 4.13E-03 0.40 35
ENSG00000171700 RGS19 -2.86 4.23E-03 0.23 7
ENSG00000176155 CCDC57 -2.86 4.28E-03 0.02 3
ENSG00000115155 OTOF 2.82 4.84E-03 0.20 37
ENSG00000170128 GPR25 2.82 4.85E-03 0.00 2
ENSG00000244274 DBNDD2 2.81 4.91E-03 0.05 42
ENSG00000137075 RNF38 -2.80 5.10E-03 0.10 17
ENSG00000205978 NYNRIN 2.78 5.41E-03 0.00 1
ENSG00000183513 COA5 2.77 5.63E-03 0.06 7
ENSG00000072778 ACADVL -2.77 5.67E-03 0.02 6
ENSG00000231500 RPS18 2.76 5.74E-03 0.00 1
ENSG00000117519 CNN3 2.75 5.99E-03 0.00 1
ENSG00000186452 TMPRSS12 -2.75 6.02E-03 0.05 17
ENSG00000005961 ITGA2B 2.75 6.04E-03 0.07 11
ENSG00000164818 HEATR2 2.74 6.11E-03 0.00 4
ENSG00000110046 ATG2A 2.73 6.30E-03 0.03 6
ENSG00000094880 CDC23 -2.73 6.37E-03 0.03 10
ENSG00000188729 OSTN 2.72 6.44E-03 0.08 50
ENSG00000204632 HLA-G 2.72 6.56E-03 0.41 65
ENSG00000149571 KIRREL3 -2.72 6.60E-03 0.00 4
ENSG00000142875 PRKACB 2.71 6.69E-03 0.02 19
ENSG00000147324 MFHAS1 2.71 6.74E-03 0.33 59
ENSG00000163517 HDAC11 -2.71 6.74E-03 0.04 13
ENSG00000172543 CTSW -2.71 6.75E-03 0.10 12
ENSG00000144647 GTDC2 2.70 6.97E-03 0.08 8
ENSG00000125630 POLR1B -2.70 7.01E-03 0.14 33
ENSG00000187672 ERC2 2.70 7.04E-03 0.05 16
ENSG00000105879 CBLL1 2.69 7.14E-03 0.02 2
ENSG00000065000 AP3D1 -2.69 7.22E-03 0.01 7
ENSG00000170906 NDUFA3 2.68 7.27E-03 0.02 4
ENSG00000153107 ANAPC1 2.68 7.46E-03 0.04 44
ENSG00000158161 EYA3 2.67 7.48E-03 0.08 23
ENSG00000128266 GNAZ 2.67 7.57E-03 0.02 3
ENSG00000170049 KCNAB3 2.67 7.59E-03 0.07 18
ENSG00000144504 ANKMY1 2.67 7.66E-03 0.06 14
ENSG00000226650 KIF4B -2.67 7.68E-03 0.00 1
ENSG00000131652 THOC6 -2.65 7.96E-03 0.06 5
ENSG00000204965 PCDHA5 2.64 8.25E-03 0.05 14
ENSG00000213614 HEXA 2.64 8.25E-03 0.04 27
ENSG00000162877 PM20D1 2.64 8.36E-03 0.08 29
ENSG00000181090 EHMT1 -2.63 8.62E-03 0.04 8
ENSG00000178809 TRIM73 -2.61 9.11E-03 0.02 5
ENSG00000186297 GABRA5 2.60 9.22E-03 0.00 1
ENSG00000064666 CNN2 2.60 9.39E-03 0.11 32
ENSG00000131023 LATS1 2.59 9.66E-03 0.00 3
ENSG00000256229 ZNF486 2.59 9.73E-03 0.00 1
ENSG00000168394 TAP1 2.58 9.90E-03 0.02 8
ENSG00000196821 C6orf106 2.58 9.99E-03 0.10 28
ENSG00000027001 MIPEP -2.57 1.02E-02 0.13 41
ENSG00000078902 TOLLIP 2.57 1.02E-02 0.00 6
ENSG00000120675 DNAJC15 -2.57 1.03E-02 0.54 55
ENSG00000081148 IMPG2 2.56 1.04E-02 0.16 45
ENSG00000135090 TAOK3 2.55 1.08E-02 0.02 10
ENSG00000162300 ZFPL1 2.54 1.12E-02 0.02 14
ENSG00000168092 PAFAH1B2 -2.53 1.14E-02 0.02 12
ENSG00000121900 TMEM54 -2.52 1.18E-02 0.00 1
ENSG00000154582 TCEB1 2.52 1.18E-02 0.05 33
ENSG00000147873 IFNA5 -2.52 1.19E-02 0.01 4
ENSG00000120519 SLC10A7 -2.50 1.23E-02 0.09 28
ENSG00000171119 NRTN -2.49 1.29E-02 0.00 2
ENSG00000188613 NANOS1 -2.49 1.29E-02 0.26 54
ENSG00000226397 C12orf77 2.48 1.31E-02 0.19 87
ENSG00000240891 PLCXD2 -2.48 1.31E-02 0.00 3

This table provides the MetaXcan gene-based results for the females-only Eyes Test (Chapter 4). Only the 100 most significant results are provided.

Appendix Table 9: List of sex-differentially expressed genes in the adult cortex (Chapter 4)

Ensembl ID Gene Name log2FC Ave Expr P FDR P

ENSG00000213318 RP11-331F4.1 2.46 1.58 1.97E-26 5.50E-23
ENSG00000022556 NLRP2 -1.22 1.38 7.84E-10 7.29E-07
ENSG00000101307 SIRPB1 -1.19 0.82 1.40E-07 0.000117
ENSG00000134184 GSTM1 -1.42 1.15 4.67E-07 0.000372
ENSG00000147813 NAPRT1 -0.65 2.18 5.16E-07 0.000392
ENSG00000163682 RPL9 1.13 2.81 3.31E-06 0.002066
ENSG00000196565 HBG2 -0.81 1.20 3.34E-06 0.002066
ENSG00000214078 CPNE1 0.54 3.96 3.75E-06 0.002242
ENSG00000175445 LPL -0.98 2.65 4.08E-06 0.002354
ENSG00000133636 NTS -1.03 1.51 4.30E-06 0.002395
ENSG00000176887 SOX11 -1.18 1.71 5.45E-06 0.002941
ENSG00000142583 SLC2A5 0.55 1.05 7.47E-06 0.003879
ENSG00000237550 RPL9P9 -0.61 0.83 7.66E-06 0.003879
ENSG00000255050 RP11-661A12.9 -0.62 1.53 8.06E-06 0.003964
ENSG00000140678 ITGAX 0.71 1.18 1.51E-05 0.007233
ENSG00000206341 HLA-H -0.68 1.17 2.59E-05 0.011868
ENSG00000112182 BACH2 -0.75 1.22 2.63E-05 0.011868
ENSG00000140557 ST8SIA2 -0.70 0.97 3.50E-05 0.01538
ENSG00000180229 HERC2P3 0.74 1.43 4.07E-05 0.016992
ENSG00000117724 CENPF -0.55 0.78 4.48E-05 0.017916
ENSG00000169221 TBC1D10B -0.41 4.04 4.50E-05 0.017916
ENSG00000232656 IDI2-AS1 0.53 3.20 4.85E-05 0.018788
ENSG00000196756 RP4-564F22.2 0.53 1.64 4.94E-05 0.018788
ENSG00000119147 C2ORF40 -0.80 1.26 5.36E-05 0.019922
ENSG00000255823 MTRNR2L8 -2.07 2.40 5.64E-05 0.020496
ENSG00000134871 COL4A2 -1.16 1.80 6.31E-05 0.021516
ENSG00000230614 AC073958.2 -1.27 2.20 7.68E-05 0.025682
ENSG00000188368 PRR19 -0.67 1.60 0.000105 0.033785
ENSG00000118271 TTR -1.34 1.80 0.000105 0.033785
ENSG00000051596 THOC3 -0.62 1.87 0.000115 0.036414
ENSG00000225630 MTND2P28 -3.04 4.00 0.000129 0.039656
ENSG00000157570 TSPAN18 -0.71 1.08 0.000131 0.039656
ENSG00000249119 MTND6P4 -2.15 6.53 0.000133 0.039656
ENSG00000258289 CHURC1 -0.42 4.30 0.000136 0.03978
ENSG00000232694 XX-CR54.3 0.83 1.86 0.00015 0.043131
ENSG00000187498 COL4A1 -0.92 1.44 0.000157 0.043848
ENSG00000141441 FAM59A 0.37 3.11 0.000157 0.043848
ENSG00000166833 NAV2 0.42 2.41 0.000187 0.051259
ENSG00000198502 HLA-DRB5 -0.81 1.81 0.000201 0.05431
ENSG00000100505 TRIM9 0.43 4.84 0.000225 0.05978
ENSG00000166313 APBB1 -0.25 6.39 0.00029 0.075082
ENSG00000183486 MX2 -0.45 1.15 0.000292 0.075082
ENSG00000160284 C21ORF56 -0.62 2.20 0.000302 0.076584
ENSG00000149243 KLHL35 -0.50 1.57 0.000329 0.082001
ENSG00000152518 ZFP36L2 0.77 3.02 0.000356 0.087456
ENSG00000170525 PFKFB3 0.33 4.44 0.000361 0.087456
ENSG00000182578 CSF1R 0.85 2.69 0.000383 0.091361
ENSG00000151413 NUBPL -0.36 1.89 0.000416 0.09685
ENSG00000167178 ISLR2 -0.87 3.34 0.000417 0.09685
ENSG00000256148 RP11-809N8.5 -1.54 2.40 0.000424 0.097003
ENSG00000183496 MEX3B -0.54 1.34 0.000435 0.098188
ENSG00000146063 TRIM41 -0.53 3.32 0.000453 0.100984
ENSG00000088325 TPX2 -0.55 1.52 0.000525 0.113926
ENSG00000135245 C7ORF68 -0.97 1.65 0.00054 0.1157
ENSG00000055609 MLL3 0.50 3.10 0.000547 0.1157
ENSG00000236762 RP11-159H3.1 -0.55 2.42 0.000677 0.13869
ENSG00000115828 QPCT -0.44 3.25 0.00068 0.13869
ENSG00000147862 NFIB 0.39 4.05 0.000696 0.140136
ENSG00000144426 NBEAL1 -0.95 1.83 0.000729 0.145016
ENSG00000134982 APC 0.47 6.01 0.000759 0.148081
ENSG00000189060 H1F0 -0.44 5.37 0.000762 0.148081
ENSG00000105223 PLD3 -0.42 7.19 0.000798 0.151527
ENSG00000173334 TRIB1 0.59 2.64 0.000816 0.153279
ENSG00000226686 AC012309.5 -0.65 1.13 0.000842 0.156457
ENSG00000144895 EIF2A -0.35 4.40 0.000874 0.160491
ENSG00000135333 EPHA7 -0.46 2.76 0.000938 0.170458
ENSG00000204287 HLA-DRA 0.87 3.95 0.001022 0.183777
ENSG00000019582 CD74 0.79 4.88 0.001052 0.184494
ENSG00000215482 CALM2P3 -0.63 5.20 0.001059 0.184494
ENSG00000179899 PHC1P1 0.45 1.95 0.001059 0.184494
ENSG00000133398 MED10 -0.49 4.47 0.001077 0.185607
ENSG00000232677 AC092296.1 -0.35 2.06 0.001099 0.187562
ENSG00000142892 PIGK -0.31 4.08 0.001167 0.196358
ENSG00000241720 RP4-735C1.4 0.53 3.02 0.001174 0.196358
ENSG00000128567 PODXL 0.44 3.92 0.001189 0.196756
ENSG00000112619 PRPH2 -0.54 2.11 0.001223 0.198008

This table lists the genes with sex-differential expression in the adult cortex (Chapter 4). Genes with log2FC > 0 have male-specific expression, whereas genes with log2FC < 0 have female-specific expression.
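The sign convention in the caption can be applied mechanically, as in the following minimal Python sketch. The three example rows are copied from Appendix Table 9; the `GeneResult` type and the `expression_direction` function are hypothetical names introduced only for illustration and are not part of the original analysis.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    """One row of the differential-expression table (illustrative subset of columns)."""
    symbol: str
    log2fc: float
    fdr_p: float


def expression_direction(log2fc: float) -> str:
    """Map the sign of log2FC to the direction stated in the caption."""
    if log2fc > 0:
        return "male"
    if log2fc < 0:
        return "female"
    return "none"  # log2FC of exactly zero carries no direction


# Three rows taken from Appendix Table 9.
rows = [
    GeneResult("RP11-331F4.1", 2.46, 5.50e-23),
    GeneResult("NLRP2", -1.22, 7.29e-07),
    GeneResult("RPL9", 1.13, 0.002066),
]

directions = {row.symbol: expression_direction(row.log2fc) for row in rows}
for symbol, direction in directions.items():
    print(symbol, direction)
```

The sketch labels direction only; whether a gene counts as differentially expressed still depends on the FDR-corrected P-value in the final column.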

Appendix Table 10: Results of the gene-based analyses for the Triangles Task (Chapter 5)

Ensembl ID Chr Start Stop P Symbol
ENSG00000187634 1 850260 889955 0.35764 SAMD11
ENSG00000188976 1 869584 904689 0.39204 NOC2L
ENSG00000187961 1 885967 911095 0.48572 KLHL17
ENSG00000187583 1 891877 921245 0.51894 PLEKHN1
ENSG00000187642 1 900579 927497 0.59299 C1orf170
ENSG00000188290 1 924342 945552 0.39159 HES4
ENSG00000187608 1 938803 959920 0.77176 ISG15
ENSG00000188157 1 945503 1001496 0.9669 AGRN
ENSG00000237330 1 996346 1019687 0.38034 RNF223
ENSG00000131591 1 1007198 1061741 0.113 C1orf159
ENSG00000162571 1 1099264 1143315 0.76515 TTLL10
ENSG00000186891 1 1128888 1152071 0.70277 TNFRSF18
ENSG00000186827 1 1136706 1159518 0.37765 TNFRSF4
ENSG00000078808 1 1142288 1177411 0.33365 SDF4
ENSG00000176022 1 1157629 1180421 0.28987 B3GALT6
ENSG00000184163 1 1167826 1192102 0.31563 FAM132A
ENSG00000160087 1 1179289 1219265 0.22983 UBE2J2
ENSG00000162572 1 1205816 1237409 0.10683 SCNN1D
ENSG00000131584 1 1217756 1254989 0.039371 ACAP3
ENSG00000169972 1 1233947 1257057 0.10008 PUSL1
ENSG00000127054 1 1236965 1270071 0.23585 CPSF3L
ENSG00000224051 1 1250136 1274277 0.67764 GLTPD1
ENSG00000169962 1 1256694 1280686 0.66621 TAS1R3
ENSG00000107404 1 1260656 1294730 0.72493 DVL1
ENSG00000162576 1 1278069 1307157 0.82876 MXRA8
ENSG00000175756 1 1299110 1320875 0.95096 AURKAIP1
ENSG00000221978 1 1311091 1344708 0.7426 CCNL2
ENSG00000242485 1 1327288 1352693 0.39025 MRPL20
ENSG00000235098 1 1343800 1367149 0.36686 ANKRD65
ENSG00000205116 1 1351508 1373167 0.36897 TMEM88B
ENSG00000179403 1 1360241 1388262 0.19683 VWA1
ENSG00000215915 1 1375069 1415538 0.18033 ATAD3C
ENSG00000160072 1 1397143 1443228 0.29932 ATAD3B
ENSG00000197785 1 1437531 1480067 0.60105 ATAD3A
ENSG00000205090 1 1460554 1485833 0.65085 TMEM240
ENSG00000160075 1 1467053 1520249 0.65539 SSU72
ENSG00000197530 1 1540795 1575990 0.39256 MIB2
ENSG00000248333 1 1560603 1600473 0.93932 CDK11B
ENSG00000189339 1 1582939 1634167 0.80806 SLC35E2B
ENSG00000008128 1 1624169 1665766 0.24798 CDK11A ENSG00000215790 1 1646277 1687431 0.084516 SLC35E2 ENSG00000008130 1 1672671 1721896 0.85067 NADK ENSG00000078369 1 1706729 1832495 0.91742 GNB1 ENSG00000169885 1 1836266 1858735 0.71034 CALML6 ENSG00000178821 1 1839029 1860712 0.79929 TMEM52 ENSG00000142609 1 1843396 1945276 0.91886 C1orf222 ENSG00000187730 1 1940780 1972192 0.78137 GABRD ENSG00000067606 1 1971909 2126834 0.82562 PRKCZ ENSG00000162585 1 2105903 2154159 0.85064 C1orf86 ENSG00000157933 1 2150134 2251558 0.65989 SKI ENSG00000116151 1 2242692 2333146 0.78135 MORN1 ENSG00000157916 1 2313267 2346883 0.66386 RER1 ENSG00000157911 1 2326236 2355236 0.53644 PEX10 ENSG00000149527 1 2347419 2446969 0.58505 PLCH2 ENSG00000157881 1 2429972 2468039 0.51869 PANK4 ENSG00000197921 1 2450184 2471684 0.86225 HES5 ENSG00000157873 1 2477078 2506821 0.39262 TNFRSF14 ENSG00000157870 1 2507930 2532908 0.37128 FAM213B ENSG00000142606 1 2512078 2574481 0.52659 MMEL1 ENSG00000215912 1 2557415 2728286 0.1921 TTC34 ENSG00000169717 1 2928046 2949465 0.89692 ACTRT2 ENSG00000142611 1 2975732 3365185 0.75847 PRDM16 ENSG00000130762 1 3360990 3407677 0.034413 ARHGEF16 ENSG00000162591 1 3396484 3538059 0.50809 MEGF6 ENSG00000158109 1 3531566 3556691 0.62822 TPRG1L ENSG00000116213 1 3537331 3579325 0.72351 WRAP73 ENSG00000078900 1 3559084 3662765 0.91739 TP73 ENSG00000162592 1 3658962 3698209 0.3994 CCDC27 ENSG00000235169 1 3679352 3702546 0.20481 SMIM1 ENSG00000130764 1 3686784 3723068 0.20894 LRRC47 ENSG00000116198 1 3718645 3783778 0.43489 CEP104 ENSG00000169598 1 3763845 3811993 0.34156 DFFB ENSG00000198912 1 3795689 3826857 0.62702 C1orf174 ENSG00000196581 1 4704792 4862594 0.84181 AJAP1 ENSG00000131697 1 5912871 6062533 0.0059929 NPHP4 ENSG00000069424 1 6041526 6171253 0.0032564 KCNAB2 ENSG00000116254 1 6151853 6250183 0.92798 CHD5 ENSG00000116251 1 6231329 6279449 0.88273 RPL22 ENSG00000158286 1 6255535 6291359 0.34466 RNF207 ENSG00000116237 1 6271253 6306032 0.2586 ICMT 
ENSG00000173673 1 6294252 6315638 0.38068 HES3 ENSG00000158292 1 6297406 6331035 0.62562 GPR153 ENSG00000097021 1 6314329 6464451 0.9864 ACOT7

ENSG00000069812  1  6462478  6494730  0.80514  HES2
ENSG00000187017  1  6474848  6531430  0.70699  ESPN
ENSG00000215788  1  6511211  6536255  0.68493  TNFRSF25
ENSG00000171680  1  6516152  6590121  0.90104  PLEKHG5
ENSG00000162408  1  6571407  6624595  0.72603  NOL9
ENSG00000173662  1  6605241  6649817  0.61443  TAS1R1
ENSG00000204859  1  6630061  6659340  0.38194  ZBTB48
ENSG00000162413  1  6640784  6684667  0.15614  KLHL21
ENSG00000116273  1  6663745  6694093  0.11587  PHF13
ENSG00000041988  1  6674926  6705646  0.080052  THAP3
ENSG00000007923  1  6684228  6771984  0.27375  DNAJC11
ENSG00000171735  1  6835384  7839766  0.36166  CAMTA1
ENSG00000049245  1  7821329  7851492  0.33359  VAMP3
ENSG00000049246  1  7834380  7915237  0.18988  PER3
ENSG00000049247  1  7893143  7923572  0.14657  UTS2
ENSG00000049249  1  7969907  8010926  0.46022  TNFRSF9
ENSG00000116288  1  8004351  8055565  0.45584  PARK7
ENSG00000116285  1  8054464  8096368  0.34349  ERRFI1
ENSG00000162426  1  8367886  8414227  0.95779  SLC45A1
ENSG00000142599  1  8402457  8887702  0.62749  RERE
ENSG00000074800  1  8911061  8949308  0.28367  ENO1

This table provides the results of the gene-based analysis for the Triangles Task (Chapter 5). The top 100 results are provided.

Appendix Table 11: Results of the pathway analyses for the Triangles Task (Chapter 5)

NGenes  Beta  SE  P  Full Name
66  0.36  0.09  7.79E-05  Curated_gene_sets:kumar_autophagy_network
20  0.60  0.17  1.51E-04  GO_bp:go_mrna_cleavage
15  0.74  0.22  4.46E-04  GO_bp:go_arachidonic_acid_secretion
15  0.74  0.22  4.46E-04  GO_bp:go_arachidonate_transport
21  0.58  0.18  4.68E-04  GO_bp:go_epithelial_structure_maintenance
27  0.54  0.17  6.12E-04  Curated_gene_sets:reactome_adherens_junctions_interactions
55  0.38  0.12  7.96E-04  Curated_gene_sets:reactome_cell_cell_junction_organization
33  0.41  0.13  8.50E-04  GO_cc:go_immunological_synapse
13  0.64  0.20  9.41E-04  Curated_gene_sets:verrecchia_response_to_tgfb1_c4
139  0.22  0.07  1.30E-03  GO_bp:go_gene_silencing_by_rna
28  0.43  0.14  1.39E-03  GO_bp:go_pyrimidine_nucleoside_biosynthetic_process
53  0.36  0.12  1.42E-03  Curated_gene_sets:huper_breast_basal_vs_luminal_up
6  1.20  0.41  1.47E-03  Curated_gene_sets:smid_breast_cancer_normal_like_dn
60  0.33  0.11  1.76E-03  Curated_gene_sets:reactome_immunoregulatory_interactions_between_a_lymphoid_and_a_non_lymphoid_cell
15  0.67  0.23  1.92E-03  GO_bp:go_maintenance_of_gastrointestinal_epithelium
222  0.16  0.05  2.01E-03  Curated_gene_sets:winter_hypoxia_metagene
31  0.40  0.14  2.05E-03  Curated_gene_sets:reactome_glycosphingolipid_metabolism
50  0.33  0.12  2.19E-03  Curated_gene_sets:reactome_tcr_signaling
42  0.37  0.13  2.21E-03  GO_bp:go_long_chain_fatty_acid_transport
21  0.54  0.19  2.28E-03  GO_bp:go_icosanoid_transport
21  0.54  0.19  2.28E-03  GO_bp:go_fatty_acid_derivative_transport
26  0.49  0.17  2.28E-03  Curated_gene_sets:colin_pilocytic_astrocytoma_vs_glioblastoma_dn
16  0.55  0.19  2.37E-03  Curated_gene_sets:kegg_other_glycan_degradation
29  0.40  0.14  2.52E-03  Curated_gene_sets:worschech_tumor_evasion_and_tolerogenicity_up
232  0.15  0.05  2.65E-03  Curated_gene_sets:senese_hdac1_targets_dn
21  0.55  0.20  2.70E-03  Curated_gene_sets:west_adrenocortical_tumor_markers_up

10  0.66  0.24  2.73E-03  Curated_gene_sets:biocarta_lym_pathway
40  0.32  0.12  2.82E-03  GO_bp:go_mitotic_recombination
53  0.29  0.11  2.86E-03  Curated_gene_sets:browne_hcmv_infection_4hr_up
11  0.60  0.22  2.88E-03  GO_bp:go_acetyl_coa_biosynthetic_process
65  0.27  0.10  2.93E-03  Curated_gene_sets:reactome_antiviral_mechanism_by_ifn_stimulated_genes
167  0.18  0.06  2.96E-03  GO_cc:go_lamellipodium
53  0.52  0.19  3.31E-03  Curated_gene_sets:nikolsky_breast_cancer_16q24_amplicon
341  0.12  0.04  3.38E-03  GO_cc:go_cell_leading_edge
5  1.01  0.38  3.64E-03  Curated_gene_sets:okamoto_liver_cancer_multicentric_occurrence_dn
13  0.61  0.23  3.76E-03  GO_bp:go_pyrimidine_nucleoside_monophosphate_metabolic_process
55  0.30  0.11  3.83E-03  GO_bp:go_fatty_acid_transport
16  0.53  0.20  3.88E-03  Curated_gene_sets:reactome_pre_notch_processing_in_golgi
22  0.42  0.16  3.90E-03  Curated_gene_sets:pid_integrin_a9b1_pathway
200  0.16  0.06  3.98E-03  GO_bp:go_gene_silencing
114  0.19  0.07  4.02E-03  GO_bp:go_glycosyl_compound_biosynthetic_process
10  0.59  0.22  4.09E-03  Curated_gene_sets:reactome_regulation_of_pyruvate_dehydrogenase_pdh_complex
42  0.32  0.12  4.16E-03  GO_bp:go_response_to_electrical_stimulus
6  0.80  0.30  4.18E-03  Curated_gene_sets:mootha_pyr
89  0.23  0.09  4.27E-03  Curated_gene_sets:chicas_rb1_targets_low_serum
15  0.45  0.17  4.47E-03  GO_bp:go_neurotrophin_trk_receptor_signaling_pathway
14  0.54  0.21  4.58E-03  Curated_gene_sets:castellano_nras_targets_dn
45  0.33  0.13  4.67E-03  Curated_gene_sets:warters_ir_response_5gy
145  0.18  0.07  4.69E-03  GO_mf:go_mrna_binding
12  0.65  0.25  4.73E-03  Curated_gene_sets:biocarta_thelper_pathway
16  0.48  0.19  4.92E-03  Curated_gene_sets:ray_targets_of_p210_bcr_abl_fusion_dn
16  0.54  0.21  5.03E-03  GO_bp:go_notochord_development
51  0.28  0.11  5.13E-03  GO_cc:go_cell_division_site
48  0.33  0.13  5.14E-03  Curated_gene_sets:martinez_response_to_trabectedin
194  0.19  0.07  5.29E-03  GO_bp:go_cell_cell_adhesion_via_plasma_membrane_adhesion_molecules
28  0.38  0.15  5.52E-03  Curated_gene_sets:mueller_common_targets_of_aml_fusions_dn

105  0.19  0.07  5.82E-03  GO_bp:go_negative_regulation_of_protein_secretion
82  0.28  0.11  5.87E-03  Curated_gene_sets:reactome_rna_pol_i_transcription
19  0.42  0.17  5.87E-03  GO_bp:go_t_helper_1_type_immune_response
17  0.49  0.19  5.91E-03  Curated_gene_sets:zembutsu_sensitivity_to_vincristine
55  0.27  0.11  5.96E-03  GO_bp:go_epidermal_growth_factor_receptor_signaling_pathway
10  0.64  0.25  5.96E-03  GO_bp:go_negative_regulation_of_mast_cell_activation
12  0.62  0.25  6.11E-03  Curated_gene_sets:biocarta_tcytotoxic_pathway
86  0.20  0.08  6.17E-03  GO_mf:go_single_stranded_dna_binding
33  0.34  0.14  6.25E-03  Curated_gene_sets:wendt_cohesin_targets_up
256  0.13  0.05  6.34E-03  Curated_gene_sets:oswald_hematopoietic_stem_cell_in_collagen_gel_dn
18  0.48  0.19  6.50E-03  Curated_gene_sets:scheidereit_ikk_targets
13  0.50  0.20  6.51E-03  GO_cc:go_copi_vesicle_coat
24  0.35  0.14  6.57E-03  GO_bp:go_atp_synthesis_coupled_proton_transport
24  0.35  0.14  6.57E-03  GO_bp:go_energy_coupled_proton_transport_down_electrochemical_gradient
11  0.54  0.22  6.81E-03  GO_bp:go_regulation_of_acyl_coa_biosynthetic_process
10  0.63  0.26  6.86E-03  GO_bp:go_negative_regulation_of_leukocyte_degranulation
23  0.36  0.15  6.95E-03  GO_bp:go_neurotrophin_signaling_pathway
11  0.58  0.23  7.04E-03  GO_cc:go_rnai_effector_complex
13  0.46  0.19  7.08E-03  GO_bp:go_pre_mirna_processing
20  0.47  0.19  7.17E-03  Curated_gene_sets:reactome_tak1_activates_nfkb_by_phosphorylation_and_activation_of_ikks_complex
18  0.53  0.22  7.37E-03  GO_bp:go_quaternary_ammonium_group_transport
42  0.29  0.12  7.56E-03  GO_bp:go_negative_regulation_of_jak_stat_cascade
42  0.29  0.12  7.56E-03  GO_bp:go_negative_regulation_of_stat_cascade
6  0.81  0.33  7.60E-03  Curated_gene_sets:mcbryan_terminal_end_bud_dn
8  0.68  0.28  7.66E-03  Curated_gene_sets:yamanaka_glioblastoma_survival_dn
14  0.49  0.20  7.67E-03  GO_bp:go_cellular_response_to_electrical_stimulus
33  0.32  0.13  7.69E-03  Curated_gene_sets:dorn_adenovirus_infection_12hr_dn
10  0.56  0.23  7.82E-03  Curated_gene_sets:biocarta_monocyte_pathway
21  0.42  0.17  7.84E-03  Curated_gene_sets:mootha_ffa_oxydation
17  0.45  0.19  7.90E-03  GO_mf:go_nucleotide_transmembrane_transporter_activity

13  0.59  0.24  7.90E-03  GO_mf:go_c_acyltransferase_activity
25  0.38  0.16  7.96E-03  Curated_gene_sets:moreira_response_to_tsa_up
91  0.21  0.09  7.97E-03  Curated_gene_sets:bonci_targets_of_mir15a_and_mir16_1
129  0.17  0.07  8.11E-03  GO_bp:go_negative_regulation_of_cellular_amide_metabolic_process
80  0.21  0.09  8.11E-03  Curated_gene_sets:keshelava_multiple_drug_resistance
34  0.35  0.15  8.24E-03  Curated_gene_sets:lien_breast_carcinoma_metaplastic
46  0.28  0.12  8.25E-03  Curated_gene_sets:faelt_b_cll_with_vh_rearrangements_up
92  0.21  0.09  8.33E-03  Curated_gene_sets:kegg_pyrimidine_metabolism
76  0.22  0.09  8.35E-03  Curated_gene_sets:warters_response_to_ir_skin
34  0.34  0.14  8.37E-03  GO_bp:go_trna_transport
67  0.24  0.10  8.39E-03  Curated_gene_sets:basso_cd40_signaling_dn
20  0.48  0.20  8.39E-03  Curated_gene_sets:gavin_pde3b_targets
50  0.25  0.11  8.45E-03  GO_bp:go_ribonucleoside_triphosphate_biosynthetic_process
69  0.26  0.11  8.46E-03  Curated_gene_sets:reactome_meiotic_synapsis

This table provides the results of the pathway analyses for the Triangles Task (Chapter 5). Only the 100 most significant results are provided.

Appendix Table 12: Results of the eQTL analyses for social relationship satisfaction (Chapter 6)

Unique SNP ID  Database  Tissue  Ensembl Gene ID  P-value  FDR_P  Chr  BP (Hg_19)  Gene symbol
6:30207495:A:C  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30207495  HLA-F
6:30207929:A:T  BRAINEAC  CRBL  ENSG00000204642  0.000577  0.005985  6  30207929  HLA-F
6:30208843:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30208843  HLA-F
6:30209918:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30209918  HLA-F
6:30210550:A:C  BRAINEAC  CRBL  ENSG00000204642  0.000836  0.007922  6  30210550  HLA-F
6:30213540:A:T  BRAINEAC  CRBL  ENSG00000204642  0.000845  0.007922  6  30213540  HLA-F
6:30214250:C:G  BRAINEAC  CRBL  ENSG00000204642  0.000831  0.007904  6  30214250  HLA-F
6:30223428:C:T  BRAINEAC  CRBL  ENSG00000204642  0.00451  0.029198  6  30223428  HLA-F
6:30225273:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30225273  HLA-F
6:30226123:C:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30226123  HLA-F
6:30226305:C:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30226305  HLA-F
6:30227425:C:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30227425  HLA-F
6:30227527:C:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30227527  HLA-F
6:30227912:C:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30227912  HLA-F
6:30227913:G:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30227913  HLA-F
6:30227915:A:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30227915  HLA-F
6:30229989:C:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30229989  HLA-F
6:30230760:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30230760  HLA-F
6:30230777:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30230777  HLA-F
6:30230924:C:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30230924  HLA-F
6:30230930:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30230930  HLA-F
6:30231224:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30231224  HLA-F
6:30231273:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30231273  HLA-F
6:30231330:C:T  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30231330  HLA-F
6:30231587:A:G  BRAINEAC  CRBL  ENSG00000204642  0.000846  0.007922  6  30231587  HLA-F

6:30231594:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30231594 HLA-F 6:30231602:C:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30231602 HLA-F 6:30231666:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30231666 HLA-F 6:30231755:C:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30231755 HLA-F 6:30231768:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30231768 HLA-F 6:30231829:C:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30231829 HLA-F 6:30232009:G:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30232009 HLA-F 6:30232250:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30232250 HLA-F 6:30232436:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30232436 HLA-F 6:30232672:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30232672 HLA-F 6:30232785:A:C BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30232785 HLA-F 6:30232953:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30232953 HLA-F 6:30233513:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30233513 HLA-F 6:30233558:A:C BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30233558 HLA-F 6:30233739:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30233739 HLA-F 6:30233876:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30233876 HLA-F 6:30234152:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234152 HLA-F 6:30234627:A:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234627 HLA-F 6:30234657:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234657 HLA-F 6:30234700:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234700 HLA-F 6:30234721:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234721 HLA-F 6:30234931:C:T BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234931 HLA-F 6:30234934:A:C BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234934 HLA-F 6:30234945:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30234945 HLA-F 6:30235204:A:G BRAINEAC CRBL ENSG00000204642 0.000801 0.007663 6 30235204 HLA-F 
6:30236038:G:T BRAINEAC CRBL ENSG00000204642 0.000784 0.007552 6 30236038 HLA-F 6:30306306:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30306306 HLA-F 6:30307342:C:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30307342 HLA-F 6:30312176:A:G BRAINEAC CRBL ENSG00000204642 0.000846 0.007922 6 30312176 HLA-F 6:30163955:A:G BRAINEAC WHMT ENSG00000234127 1.37E-07 0.000302 6 30163955 TRIM26

6:30165273:C:T BRAINEAC WHMT ENSG00000234127 1.37E-07 0.000302 6 30165273 TRIM26 6:30166266:C:T BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30166266 TRIM26 6:30167835:C:T BRAINEAC WHMT ENSG00000234127 1.31E-07 0.000297 6 30167835 TRIM26 6:30169327:A:G BRAINEAC WHMT ENSG00000234127 1.31E-07 0.000297 6 30169327 TRIM26 6:30169475:C:T BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30169475 TRIM26 6:30170280:A:G BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30170280 TRIM26 6:30170510:C:T BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30170510 TRIM26 6:30170525:C:T BRAINEAC WHMT ENSG00000234127 1.36E-07 0.000302 6 30170525 TRIM26 6:30170906:G:T BRAINEAC WHMT ENSG00000234127 1.22E-07 0.000281 6 30170906 TRIM26 6:30170970:A:G BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30170970 TRIM26 6:30171369:G:T BRAINEAC WHMT ENSG00000234127 9.39E-08 0.00022 6 30171369 TRIM26 6:30171417:C:T BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30171417 TRIM26 6:30171827:C:T BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30171827 TRIM26 6:30172513:A:C BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30172513 TRIM26 6:30173330:A:G BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30173330 TRIM26 6:30174131:C:T BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30174131 TRIM26 6:30174633:A:C BRAINEAC WHMT ENSG00000234127 8.72E-08 0.000206 6 30174633 TRIM26 6:30207929:A:T BRAINEAC WHMT ENSG00000234127 2.16E-05 0.01506 6 30207929 TRIM26 6:30223428:C:T BRAINEAC WHMT ENSG00000234127 2.89E-05 0.017485 6 30223428 TRIM26 6:30166266:C:T GTEx Brain_Caudate_basal_ganglia ENSG00000234127 8.57E-07 0.020661 6 30166266 TRIM26 6:30169475:C:T GTEx Brain_Caudate_basal_ganglia ENSG00000234127 8.55E-07 0.020661 6 30169475 TRIM26 6:30170280:A:G GTEx Brain_Caudate_basal_ganglia ENSG00000234127 8.55E-07 0.020661 6 30170280 TRIM26 6:30170510:C:T GTEx Brain_Caudate_basal_ganglia ENSG00000234127 1.30E-06 0.020661 6 30170510 TRIM26 6:30170906:G:T GTEx Brain_Caudate_basal_ganglia 
ENSG00000234127 8.54E-07 0.020661 6 30170906 TRIM26 6:30170970:A:G GTEx Brain_Caudate_basal_ganglia ENSG00000234127 8.55E-07 0.020661 6 30170970 TRIM26 6:30171369:G:T GTEx Brain_Caudate_basal_ganglia ENSG00000234127 9.32E-07 0.020661 6 30171369 TRIM26 6:30171827:C:T GTEx Brain_Caudate_basal_ganglia ENSG00000234127 1.51E-06 0.020661 6 30171827 TRIM26 6:30174131:C:T GTEx Brain_Caudate_basal_ganglia ENSG00000234127 8.55E-07 0.020661 6 30174131 TRIM26 6:30174633:A:C GTEx Brain_Caudate_basal_ganglia ENSG00000234127 8.55E-07 0.020661 6 30174633 TRIM26 This table provides the results of the eQTL analyses for the two social relationship satisfaction GWAS (Chapter 6).

Appendix Table 13: Results of the chromatin interactions for social relationship satisfaction (Chapter 6)

Region 1  Region 2  FDR-corrected P  Tissue/cell  Ensembl Gene ID
3:175960001-176000000  3:176000001-176040000  9.37E-13  Neural_Progenitor_Cell  NA
3:176000001-176040000  3:175960001-176000000  9.37E-13  Neural_Progenitor_Cell  NA
3:175920001-175960000  3:175960001-176000000  3.14E-10  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:175920001-175960000  3.14E-10  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174240001-174280000  1.87E-06  Neural_Progenitor_Cell  NA
3:175920001-175960000  3:174480001-174520000  2.18E-06  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174920001-174960000  2.42E-06  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:178080001-178120000  6.98E-06  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174960001-175000000  1.33E-05  Neural_Progenitor_Cell  NA
3:176000001-176040000  3:177800001-177840000  1.54E-05  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174800001-174840000  1.70E-05  Neural_Progenitor_Cell  NA
3:175920001-175960000  3:173760001-173800000  2.41E-05  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:176080001-176120000  2.67E-05  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:175840001-175880000  2.67E-05  Neural_Progenitor_Cell  NA
3:176000001-176040000  3:174120001-174160000  3.38E-05  Neural_Progenitor_Cell  ENSG00000177694
3:176000001-176040000  3:176040001-176080000  7.21E-05  Neural_Progenitor_Cell  NA
3:176000001-176040000  3:173080001-173120000  7.64E-05  Neural_Progenitor_Cell  ENSG00000169760
3:175960001-176000000  3:174040001-174080000  9.98E-05  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:175560001-175600000  0.000137  Neural_Progenitor_Cell  NA
3:176000001-176040000  3:178080001-178120000  0.000151  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:173880001-173920000  0.000151  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174120001-174160000  0.000166  Neural_Progenitor_Cell  ENSG00000177694
3:175920001-175960000  3:175840001-175880000  0.000178  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174320001-174360000  0.000197  Neural_Progenitor_Cell  NA
3:175920001-175960000  3:174800001-174840000  0.000244  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:174520001-174560000  0.000302  Neural_Progenitor_Cell  NA

3:175920001-175960000 3:174840001-174880000 0.000371 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174560001-174600000 0.000434 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174280001-174320000 0.000491 Neural_Progenitor_Cell NA 3:175960001-176000000 3:176040001-176080000 0.000497 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175160001-175200000 0.000524 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174840001-174880000 0.000678 Neural_Progenitor_Cell NA 3:175960001-176000000 3:177960001-178000000 0.000807 Dorsolateral_Prefrontal_Cortex ENSG00000197584 3:175960001-176000000 3:173960001-174000000 0.000833 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174800001-174840000 0.000876 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175600001-175640000 0.00088 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174200001-174240000 0.000931 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175080001-175120000 0.001184 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175440001-175480000 0.001184 Neural_Progenitor_Cell NA 3:175960001-176000000 3:173760001-173800000 0.001188 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173840001-173880000 0.001374 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174880001-174920000 0.001451 Neural_Progenitor_Cell NA 3:175960001-176000000 3:172960001-173000000 0.001459 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174440001-174480000 0.001488 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174040001-174080000 0.00152 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173040001-173080000 0.001665 Neural_Progenitor_Cell NA 6:30160001-30200000 6:30280001-30320000 0.00175 Dorsolateral_Prefrontal_Cortex ENSG0000020459: ENSG00000248167: ENSG00000241370 6:30280001-30320000 6:30160001-30200000 0.00175 Dorsolateral_Prefrontal_Cortex ENSG00000234127 3:176000001-176040000 3:173720001-173760000 0.002044 Hippocampus NA 3:175960001-176000000 3:173920001-173960000 0.002312 Neural_Progenitor_Cell NA 3:175960001-176000000 
3:178320001-178360000 0.002325 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174720001-174760000 0.002335 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175440001-175480000 0.002659 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175840001-175880000 0.002744 Neural_Progenitor_Cell NA 3:175960001-176000000 3:173120001-173160000 0.002783 Hippocampus NA

3:175920001-175960000 3:174640001-174680000 0.003165 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173960001-174000000 0.003234 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174000001-174040000 0.003234 Neural_Progenitor_Cell NA 3:175960001-176000000 3:173720001-173760000 0.003441 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174080001-174120000 0.003905 Neural_Progenitor_Cell NA 3:175920001-175960000 3:178120001-178160000 0.003963 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174440001-174480000 0.004214 Dorsolateral_Prefrontal_Cortex NA 3:176000001-176040000 3:174640001-174680000 0.004591 Neural_Progenitor_Cell NA 3:175920001-175960000 3:177800001-177840000 0.004712 Neural_Progenitor_Cell NA 3:176000001-176040000 3:178120001-178160000 0.005301 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175480001-175520000 0.005354 Neural_Progenitor_Cell NA 3:175920001-175960000 3:172920001-172960000 0.005519 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174760001-174800000 0.005792 Neural_Progenitor_Cell NA 3:175920001-175960000 3:175440001-175480000 0.006026 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174640001-174680000 0.006057 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174680001-174720000 0.006057 Neural_Progenitor_Cell NA 3:176000001-176040000 3:178440001-178480000 0.006064 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173120001-173160000 0.00619 Neural_Progenitor_Cell NA 6:30280001-30320000 6:30680001-30720000 0.006545 Hippocampus ENSG00000137337: ENSG00000196230: ENSG00000137312: ENSG00000137331 3:175920001-175960000 3:178320001-178360000 0.00678 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174560001-174600000 0.006943 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173080001-173120000 0.00696 Neural_Progenitor_Cell ENSG00000169760 3:175920001-175960000 3:177960001-178000000 0.007076 Neural_Progenitor_Cell ENSG00000197584 3:176000001-176040000 3:178040001-178080000 0.007076 Neural_Progenitor_Cell NA 3:175960001-176000000 
3:174680001-174720000 0.007932 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174720001-174760000 0.007932 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174160001-174200000 0.008275 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175240001-175280000 0.010099 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174720001-174760000 0.010102 Dorsolateral_Prefrontal_Cortex NA

3:175920001-175960000 3:174680001-174720000 0.01043 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174720001-174760000 0.01043 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173240001-173280000 0.01048 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173320001-173360000 0.01048 Neural_Progenitor_Cell NA 6:30280001-30320000 6:30000001-30040000 0.010747 Hippocampus ENSG00000066379:ENSG00000204619 3:176000001-176040000 3:174480001-174520000 0.01092 Neural_Progenitor_Cell NA 3:176000001-176040000 3:177920001-177960000 0.01122 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175480001-175520000 0.01122 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174560001-174600000 0.011258 Neural_Progenitor_Cell NA 3:175960001-176000000 3:178160001-178200000 0.012049 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173720001-173760000 0.012049 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175520001-175560000 0.013018 Neural_Progenitor_Cell NA 3:175920001-175960000 3:178080001-178120000 0.013686 Neural_Progenitor_Cell NA 3:175960001-176000000 3:178120001-178160000 0.013686 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174480001-174520000 0.013686 Neural_Progenitor_Cell NA 3:176000001-176040000 3:178040001-178080000 0.013701 Dorsolateral_Prefrontal_Cortex NA 6:30280001-30320000 6:27440001-27480000 0.013827 Hippocampus ENSG00000096654 3:175960001-176000000 3:178520001-178560000 0.013994 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173440001-173480000 0.013994 Neural_Progenitor_Cell NA 3:175960001-176000000 3:175640001-175680000 0.014065 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173880001-173920000 0.015574 Neural_Progenitor_Cell NA 3:175920001-175960000 3:178400001-178440000 0.017117 Neural_Progenitor_Cell NA 3:175960001-176000000 3:177920001-177960000 0.01775 Hippocampus NA 3:176000001-176040000 3:173920001-173960000 0.017885 Neural_Progenitor_Cell NA 3:175920001-175960000 3:175640001-175680000 0.018625 Neural_Progenitor_Cell NA 3:175960001-176000000 
3:174160001-174200000 0.018625 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174960001-175000000 0.018625 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173040001-173080000 0.018734 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175400001-175440000 0.019436 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173320001-173360000 0.01953 Dorsolateral_Prefrontal_Cortex NA

3:176000001-176040000 3:173960001-174000000 0.019984 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175520001-175560000 0.02041 Neural_Progenitor_Cell NA 3:175960001-176000000 3:173400001-173440000 0.021168 Dorsolateral_Prefrontal_Cortex NA 6:30280001-30320000 6:28840001-28880000 0.022689 Hippocampus NA 3:176000001-176040000 3:174880001-174920000 0.022922 Neural_Progenitor_Cell NA 3:175960001-176000000 3:174120001-174160000 0.023082 Hippocampus ENSG00000177694 3:175920001-175960000 3:174920001-174960000 0.023951 Neural_Progenitor_Cell NA 3:175920001-175960000 3:173600001-173640000 0.024817 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174200001-174240000 0.024946 Hippocampus NA 3:175920001-175960000 3:173480001-173520000 0.025485 Dorsolateral_Prefrontal_Cortex NA 3:175960001-176000000 3:177680001-177720000 0.025612 Neural_Progenitor_Cell NA 3:175960001-176000000 3:177480001-177520000 0.026572 Neural_Progenitor_Cell NA 3:175960001-176000000 3:178240001-178280000 0.027234 Neural_Progenitor_Cell NA 3:175960001-176000000 3:177880001-177920000 0.030132 Neural_Progenitor_Cell NA 3:175960001-176000000 3:177640001-177680000 0.031198 Dorsolateral_Prefrontal_Cortex NA 3:175920001-175960000 3:176400001-176440000 0.036027 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174360001-174400000 0.036093 Neural_Progenitor_Cell NA 3:175960001-176000000 3:173760001-173800000 0.036236 Dorsolateral_Prefrontal_Cortex NA 3:176000001-176040000 3:178600001-178640000 0.037838 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174760001-174800000 0.038129 Neural_Progenitor_Cell NA 3:175960001-176000000 3:176040001-176080000 0.039059 Hippocampus NA 3:175920001-175960000 3:175160001-175200000 0.039108 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175560001-175600000 0.039515 Neural_Progenitor_Cell NA 3:175920001-175960000 3:178480001-178520000 0.040672 Neural_Progenitor_Cell NA 3:176000001-176040000 3:175080001-175120000 0.040923 Neural_Progenitor_Cell NA 3:175920001-175960000 
3:178040001-178080000 0.041314 Neural_Progenitor_Cell NA 3:175920001-175960000 3:174320001-174360000 0.042682 Neural_Progenitor_Cell NA 3:176000001-176040000 3:174400001-174440000 0.042682 Neural_Progenitor_Cell NA 3:175960001-176000000 3:173440001-173480000 0.044419 Neural_Progenitor_Cell NA 3:176000001-176040000 3:173480001-173520000 0.044419 Neural_Progenitor_Cell NA

3:175920001-175960000  3:175520001-175560000  0.045595  Neural_Progenitor_Cell  NA
3:175960001-176000000  3:173880001-173920000  0.049824  Hippocampus  NA

This table provides the results of the chromatin interactions for the two social relationship satisfaction GWAS (Chapter 6).

Appendix Table 14: Gene based analyses for family relationship satisfaction (Chapter 6)

Ensembl ID  Chr  Start  Stop  P  Symbol
ENSG00000164054  3  48499197  48552259  3.26E-08  SHISA5
ENSG00000241370  6  30302908  30324661  7.76E-08  RPP21
ENSG00000048471  16  12060594  12678146  1.25E-07  SNX29
ENSG00000204687  6  29444474  29465738  2.18E-07  MAS1L
ENSG00000213689  3  48496445  48519044  2.58E-07  TREX1
ENSG00000164051  3  48463574  48491866  4.02E-07  CCDC51
ENSG00000234127  6  30142232  30191204  4.17E-07  TRIM26
ENSG00000232112  3  48471667  48495616  6.26E-07  TMA7
ENSG00000196628  18  52879562  53342018  6.62E-07  TCF4
ENSG00000164053  3  48478114  48517115  6.78E-07  ATRIP
ENSG00000164050  3  48435261  48481594  1.05E-06  PLXNB1
ENSG00000124228  20  47825884  47870614  1.50E-06  DDX27
ENSG00000124214  20  47719878  47814904  1.83E-06  STAU1
ENSG00000164049  3  48403709  48452666  2.85E-06  FBXW12
ENSG00000124201  20  47844483  47904963  3.03E-06  ZNFX1
ENSG00000216490  19  18273972  18298927  3.32E-06  IFI30
ENSG00000122482  1  91370859  91497829  3.71E-06  ZNF644
ENSG00000175806  8  9901778  10296401  4.64E-06  MSRA
ENSG00000171160  10  99364310  99403344  5.14E-06  MORN4
ENSG00000121903  1  33928246  33972107  7.46E-06  ZSCAN20
ENSG00000204713  6  28860779  28901766  9.44E-06  TRIM27
ENSG00000004455  1  33463585  33556597  1.00E-05  AK2
ENSG00000183273  12  119762517  119988852  1.23E-05  CCDC60
ENSG00000187555  16  8975951  9068371  1.30E-05  USP7
ENSG00000089775  14  64905824  64981931  1.31E-05  ZBTB25
ENSG00000124207  20  47652849  47723489  1.39E-05  CSE1L
ENSG00000131386  3  16206156  16283499  1.64E-05  GALNT15
ENSG00000104320  8  90935564  91025456  1.65E-05  NBN
ENSG00000104325  8  91003633  91074320  1.90E-05  DECR1
ENSG00000129055  3  134186548  134215558  2.15E-05  ANAPC13
ENSG00000126804  14  64960430  65010408  2.29E-05  ZBTB1
ENSG00000174611  3  134311980  134380478  2.49E-05  KY
ENSG00000268173  19  18253968  18298927  2.57E-05  PIK3R2
ENSG00000163714  3  142673339  142789567  2.66E-05  U2SURP
ENSG00000182923  3  134194585  134303859  3.12E-05  CEP63
ENSG00000179841  14  64922217  64951221  3.27E-05  AKAP5
ENSG00000112367  6  110002499  110156631  3.56E-05  FIG4
ENSG00000121989  2  148592086  148698393  3.90E-05  ACVR2A
ENSG00000092421  5  115769312  115920630  4.06E-05  SEMA6A

ENSG00000204406  2  148768580  149285805  4.09E-05  MBD5
ENSG00000204704  6  29001990  29023017  4.38E-05  OR2W1
ENSG00000168298  6  26146559  26167343  5.34E-05  HIST1H1E
ENSG00000204709  6  28901654  28922314  5.49E-05  C6orf100
ENSG00000204702  6  29058386  29079658  5.83E-05  OR2J1
ENSG00000187068  3  184785838  184880802  5.84E-05  C3orf70
ENSG00000136731  2  128838774  128963251  5.89E-05  UGGT1
ENSG00000197912  16  89547325  89634176  5.95E-05  SPG7
ENSG00000185250  6  109701418  109772374  6.19E-05  PPIL6
ENSG00000213886  6  29513292  29537702  6.20E-05  UBD
ENSG00000248167  6  30287359  30324631  6.92E-05  TRIM39-RPP21
ENSG00000198366  6  26010718  26031186  7.28E-05  HIST1H3A
ENSG00000135596  6  109755265  109797171  8.26E-05  MICAL1
ENSG00000135587  6  109751966  109775122  8.28E-05  SMPD2
ENSG00000171560  4  155494278  155521918  8.74E-05  FGA
ENSG00000164946  9  14724664  14920993  9.11E-05  FREM1
ENSG00000115947  2  148677968  148789147  9.53E-05  ORC4
ENSG00000158636  11  76145967  76274069  9.62E-05  C11orf30
ENSG00000142920  1  33536705  33596131  9.79E-05  ADC
ENSG00000254858  19  18293992  18317758  0.000102  MPV17L2
ENSG00000196176  6  26011907  26032278  0.000103  HIST1H4A
ENSG00000234745  6  31311649  31334965  0.000105  HLA-B
ENSG00000204701  6  29069668  29090603  0.000107  OR2J3
ENSG00000196653  3  44744135  44775323  0.000107  ZNF502
ENSG00000124568  6  25773125  25842287  0.000107  SLC17A1
ENSG00000124198  20  47528427  47663230  0.000109  ARFGEF2
ENSG00000250317  4  25853452  25941435  0.000116  SMIM20
ENSG00000188921  9  20985306  21041635  0.000121  PTPLAD2
ENSG00000155085  6  109804059  110022420  0.000124  AK9
ENSG00000105647  19  18253928  18291350  0.000126  PIK3R2
ENSG00000135535  6  109677717  109713762  0.000128  CD164
ENSG00000005810  13  77608792  77911185  0.000132  MYCBP2
ENSG00000107859  10  103979943  104011231  0.000138  PITX3
ENSG00000100344  22  44309619  44370368  0.000143  PNPLA3
ENSG00000170264  2  62041989  62091278  0.000145  FAM161A
ENSG00000009307  1  115249534  115311297  0.000152  CSDE1
ENSG00000158941  8  22452145  22489027  0.000152  CCAR2
ENSG00000127804  17  2298856  2425185  0.000155  METTL16
ENSG00000005483  7  104644626  104764808  0.000157  KMT2E
ENSG00000156931  3  184519931  184780402  0.00016  VPS8
ENSG00000158373  6  26148349  26181577  0.000161  HIST1H2BD
ENSG00000147439  8  22467931  22536661  0.000182  BIN3
ENSG00000105649  19  18297594  18324884  0.000183  RAB3A
ENSG00000198156  16  28343876  28384829  0.000191  NPIPB6

ENSG00000120314  5  140034261  140063709  0.000196  WDR55
ENSG00000241852  8  22447114  22471663  0.000197  C8orf58
ENSG00000170445  5  140042758  140081609  0.000199  HARS
ENSG00000188641  1  97533299  98396605  0.000202  DPYD
ENSG00000005955  17  34890737  34956278  0.000205  GGNBP2
ENSG00000009724  1  11076580  11117290  0.000205  MASP2
ENSG00000035499  5  59882739  60006017  0.00021  DEPDC1B
ENSG00000157540  21  38728092  38899753  0.00021  DYRK1A
ENSG00000249967  10  99334131  99443667  0.000213  PI4K2A
ENSG00000124610  6  26007260  26028040  0.000214  HIST1H1A
ENSG00000155252  10  99334131  99446191  0.000217  PI4K2A
ENSG00000146039  6  25744927  25791419  0.000218  SLC17A4
ENSG00000171435  12  117880817  118416788  0.000222  KSR2
ENSG00000081189  5  88003975  88209922  0.000223  MEF2C
ENSG00000102452  13  101696130  102078843  0.000226  NALCN
ENSG00000112365  6  109773797  109814440  0.000232  ZBTB24
ENSG00000142273  19  45271126  45313891  0.000234  CBLC

This table lists the results of the MAGMA gene-based analysis of the family relationship satisfaction GWAS (Chapter 6). Significant results have been italicized.

Appendix Table 15: Gene based analyses for friendship satisfaction (Chapter 6)

Ensembl ID  Chr  Start  Stop  P  Symbol
ENSG00000164054  3  48499197  48552259  7.96E-08  SHISA5
ENSG00000204687  6  29444474  29465738  2.50E-07  MAS1L
ENSG00000241370  6  30302908  30324661  3.08E-07  RPP21
ENSG00000213689  3  48496445  48519044  7.16E-07  TREX1
ENSG00000171160  10  99364310  99403344  8.44E-07  MORN4
ENSG00000164051  3  48463574  48491866  1.33E-06  CCDC51
ENSG00000048471  16  12060594  12678146  1.59E-06  SNX29
ENSG00000196628  18  52879562  53342018  1.79E-06  TCF4
ENSG00000124228  20  47825884  47870614  1.81E-06  DDX27
ENSG00000164053  3  48478114  48517115  1.85E-06  ATRIP
ENSG00000232112  3  48471667  48495616  1.96E-06  TMA7
ENSG00000234127  6  30142232  30191204  2.19E-06  TRIM26
ENSG00000204713  6  28860779  28901766  2.70E-06  TRIM27
ENSG00000164050  3  48435261  48481594  3.20E-06  PLXNB1
ENSG00000124214  20  47719878  47814904  3.36E-06  STAU1
ENSG00000175806  8  9901778  10296401  3.80E-06  MSRA
ENSG00000124201  20  47844483  47904963  4.11E-06  ZNFX1
ENSG00000164049  3  48403709  48452666  6.66E-06  FBXW12
ENSG00000216490  19  18273972  18298927  8.46E-06  IFI30
ENSG00000131386  3  16206156  16283499  1.15E-05  GALNT15
ENSG00000197912  16  89547325  89634176  1.23E-05  SPG7
ENSG00000204702  6  29058386  29079658  1.34E-05  OR2J1
ENSG00000104325  8  91003633  91074320  1.34E-05  DECR1
ENSG00000187555  16  8975951  9068371  1.42E-05  USP7
ENSG00000104320  8  90935564  91025456  1.66E-05  NBN
ENSG00000237515  16  12985477  13344272  1.76E-05  SHISA9
ENSG00000121903  1  33928246  33972107  1.82E-05  ZSCAN20
ENSG00000122482  1  91370859  91497829  1.87E-05  ZNF644
ENSG00000124207  20  47652849  47723489  1.95E-05  CSE1L
ENSG00000183273  12  1.2E+08  1.2E+08  2.18E-05  CCDC60
ENSG00000204709  6  28901654  28922314  2.23E-05  C6orf100
ENSG00000204406  2  1.49E+08  1.49E+08  2.60E-05  MBD5
ENSG00000089775  14  64905824  64981931  2.63E-05  ZBTB25
ENSG00000204701  6  29069668  29090603  2.67E-05  OR2J3
ENSG00000213886  6  29513292  29537702  2.73E-05  UBD
ENSG00000166483  11  9585228  9625004  2.81E-05  WEE1
ENSG00000204704  6  29001990  29023017  2.84E-05  OR2W1
ENSG00000121989  2  1.49E+08  1.49E+08  3.00E-05  ACVR2A
ENSG00000120314  5  1.4E+08  1.4E+08  3.14E-05  WDR55

ENSG00000129055  3  1.34E+08  1.34E+08  3.28E-05  ANAPC13
ENSG00000170445  5  1.4E+08  1.4E+08  3.30E-05  HARS
ENSG00000112367  6  1.1E+08  1.1E+08  3.78E-05  FIG4
ENSG00000256453  5  1.4E+08  1.4E+08  4.24E-05  DND1
ENSG00000268173  19  18253968  18298927  4.33E-05  PIK3R2
ENSG00000081189  5  88003975  88209922  4.60E-05  MEF2C
ENSG00000096996  19  18159805  18219754  4.63E-05  IL12RB1
ENSG00000146007  5  1.4E+08  1.4E+08  4.70E-05  ZMAT2
ENSG00000174611  3  1.34E+08  1.34E+08  4.73E-05  KY
ENSG00000182923  3  1.34E+08  1.34E+08  4.93E-05  CEP63
ENSG00000131495  5  1.4E+08  1.4E+08  5.27E-05  NDUFA2
ENSG00000126804  14  64960430  65010408  5.35E-05  ZBTB1
ENSG00000112855  5  1.4E+08  1.4E+08  5.35E-05  HARS2
ENSG00000113119  5  1.4E+08  1.4E+08  5.41E-05  TMCO6
ENSG00000155252  10  99334131  99446191  5.49E-05  PI4K2A
ENSG00000249967  10  99334131  99443667  5.51E-05  PI4K2A
ENSG00000185250  6  1.1E+08  1.1E+08  5.54E-05  PPIL6
ENSG00000115947  2  1.49E+08  1.49E+08  6.22E-05  ORC4
ENSG00000155886  9  19497450  19796926  6.36E-05  SLC24A2
ENSG00000092421  5  1.16E+08  1.16E+08  6.67E-05  SEMA6A
ENSG00000179841  14  64922217  64951221  6.68E-05  AKAP5
ENSG00000113141  5  1.4E+08  1.4E+08  6.77E-05  IK
ENSG00000151090  3  24148651  24546773  6.79E-05  THRB
ENSG00000107859  10  1.04E+08  1.04E+08  6.82E-05  PITX3
ENSG00000009724  1  11076580  11117290  7.17E-05  MASP2
ENSG00000241935  10  99334080  99382559  7.27E-05  HOGA1
ENSG00000005810  13  77608792  77911185  7.50E-05  MYCBP2
ENSG00000175175  17  56823230  57068983  7.65E-05  PPM1E
ENSG00000135587  6  1.1E+08  1.1E+08  7.72E-05  SMPD2
ENSG00000254858  19  18293992  18317758  8.44E-05  MPV17L2
ENSG00000135596  6  1.1E+08  1.1E+08  8.64E-05  MICAL1
ENSG00000163714  3  1.43E+08  1.43E+08  9.60E-05  U2SURP
ENSG00000141452  18  21073473  21121746  9.72E-05  C18orf8
ENSG00000173894  17  77741931  77771782  9.88E-05  CBX2
ENSG00000198156  16  28343876  28384829  0.000102  NPIPB6
ENSG00000109436  4  1.42E+08  1.42E+08  0.000106  TBC1D9
ENSG00000155085  6  1.1E+08  1.1E+08  0.000106  AK9
ENSG00000198331  11  1.26E+08  1.26E+08  0.000107  HYLS1
ENSG00000146414  6  1.46E+08  1.46E+08  0.000108  SHPRH
ENSG00000017260  3  1.31E+08  1.31E+08  0.000109  ATP2C1
ENSG00000164946  9  14724664  14920993  0.000112  FREM1
ENSG00000135535  6  1.1E+08  1.1E+08  0.000114  CD164
ENSG00000100344  22  44309619  44370368  0.00012  PNPLA3
ENSG00000170264  2  62041989  62091278  0.000125  FAM161A

ENSG00000004455  1  33463585  33556597  0.000128  AK2
ENSG00000127804  17  2298856  2425185  0.000129  METTL16
ENSG00000105649  19  18297594  18324884  0.000129  RAB3A
ENSG00000107862  10  1.04E+08  1.04E+08  0.000133  GBF1
ENSG00000108384  17  56759934  56821703  0.000135  RAD51C
ENSG00000158636  11  76145967  76274069  0.000135  C11orf30
ENSG00000109743  4  15694573  15749936  0.000139  BST1
ENSG00000142273  19  45271126  45313891  0.000142  CBLC
ENSG00000133103  13  40219764  40375802  0.000143  COG6
ENSG00000248167  6  30287359  30324631  0.000146  TRIM39-RPP21
ENSG00000168143  6  54701569  54816820  0.000155  FAM83B
ENSG00000255408  5  1.4E+08  1.4E+08  0.000155  PCDHA3
ENSG00000237765  4  15673285  15717188  0.000164  FAM200B
ENSG00000124198  20  47528427  47663230  0.000169  ARFGEF2
ENSG00000159692  4  1195236  1253741  0.00017  CTBP1
ENSG00000167522  16  89324038  89566969  0.000177  ANKRD11
ENSG00000111262  12  5009071  5050527  0.000179  KCNA1

This table lists the results of the MAGMA gene-based analysis of the friendship satisfaction GWAS (Chapter 6). Significant results have been italicized.

Appendix Table 16: Results of the chromatin interactions for the SQ-R (Chapter 7)

Region1  Region2  FDR P  Tissue/cell  Genes  Subset
14:26120001-26160000  14:29040001-29080000  1.19E-14  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27440001-27480000  1.29E-12  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28800001-28840000  4.34E-09  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:70640001-70680000  1.21E-08  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:28040001-28080000  1.79E-08  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26080001-26120000  2.55E-08  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27160001-27200000  1.16E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27320001-27360000  1.63E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28760001-28800000  2.65E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28400001-28440000  2.92E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26040001-26080000  3.37E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27720001-27760000  3.97E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27240001-27280000  5.26E-07  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27200001-27240000  9.32E-07  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:71440001-71480000  1.08E-06  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26200001-26240000  1.50E-06  Hippocampus  NA  Males_only
14:26120001-26160000  14:26200001-26240000  2.53E-06  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:71800001-71840000  4.98E-06  Neural_Progenitor_Cell  ENSG00000141665:ENSG00000075336  Non_stratified
14:26120001-26160000  14:27760001-27800000  5.37E-06  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:71000001-71040000  6.85E-06  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:25880001-25920000  1.01E-05  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:71600001-71640000  1.02E-05  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:71680001-71720000  2.63E-05  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26040001-26080000  3.43E-05  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:27880001-27920000  3.70E-05  Neural_Progenitor_Cell  NA  Males_only

14:26120001-26160000  14:25920001-25960000  4.12E-05  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:70680001-70720000  5.09E-05  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:28720001-28760000  5.47E-05  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26800001-26840000  5.64E-05  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26000001-26040000  7.63E-05  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:29000001-29040000  8.27E-05  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28560001-28600000  1.02E-04  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27920001-27960000  1.15E-04  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:71480001-71520000  1.75E-04  Dorsolateral_Prefrontal_Cortex  NA  Non_stratified
18:70720001-70760000  18:71480001-71520000  2.00E-04  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:27440001-27480000  2.19E-04  Dorsolateral_Prefrontal_Cortex  NA  Males_only
18:70720001-70760000  18:71760001-71800000  2.21E-04  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:71120001-71160000  3.65E-04  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:27000001-27040000  4.00E-04  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26680001-26720000  6.89E-04  Neural_Progenitor_Cell  NA  Males_only
3:117480001-117520000  3:117280001-117320000  7.26E-04  Neural_Progenitor_Cell  NA  Non_stratified
3:117480001-117520000  3:116480001-116520000  8.42E-04  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26360001-26400000  8.66E-04  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26840001-26880000  1.25E-03  Neural_Progenitor_Cell  NA  Males_only
3:117480001-117520000  3:116520001-116560000  1.25E-03  Neural_Progenitor_Cell  NA  Non_stratified
3:117480001-117520000  3:116680001-116720000  1.26E-03  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26520001-26560000  1.31E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26800001-26840000  1.42E-03  Hippocampus  NA  Males_only
3:117520001-117560000  3:116480001-116520000  1.45E-03  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:71360001-71400000  1.54E-03  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:29120001-29160000  1.78E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27320001-27360000  1.88E-03  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:25880001-25920000  1.92E-03  Hippocampus  NA  Males_only
14:26120001-26160000  14:28520001-28560000  1.97E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28960001-29000000  2.03E-03  Neural_Progenitor_Cell  NA  Males_only

18:70720001-70760000  18:71560001-71600000  2.18E-03  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:27840001-27880000  2.26E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26320001-26360000  2.30E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27480001-27520000  2.32E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28080001-28120000  2.43E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26200001-26240000  2.71E-03  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:27240001-27280000  2.85E-03  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:26720001-26760000  3.56E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27560001-27600000  3.87E-03  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26960001-27000000  3.92E-03  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:70800001-70840000  4.12E-03  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:28680001-28720000  4.75E-03  Neural_Progenitor_Cell  NA  Males_only
3:117480001-117520000  3:116720001-116760000  4.75E-03  Neural_Progenitor_Cell  NA  Non_stratified
3:117480001-117520000  3:117240001-117280000  5.35E-03  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:27680001-27720000  5.45E-03  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:68240001-68280000  5.84E-03  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:28600001-28640000  5.86E-03  Neural_Progenitor_Cell  NA  Males_only
3:117480001-117520000  3:118360001-118400000  6.29E-03  Neural_Progenitor_Cell  NA  Non_stratified
3:117520001-117560000  3:116640001-116680000  6.29E-03  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:70920001-70960000  7.25E-03  Neural_Progenitor_Cell  ENSG00000263711  Non_stratified
14:26120001-26160000  14:29000001-29040000  7.87E-03  Hippocampus  NA  Males_only
14:26120001-26160000  14:28160001-28200000  7.99E-03  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:29040001-29080000  7.99E-03  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:25480001-25520000  8.76E-03  Neural_Progenitor_Cell  ENSG00000168952  Males_only
3:117480001-117520000  3:117440001-117480000  1.01E-02  Hippocampus  NA  Non_stratified
14:26120001-26160000  14:26480001-26520000  1.02E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
3:117480001-117520000  3:116960001-117000000  1.03E-02  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:28360001-28400000  1.07E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26080001-26120000  1.08E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:28000001-28040000  1.09E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only

3:117520001-117560000  3:118520001-118560000  1.12E-02  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:71520001-71560000  1.44E-02  Dorsolateral_Prefrontal_Cortex  NA  Non_stratified
14:26120001-26160000  14:27880001-27920000  1.50E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
3:117520001-117560000  3:116560001-116600000  1.53E-02  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:71400001-71440000  1.55E-02  Neural_Progenitor_Cell  NA  Non_stratified
18:70720001-70760000  18:70480001-70520000  1.60E-02  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26240001-26280000  1.69E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28640001-28680000  1.79E-02  Neural_Progenitor_Cell  NA  Males_only
18:70720001-70760000  18:68680001-68720000  1.93E-02  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26880001-26920000  1.94E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28120001-28160000  2.10E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27800001-27840000  2.36E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:25840001-25880000  2.41E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
3:117520001-117560000  3:117640001-117680000  2.43E-02  Neural_Progenitor_Cell  NA  Non_stratified
3:117520001-117560000  3:117240001-117280000  2.58E-02  Hippocampus  NA  Non_stratified
18:70720001-70760000  18:71400001-71440000  2.65E-02  Dorsolateral_Prefrontal_Cortex  NA  Non_stratified
3:117520001-117560000  3:116720001-116760000  2.71E-02  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:27360001-27400000  3.30E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:27520001-27560000  3.31E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28880001-28920000  3.36E-02  Neural_Progenitor_Cell  NA  Males_only
3:117480001-117520000  3:116480001-116520000  3.37E-02  Hippocampus  NA  Non_stratified
14:26120001-26160000  14:28320001-28360000  3.55E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28880001-28920000  3.75E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
3:117480001-117520000  3:118320001-118360000  3.79E-02  Neural_Progenitor_Cell  NA  Non_stratified
3:117520001-117560000  3:116600001-116640000  3.99E-02  Dorsolateral_Prefrontal_Cortex  NA  Non_stratified
18:70720001-70760000  18:71240001-71280000  4.06E-02  Neural_Progenitor_Cell  NA  Non_stratified
3:117520001-117560000  3:116600001-116640000  4.09E-02  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:25880001-25920000  4.27E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
18:70720001-70760000  18:68160001-68200000  4.42E-02  Neural_Progenitor_Cell  NA  Non_stratified
14:26120001-26160000  14:26080001-26120000  4.50E-02  Hippocampus  NA  Males_only

14:26120001-26160000  14:25920001-25960000  4.56E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:26160001-26200000  4.67E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:28800001-28840000  4.73E-02  Hippocampus  NA  Males_only
14:26120001-26160000  14:25600001-25640000  4.79E-02  Neural_Progenitor_Cell  NA  Males_only
14:26120001-26160000  14:26720001-26760000  4.80E-02  Dorsolateral_Prefrontal_Cortex  NA  Males_only
14:26120001-26160000  14:27040001-27080000  4.83E-02  Neural_Progenitor_Cell  ENSG00000139910  Males_only

This table lists the chromatin interactions that remain significant after FDR correction in the genome-wide significant loci for the SQ-R (Chapter 7).
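For readers working with this table programmatically, the region strings follow a chromosome:start-end pattern. The following is a minimal sketch, not part of the original analysis, of how such strings could be parsed and the offset between two interacting bins computed. The names Region, parse_region, bin_width and bin_offset are hypothetical helpers introduced only for illustration, and the sketch assumes that coordinates are 1-based and inclusive.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic bin written as 'chrom:start-end' (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Split a string such as '14:26120001-26160000' into its three parts."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return Region(chrom, int(start), int(end))


def bin_width(region: Region) -> int:
    """Width of the bin, assuming 1-based inclusive coordinates."""
    return region.end - region.start + 1


def bin_offset(a: Region, b: Region) -> int:
    """Signed distance in base pairs from the start of a to the start of b."""
    if a.chrom != b.chrom:
        raise ValueError("regions are on different chromosomes")
    return b.start - a.start


# First row of the table: the two bins lie 2,920,000 bp apart on chromosome 14.
r1 = parse_region("14:26120001-26160000")
r2 = parse_region("14:29040001-29080000")
print(bin_width(r1), bin_offset(r1, r2))
```

Under the inclusive-coordinate assumption, every bin in the table is 40,000 bp wide, so the offsets are multiples of the bin width.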

Appendix Table 17: Gene based analyses for the SQ-R (Chapter 7)

Ensembl Gene  Chr  Start  Stop  N  P  Gene Symbol
ENSG00000054282  1  243409320  243673394  51564  3.70E-09  SDCCAG8
ENSG00000130449  5  60618100  60851997  51564  8.52E-08  ZSWIM6
ENSG00000105732  19  42562629  42595701  51564  1.99E-06  ZNF574
ENSG00000033170  14  65867310  66220839  51564  2.13E-06  FUT8
ENSG00000105737  19  42492473  42583650  51564  9.07E-06  GRIK5
ENSG00000258484  15  69100560  69249150  51564  1.17E-05  SPESP1
ENSG00000172469  6  96015419  96067333  51564  1.83E-05  MANEA
ENSG00000150760  10  128583978  129260781  51564  2.16E-05  DOCK1
ENSG00000068878  2  54081204  54207977  51564  3.39E-05  PSME4
ENSG00000171316  8  61581337  61789465  51564  4.00E-05  CHD7
ENSG00000116586  1  156014543  156038301  51564  4.19E-05  LAMTOR2
ENSG00000171903  19  16013177  16055677  51564  4.75E-05  CYP4F11
ENSG00000028277  19  42580263  42710737  51564  4.88E-05  POU2F2
ENSG00000152601  3  151951617  152193569  51564  5.53E-05  MBNL1
ENSG00000196628  18  52879562  53342018  51564  6.71E-05  TCF4
ENSG00000122852  10  81360695  81385196  51564  7.13E-05  SFTPA1
ENSG00000270898  2  53887430  54097297  51564  7.30E-05  ASB3
ENSG00000100490  14  50786310  50893179  51564  8.99E-05  CDKL1
ENSG00000181031  17  52293  246045  51564  9.01E-05  RPH3AL
ENSG00000041353  18  52375091  52572747  51564  1.04E-04  RAB27B
ENSG00000132698  1  156020951  156050295  51564  1.07E-04  RAB25
ENSG00000178386  19  44545520  44582144  51564  1.09E-04  ZNF223
ENSG00000065135  1  110081233  110146975  51564  1.11E-04  GNAI3
ENSG00000143702  1  243277730  243428650  51564  1.17E-04  CEP170
ENSG00000090263  7  140695854  140725028  51564  1.39E-04  MRPS33
ENSG00000159882  19  44497100  44528078  51564  1.40E-04  ZNF230
ENSG00000076554  8  80860571  81153467  51564  1.60E-04  TPD52
ENSG00000267022  19  44519506  44601471  51564  2.02E-04  ZNF223
ENSG00000092964  8  26361791  26525694  51564  2.06E-04  DPYSL2
ENSG00000115239  2  53749810  54097170  51564  2.08E-04  GPR75-ASB3
ENSG00000189233  8  27869481  27951388  51564  2.11E-04  NUGGC
ENSG00000068912  2  54004181  54055956  51564  2.24E-04  ERLEC1
ENSG00000255346  15  69212864  69365083  51564  2.35E-04  NOX5
ENSG00000066248  2  233733396  233887982  51564  2.44E-04  NGEF
ENSG00000119737  2  54070050  54097126  51564  2.48E-04  GPR75
ENSG00000155367  1  113242616  113268099  51564  2.58E-04  PPM1J
ENSG00000196357  19  36663188  36747159  51564  2.59E-04  ZNF565
ENSG00000182600  2  233711980  233753418  51564  2.62E-04  C2orf82
ENSG00000251322  22  51102843  51181726  51564  2.73E-04  SHANK3

ENSG00000134245  1  112999163  113082787  51564  2.80E-04  WNT2B
ENSG00000188283  19  37698828  37744828  51564  2.81E-04  ZNF383
ENSG00000155130  6  114168541  114194648  51564  2.81E-04  MARCKS
ENSG00000134138  15  37171406  37403504  51564  3.15E-04  MEIS2
ENSG00000108506  17  59932731  60015377  51564  3.30E-04  INTS2
ENSG00000164176  5  83226373  83690611  51564  3.82E-04  EDIL3
ENSG00000221962  3  152047484  152068779  51564  3.87E-04  TMEM14E
ENSG00000187068  3  184785838  184880802  51564  3.92E-04  C3orf70
ENSG00000185565  3  115511235  117726095  51564  3.99E-04  LSAMP
ENSG00000138592  15  50706577  50803280  51564  4.01E-04  USP8
ENSG00000161281  19  36631824  36653771  51564  4.07E-04  COX7A1
ENSG00000129933  19  19421490  19479563  51564  4.09E-04  MAU2
ENSG00000158987  5  130749614  130980929  51564  4.35E-04  RAPGEF6
ENSG00000105705  19  19376827  19441653  51564  4.46E-04  SUGP1
ENSG00000254726  1  156031804  156061789  51564  5.28E-04  MEX3A
ENSG00000156097  1  110072494  110101028  51564  5.29E-04  GPR61
ENSG00000205138  19  36476090  36497220  51564  5.39E-04  SDHAF1
ENSG00000154359  8  12569403  12623582  51564  5.39E-04  LONRF1
ENSG00000175520  11  5518530  5541215  51564  5.52E-04  UBQLN3
ENSG00000156475  5  145957936  146474347  51564  5.52E-04  PPP2R2B
ENSG00000188779  15  68102042  68136899  51564  5.52E-04  SKOR1
ENSG00000009724  1  11076580  11117290  51564  5.74E-04  MASP2
ENSG00000182504  3  101432769  101499406  51564  5.76E-04  CEP97
ENSG00000168216  6  70375694  70517003  51564  5.77E-04  LMBRD1
ENSG00000156050  14  74388204  74427117  51564  5.81E-04  FAM161B
ENSG00000263002  19  44635710  44673156  51564  5.93E-04  ZNF234
ENSG00000196407  1  151809739  151836173  51564  6.02E-04  THEM5
ENSG00000204539  6  31072867  31098223  51564  6.05E-04  CDSN
ENSG00000203871  6  88096840  88119467  51564  6.12E-04  C6orf164
ENSG00000134183  1  110135889  110165679  51564  6.15E-04  GNAT2
ENSG00000213996  19  19365173  19394200  51564  6.18E-04  TM6SF2
ENSG00000262874  19  51881543  51903828  51564  6.22E-04  C19orf84
ENSG00000155636  2  178967151  179013738  51564  6.40E-04  RBM45
ENSG00000138821  4  103162198  103362415  51564  6.54E-04  SLC39A8
ENSG00000173372  1  22952999  22976101  51564  6.69E-04  C1QA
ENSG00000186026  19  44566297  44603766  51564  6.80E-04  ZNF284
ENSG00000108510  17  60009966  60152643  51564  6.87E-04  MED13
ENSG00000167491  19  19486635  19629740  51564  6.89E-04  GATAD2A
ENSG00000131370  3  15286360  15392875  51564  6.95E-04  SH3BP5
ENSG00000171608  1  9701790  9799172  51564  7.02E-04  PIK3CD
ENSG00000170634  2  54187975  54542437  51564  7.42E-04  ACYP2
ENSG00000217128  5  130967407  131142710  51564  7.53E-04  FNIP1
ENSG00000167625  19  42714423  42742353  51564  7.62E-04  ZNF526
ENSG00000113361  5  31183857  31339253  51564  7.85E-04  CDH6

ENSG00000108298  17  37346536  37370980  51564  7.99E-04  RPL19
ENSG00000213625  1  65876270  65911690  51564  8.25E-04  LEPROT
ENSG00000184040  10  18031218  18099855  51564  8.35E-04  TMEM236
ENSG00000214595  2  54940636  55209157  51564  8.41E-04  EML6
ENSG00000126249  19  34885289  34927073  51564  8.61E-04  PDCD2L
ENSG00000023697  12  16054106  16200220  51564  8.68E-04  DERA
ENSG00000234776  11  45918085  45938833  51564  8.82E-04  C11orf94
ENSG00000166454  16  81059452  81090963  51564  8.94E-04  ATMIN
ENSG00000134864  13  101173810  101251782  51564  8.97E-04  GGACT
ENSG00000175518  11  5525623  5547935  51564  9.11E-04  UBQLNL
ENSG00000087301  14  52887308  53029240  51564  9.14E-04  TXNDC16
ENSG00000166164  16  50337398  50412845  51564  9.22E-04  BRD7
ENSG00000095713  10  99614757  99800585  51564  9.34E-04  CRTAC1
ENSG00000141570  17  77755931  77785482  51564  9.52E-04  CBX8
ENSG00000168787  6  29354416  29375448  51564  9.60E-04  OR12D2
ENSG00000187097  14  74414713  74496102  51564  9.66E-04  ENTPD5
ENSG00000160803  1  155995092  156033585  51564  9.97E-04  UBQLN4
ENSG00000144815  3  101488046  101557073  51564  9.98E-04  NXPE3

This table lists the top 100 most significant results of the MAGMA gene-based analysis of the SQ-R (non-stratified) GWAS (Chapter 7). Genes that are significant after Bonferroni correction have been italicized.

Appendix Table 18: Pathway analyses for the SQ-R (Chapter 7)

N genes  Beta  SE  P  Full Name
8  1.16  0.288  2.83E-05  Curated_gene_sets:nadella_prkar1a_targets_dn
203  0.204  0.0574  0.000195  Curated_gene_sets:tien_intestine_probiotics_24hr_dn
106  0.299  0.0847  0.000206  GO_mf:go_sh3_domain_binding
28  0.606  0.173  0.000232  GO_bp:go_positive_regulation_of_tor_signaling
67  0.378  0.113  0.000396  GO_mf:go_extracellular_ligand_gated_ion_channel_activity
16  0.616  0.19  0.00059  Curated_gene_sets:ray_targets_of_p210_bcr_abl_fusion_dn
581  0.116  0.0359  0.000647  GO_mf:go_protein_domain_specific_binding
34  0.498  0.156  0.000703  GO_bp:go_establishment_or_maintenance_of_apical_basal_cell_polarity
34  0.498  0.156  0.000703  GO_bp:go_establishment_or_maintenance_of_bipolar_cell_polarity
356  0.14  0.0441  0.000723  Curated_gene_sets:monnier_postradiation_tumor_escape_dn
10  1.04  0.33  0.000825  Curated_gene_sets:reactome_gaba_a_receptor_activation
21  0.505  0.163  0.00101  Curated_gene_sets:biocarta_cytokine_pathway
21  0.515  0.167  0.001024  Curated_gene_sets:biocarta_p53hypoxia_pathway
10  0.947  0.311  0.001158  GO_mf:go_monovalent_cation_proton_antiporter_activity
16  0.568  0.189  0.00135  GO_bp:go_regulation_of_inclusion_body_assembly
7  0.914  0.306  0.001423  Curated_gene_sets:mccollum_geldanamycin_resistance_dn
13  0.648  0.218  0.001453  Curated_gene_sets:kegg_circadian_rhythm_mammal
9  0.819  0.276  0.001515  Curated_gene_sets:yih_response_to_arsenite_c5
62  0.33  0.111  0.001521  Curated_gene_sets:pid_il12_2pathway
42  0.43  0.145  0.001532  GO_cc:go_chloride_channel_complex
9  0.803  0.274  0.001709  Curated_gene_sets:biocarta_gaba_pathway
271  0.151  0.052  0.00182  Curated_gene_sets:durand_stroma_s_up
8  0.989  0.34  0.001823  Curated_gene_sets:coller_myc_targets_dn
33  0.432  0.15  0.001977  GO_bp:go_regulation_of_myelination
35  0.425  0.148  0.002113  GO_bp:go_olfactory_lobe_development
10  0.889  0.315  0.002373  Curated_gene_sets:myllykangas_amplification_hot_spot_27

81  0.281  0.1  0.002522  GO_mf:go_anion_channel_activity
16  0.525  0.189  0.00269  GO_mf:go_exodeoxyribonuclease_activity
13  0.655  0.236  0.002752  Curated_gene_sets:biocarta_granulocytes_pathway
24  0.484  0.176  0.002979  Curated_gene_sets:gargalovic_response_to_oxidized_phospholipids_green_dn
30  0.454  0.166  0.003037  Curated_gene_sets:reactome_synthesis_of_pips_at_the_plasma_membrane
54  0.324  0.118  0.003095  Curated_gene_sets:reactome_loss_of_nlp_from_mitotic_centrosomes
27  0.382  0.14  0.003179  Curated_gene_sets:dacosta_uv_response_via_ercc3_xpcs_up
1497  0.0612  0.0225  0.003207  GO_mf:go_rna_binding
18  0.585  0.216  0.003353  GO_bp:go_purinergic_nucleotide_receptor_signaling_pathway
16  0.623  0.23  0.003405  Curated_gene_sets:nielsen_malignat_fibrous_histiocytoma_up
50  0.303  0.112  0.003525  GO_cc:go_nuclear_inner_membrane
22  0.436  0.162  0.003529  Curated_gene_sets:rampon_enriched_learning_environment_late_up
8  0.902  0.335  0.003588  Curated_gene_sets:oxford_rala_and_ralb_targets_dn
67  0.282  0.105  0.0036  GO_bp:go_regulation_of_dendrite_morphogenesis
10  0.722  0.269  0.003615  GO_bp:go_vocalization_behavior
291  0.146  0.0554  0.004121  Curated_gene_sets:aguirre_pancreatic_cancer_copy_number_up
20  0.514  0.195  0.004219  GO_cc:go_phosphatidylinositol_3_kinase_complex
64  0.282  0.107  0.004232  GO_bp:go_regulation_of_tor_signaling
57  0.317  0.121  0.004392  Curated_gene_sets:huttmann_b_cll_poor_survival_dn
71  0.263  0.101  0.004441  GO_bp:go_multi_organism_behavior
14  0.58  0.223  0.004648  GO_bp:go_positive_regulation_of_endothelial_cell_apoptotic_process
17  0.574  0.221  0.004677  GO_mf:go_nucleotide_receptor_activity
134  0.19  0.0733  0.004844  Curated_gene_sets:coulouarn_temporal_tgfb1_signature_dn
48  0.317  0.123  0.004923  Curated_gene_sets:appierto_response_to_fenretinide_dn
5  1.1  0.428  0.004934  Curated_gene_sets:tesar_alk_and_jak_targets_mouse_es_d4_up
22  0.461  0.179  0.005061  Curated_gene_sets:nikolsky_overconnected_in_breast_cancer
23  0.462  0.18  0.005191  GO_bp:go_purinergic_receptor_signaling_pathway
29  0.41  0.16  0.005235  GO_bp:go_regulation_of_neurotransmitter_receptor_activity
147  0.201  0.0789  0.005447  Curated_gene_sets:kegg_jak_stat_signaling_pathway
519  0.0961  0.0379  0.005607  Curated_gene_sets:martinez_rb1_targets_dn

717  0.0824  0.0328  0.005947  GO_cc:go_synapse
61  0.287  0.114  0.006078  Curated_gene_sets:reactome_recruitment_of_mitotic_centrosome_proteins_and_complexes
8  0.797  0.318  0.006126  Curated_gene_sets:vanloo_sp3_targets_up
12  0.685  0.274  0.00613  Curated_gene_sets:wagschal_ehmt2_targets_up
33  0.4  0.16  0.006152  Curated_gene_sets:korkola_seminoma_up
11  0.953  0.382  0.006339  GO_bp:go_positive_regulation_of_camp_mediated_signaling
29  0.412  0.166  0.00644  Curated_gene_sets:reactome_gluconeogenesis
11  0.702  0.283  0.006464  GO_bp:go_maintenance_of_cell_polarity
35  0.359  0.145  0.006529  GO_mf:go_glutamate_receptor_binding
14  0.698  0.283  0.006829  GO_cc:go_gaba_receptor_complex
10  0.579  0.235  0.006845  GO_mf:go_voltage_gated_chloride_channel_activity
40  0.314  0.128  0.006869  Curated_gene_sets:varela_zmpste24_targets_up
10  0.664  0.27  0.006996  Curated_gene_sets:shin_b_cell_lymphoma_cluster_6
45  0.323  0.132  0.007047  Curated_gene_sets:pid_erbb2_erbb3_pathway
39  0.335  0.137  0.007079  Curated_gene_sets:zhan_multiple_myeloma_dn
9  0.707  0.288  0.007148  Curated_gene_sets:mikkelsen_partially_reprogrammed_to_pluripotency
15  0.478  0.195  0.00721  GO_mf:go_voltage_gated_anion_channel_activity
34  0.372  0.152  0.007234  Curated_gene_sets:weigel_oxidative_stress_response
66  0.257  0.105  0.007237  GO_bp:go_cell_surface_receptor_signaling_pathway_involved_in_cell_cell_signaling
52  0.301  0.124  0.007396  GO_mf:go_excitatory_extracellular_ligand_gated_ion_channel_activity
63  0.247  0.102  0.007552  Curated_gene_sets:guo_hex_targets_dn
20  0.477  0.197  0.007621  GO_mf:go_purinergic_receptor_activity
11  0.626  0.259  0.007786  GO_bp:go_oxaloacetate_metabolic_process
16  0.44  0.182  0.007881  GO_bp:go_pseudouridine_synthesis
44  0.307  0.128  0.008126  Curated_gene_sets:heidenblad_amplicon_8q24_dn
24  0.391  0.164  0.008363  GO_cc:go_centriolar_satellite
19  0.522  0.218  0.008377  Curated_gene_sets:davicioni_pax_foxo1_signature_in_arms_dn
12  0.617  0.258  0.008397  GO_bp:go_positive_regulation_of_insulin_receptor_signaling_pathway
11  0.681  0.285  0.008441  GO_bp:go_telomere_localization
25  0.454  0.19  0.008552  GO_bp:go_inflammatory_response_to_antigenic_stimulus

18  0.474  0.199  0.008743  GO_bp:go_macrophage_differentiation
24  0.414  0.175  0.008829  GO_bp:go_postsynaptic_membrane_organization
33  0.356  0.151  0.00908  Curated_gene_sets:pid_il2_pi3k_pathway
27  0.418  0.177  0.009166  GO_bp:go_establishment_or_maintenance_of_epithelial_cell_apical_basal_polarity
12  0.539  0.228  0.009182  GO_bp:go_intra_s_dna_damage_checkpoint
120  0.178  0.0757  0.00922  GO_cc:go_neuron_projection_terminus
10  0.728  0.309  0.009249  Curated_gene_sets:claus_pgr_positive_meningioma_up
12  0.5  0.213  0.009373  GO_bp:go_3_utr_mediated_mrna_stabilization
73  0.22  0.0937  0.009563  GO_bp:go_vascular_endothelial_growth_factor_receptor_signaling_pathway
25  0.421  0.18  0.009635  GO_bp:go_regulation_of_t_cell_differentiation_in_thymus
25  0.421  0.18  0.009635  GO_bp:go_regulation_of_thymocyte_aggregation
56  0.276  0.119  0.010029  Curated_gene_sets:korkola_yolk_sac_tumor
15  0.437  0.188  0.010103  GO_cc:go_nuclear_membrane_part
41  0.324  0.139  0.010147  Curated_gene_sets:pid_amb2_neutrophils_pathway
12  0.534  0.23  0.010188  GO_bp:go_cellular_response_to_atp

This table lists the results of the pathway analysis conducted using FUMA for the SQ-R (top 100 most significant results only). N genes = number of genes in the pathway, Beta = regression beta, SE = standard error, P = P-value. None of the pathways were significant after Bonferroni correction.
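As an illustration of the Bonferroni criterion referred to in the note above, the following minimal sketch divides a family-wise alpha by the number of tests and compares each P-value against that threshold. The number of gene sets tested is not stated in this appendix, so the value of 10,000 used below is purely hypothetical, as are the function names and the alpha of 0.05.

```python
from typing import List, Optional


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_significant(
    p_values: List[float], alpha: float = 0.05, n_tests: Optional[int] = None
) -> List[bool]:
    """Flag each P-value that falls below the Bonferroni threshold.

    If n_tests is omitted, the length of p_values is used as the number of tests.
    """
    m = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, m)
    return [p < threshold for p in p_values]


# Three smallest P-values from the table above.
top_p = [2.83e-05, 0.000195, 0.000206]

# HYPOTHETICAL number of gene sets tested; the real count is not given here.
N_GENE_SETS = 10_000

print(bonferroni_threshold(0.05, N_GENE_SETS))
print(bonferroni_significant(top_p, n_tests=N_GENE_SETS))
```

With this assumed number of tests, the threshold is stricter than the smallest P-value in the table, which is consistent with the note that no pathway was significant after correction; the actual threshold depends on the real number of gene sets analysed.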
