3 Meta-Analysis of Genome- Wide Association Studies for Neuroticism in 449,484 Individuals Identifies Novel Genetic Loci and Pathways

Changing perspectives Towards detailed phenotyping in genetics Mats Nagel Ó Mats Nagel, 2020 All rights reserved. The research in this thesis was supported by a grant from the Netherlands Organization for Scientific Research, project number 452-12-014. Cover design: Mats Nagel Cover photo: Vern Dewit Printing: Off Page ISBN: 978-94-93197-07-7 DOI: 10.31237/osf.io/a4nz2 VRIJE UNIVERSITEIT Changing perspectives Towards detailed phenotyping in genetics ACADEMISCH PROEFSCHRIFT ter verkrijging van de graad Doctor of Philosophy aan de Vrije Universiteit Amsterdam, op gezag van de rector magnificus prof.dr. V. Subramaniam, in het openbaar te verdedigen ten overstaan van de promotiecommissie van de Faculteit der Bètawetenschappen op dinsdag 30 juni 2020 om 13.45 uur in een online bijeenkomst van de universiteit, De Boelelaan 1105 door Mats Nagel geboren te Amsterdam promotor: prof.dr. D.P. Posthuma copromotor: dr. S. van der Sluis Leescommissie: prof.dr. H. D. Mansvelder prof. C. M. Lewis prof.dr. B. W. J. H. Penninx dr. E. I. Fried dr. M. G. Nivard Paranimfen: Oscar van Mourik Philip R. Jansen Contents 1 General introduction 9 1.1 Brief introduction 10 1.2 Background 11 1.3 Genetic research into psychological traits 15 1.4 Aims and outline of this thesis 24 1.5 Glossary 27 2 Item-level analyses reveal genetic heterogeneity in neuroticism 31 Abstract 32 2.1 Introduction 33 2.2 Results 34 2.3 Discussion 43 2.4 Methods 44 2.5 Supplementary information 50 3 Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways 53 Abstract 54 3.1 Results 55 3.2 Discussion 68 3.3 Methods 69 3.4 Supplementary methods 78 3.5 Supplementary results 83 4 Examining the genetic validity of internalizing disorders through item-level analyses 99 Abstract 100 4.1 Introduction 101 4.2 Results 102 4.3 Discussion 110 4.4 Methods 113 4.5 Supplementary methods 121 5 Genome-wide meta-analysis of brain volume on 47,316 individuals identifies genomic loci and genes shared with intelligence 125 Abstract 126 5.1 Introduction 127 5.2 Results 127 5.3 Conclusion 136 5.4 Methods 138 5.5 Supplementary methods 147 5.6 Supplementary results 148 6 Discussion 155 6.1 Background 156 6.2 Summary of findings 157 6.3 Obstacles in the way of studying detailed phenotypes 159 6.4 Future directions 162 References 167 Peer-reviewed publications 183 Nederlandse samenvatting 185 Dankwoord 187 1 General introduction 1.1 Brief introduction 1 Mental health problems are highly prevalent in modern-day society1. Problems range from non- clinical manifestations of disturbed mood to severe, chronic psychiatric disorders like major depression, schizophrenia and autism spectrum disorder (ASD). Despite several decades of intensive research aimed at identifying the underlying biological mechanisms, and potential drug targets, pharmacological treatments still have limited success2,3. Since all traits are at least partially influenced by our genetic makeup4, using genetic information to increase our understanding of the biological mechanisms underlying mental health problems might eventually benefit patients. The last two decades have seen an influx of studies assessing genetic variants related to a wide variety of behaviors, ranging from monogenic medical diseases to complex psychiatric disorders. Genome- wide association studies (GWAS) provide an exploratory way to identify genetic variants throughout the genome that are, statistically, associated to a trait of interest. The explosion of GWAS studies since 2005 (https://www.ebi.ac.uk/gwas/diagram) has drastically increased our knowledge of the biology of diseases and identified thousands of variants involved in a wide variety of (disease) traits. Yet for many complex traits, like psychiatric disorders, the identified genetic variants explain only a fraction of the variance in the trait5,6. We argue that this may, in part, be the result of the way in which neuropsychiatric traits are operationalized in genetic studies. Typically, participants are classified as cases (i.e., people that suffer from a given psychiatric disorder) or as controls (i.e., not suffering from that particular disorder). However, people suffering from the same disorder may exhibit different sets of symptoms that may, in turn, be influenced by different genetic variants. In other words, the manner in which phenotypes are operationalized will have consequences for the success of genetic analyses. Therefore, in order to properly study the genetic basis of complex behavior, it is vital to think about the exact nature of the phenotypes used in the analysis, and the way they are operationalized. This thesis uses large-scale genetic data and state-of-the-art methods to study the merits of more detailed phenotyping in uncovering the genetics of complex neuropsychiatric traits. 10 1.2 Background 1 All traits are heritable The study of twins provides an excellent opportunity for disentangling the relative contributions of environmental versus genetic influences on the variation that can be observed in behavior. Twin- studies exploit the fact that monozygotic twins (see glossary at the end of this chapter for an explanation) are (virtually) genetically identical, whereas dizygotic twins share approximately 50% of their genetic information. The degree to which behavior is heritable can then be estimated by e.g. comparing the phenotypic similarity between mono- versus dizygotic twin pairs7,8. Twin-studies (and behavior genetics studies in general) have shown that essentially all traits are to some extent heritable, and thus that genetic variants must exist that influence susceptibility to, or variation in, these traits4,9. That notwithstanding, there is considerable variation in heritability (h2) across traits. For instance, heritability of traits in the ophthalmological (h2 = 0.71) and skeletal (h2 = 0.59) domains is estimated to be substantially higher than the heritability of traits in the social values and interactions domains (h2 = 0.31 and h2 = 0.32, respectively)4. However, for all traits included in a recent meta-analysis by Polderman et al. (2015) of virtually all published twin studies since the 1950’s, heritability was >0, and across all traits the average contribution of genetic influences was as high as 49%, i.e., on average, 49% of the variation observed in behavior is due to variation on a genetic level. Although twin-studies provide valuable information on the extent to which behavior is heritable, the design does not allow identification of individual genetic variants that are associated with a given trait. That is, these studies are not informative on which and how many genes are involved, or on the magnitude of their effects. Sequencing the human genome The beginning of the new millennium marked the publication of the first human genome sequence10. Until then, knowledge on the human genetic code was only fragmentary, and limited indeed. The human genome project (HGP) has, for instance, taught us that approximately 6% of the genome is under purifying selection (and can thus be assumed to be functional), whereas only about 1.5% of the genome sequence encodes proteins11, which raises the question what it is these non-coding parts of the genome do. Also, the HGP shed more light on the different classes of variation that exist in the genome (see Box 1). 11 Box 1. Classes of genetic variation 1 The human genome consists of approximately 3 billion base pairs (nucleotides; A=Adenine, T=Thymine, C=Cytosine or G=Guanine). When comparing base pairs between two individuals, most locations will hold the same information: on average, only about 0.1% of the genome sequence differs between people. Human genetic variation can be divided into several classes. Here, we broadly distinguish between single nucleotide variants (or single nucleotide polymorphisms; SNPs; pronounce: snips) and structural variants (see Fig. 1.1). SNPs are variations in the DNA sequence where a single nucleotide is changed (e.g., at a specific position the C nucleotide may occur in most individuals, but is replaced by a T nucleotide in a minority of people). SNPs occur approximately once in every 1,000 nucleotides, and are the most common type of genetic variation. A SNP may be virtually unique (i.e., very rare SNPs), or occur in many individuals (i.e., common SNPs). Structural variants can be subdivided into several categories (Fig. 1.1). When one or more base pairs are observed in some genomes, but not in others this is referred to as an insertion or deletion variant (often called indels). Since the genetic code is read three nucleotides (called ‘codons’) at a time, an indel (when it is made up of a number of nucleotides that is not divisible by three) can change the subsequent codons. This is called a reading frame shift, and may result in an altered protein. Block substitutions occur when the variation between genomes covers a sequence of nucleotides. Inversions are variants where a certain fragment of the genome is observed in reverse order. Copy-number variants occur when sections of the genome are repeated. Copy-number variations can be quite large, possibly stretching out over 2 Mb or more.12 Figure 1.1. Classes of genetic variation in the human genome. Here, the two rows represent genomes of two different individuals. This figure was inspired by Frazer et al. (2009)12. 12 Another important insight was provided by the characterization of so-called linkage disequilibrium (LD) patterns throughout the genome. The mapping of LD13, regions of 1 strongly correlated genetic variants, meant that approximately 90% of the variation in the genome can be captured by ‘just a fraction’ (i.e., 500,000 – 1,000,000 single nucleotide polymorphisms, or SNPs) of the total number of SNPs making up the genome11. This finding was quickly taken advantage of in the development of genotyping arrays, which now allow for measuring millions of variants simultaneously.

Load more