GENETIC CHARACTERIZATION OF AMYOTROPHIC LATERAL SCLEROSIS

Annelot M. Dekker

Colophon

© Annelot Dekker, 2019

ISBN: 978-94-92801-74-6

Cover design, layout and print: Guus Gijben, proefschrift-aio.nl

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or otherwise, without prior written permission from the author. The copyright of the articles that have been published has been transferred to the respective journals.

Genetic characterization of amyotrophic lateral sclerosis

Genetische karakterisering van amyotrofische laterale sclerose

(with a summary in Dutch)

Thesis

submitted for the degree of doctor at Utrecht University, on the authority of the Rector Magnificus, Prof. dr. H.R.B.M. Kummeling, in accordance with the decision of the Board for the Conferral of Doctoral Degrees, to be defended in public on Thursday 21 March 2019 at 2.30 pm

by

Annelot Marije Dekker

born on 21 August 1987 in Alphen aan den Rijn

Supervisors: Prof. dr. J.H. Veldink, Prof. dr. L.H. van den Berg

Co-supervisors: Dr. S.L. Pulit, Dr. M.A. van Es

TABLE OF CONTENTS

Chapter 1 Introduction

Chapter 2 Association of NIPA1 repeat expansions with amyotrophic lateral sclerosis in a large international cohort.

Chapter 3 Large-scale screening in sporadic amyotrophic lateral sclerosis identifies genetic modifiers in C9orf72 repeat carriers.

Chapter 4 Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis.

Chapter 5 Exome array analysis of rare and low frequency variants in amyotrophic lateral sclerosis.

Chapter 6 NEK1 variants confer susceptibility to amyotrophic lateral sclerosis.

Chapter 7 Discussion

Appendices: Abbreviations, Dutch summary (Nederlandse samenvatting), Acknowledgements, Publications, About the author

Chapter 1

Introduction

Chapter 1 | General Introduction

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease. It is a complex disease, with both genetic and environmental risk factors contributing to disease onset. Disentangling these risk factors is essential for understanding who will develop ALS, what the disease mechanisms are, and how the disease can best be treated. This thesis focuses on the genetic underpinnings of ALS. The techniques and methods used to unravel ALS genetics arise from the scientific advances in the field of genetics in general.

Leaps and bounds made possible by genetic discovery

Since the beginning of time, it has been clear that for many traits, parents and their offspring are more alike than unrelated people. This phenomenon captured the imagination of many great thinkers and prompted studies on the inheritance of traits. Ancient Greek philosophers and scientists like Pythagoras and Plato postulated theories on heredity millennia ago, although in many of those theories the seed of the father created the child and the mother was regarded as merely an incubator. Aristotle offered a radical alternative, which included women as equal contributors to their descendants, and the child as the product of the parents’ commingled blood¹-³. The basics of these theories held for centuries, and it was not until the 1800s and early 1900s that scientists started hypothesizing about the presence of ‘genes’ (from gonos, meaning product or offspring) as carriers of hereditary information.

In 1888, in a different area of research, Heinrich Wilhelm Waldeyer coined the term ‘chromosomes’ (meaning stainable bodies) for the nuclear threads described by Walther Flemming in his groundbreaking book ‘Zellsubstanz, Kern und Zelltheilung’ (‘Cell substance, nucleus and cell division’)⁴. For years, geneticists and biochemists did not communicate⁵, until Walter Sutton and Theodor Boveri independently proposed a connection between trait inheritance and the path that chromosomes travel during meiotic cell division and gamete formation⁶. Finally, in 1952, Maurice Wilkins and Rosalind Franklin made their landmark X-ray crystallographic photographs of deoxyribonucleic acid (DNA), leading to the publication of the model for the helical structure of DNA by James Watson and Francis Crick in 1953⁷-⁹. Advances then skyrocketed, and millions of scientific papers have since been published in the field of genetics, allowing us to further unravel this code of life.

The basics of the four letter code

To understand the rationale behind the technologies presently used in genetics, we need to go back to DNA basics. Our nuclear DNA is divided into 46 chromosomes, 23 inherited from the mother and 23 inherited from the father. Twenty-two pairs are homologous autosomes and one pair determines sex: the X and Y chromosomes. In addition to nuclear DNA, the cell’s mitochondria also contain stretches of DNA, inherited only from the mother. DNA is composed of four different nucleotides, each defined by a specific chemical base: adenine (A), guanine (G), cytosine (C) and thymine (T). All the chromosomes combined form the genome, comprising over three billion nucleotides. Blocks of three nucleotides (so-called codons) form the code for 20 different amino acids, and combinations of amino acids build proteins. A stretch of DNA that codes for a functional molecule is called a gene. Although appearing single-stranded during cell division, DNA is most stable in its double-stranded helical form, where each strand forms a perfect template for the other. In a single cell, two nearly one-meter long stretches of DNA are tightly wrapped around proteins called histones, and even more densely wrapped through a process called supercoiling into threads called chromatin. This three-dimensional configuration of the genome is dynamic and crucial for gene regulation¹⁰. The human genome contains large stretches of DNA that do not code for proteins, once termed ‘junk DNA’. We now know that many of these sequences are regulatory elements for genes. The structural organization of DNA allows sections of DNA that are far apart in linear genomic distance to physically interact due to spatial proximity. This in turn allows adaptation and regulation of cellular processes on demand¹¹. Epigenetic mechanisms, such as DNA methylation and histone modifications, work to further stabilize gene expression programs and regulate cell-type identities¹². Individual genes specify individual proteins, but it is the relationship among genes and their products that creates function and physiology. One needs to understand these basic (but not simple) principles to enable genetic studies on pathophysiology.
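As a minimal sketch of how codons map onto amino acids, the Python snippet below translates a short coding sequence codon by codon. The example sequence, the function name `translate` and the deliberately small codon table (a hand-picked subset of the standard genetic code) are illustrative assumptions, not material from this thesis.

```python
# Partial codon table for the DNA coding strand.
# Assumption: only a small subset of the 64 codons of the standard genetic code.
CODON_TABLE = {
    "ATG": "Met", "GCT": "Ala", "GAA": "Glu", "AAA": "Lys",
    "TGG": "Trp", "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(dna):
    """Translate a coding sequence codon by codon until a stop codon is met."""
    peptide = []
    usable_length = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    for i in range(0, usable_length, 3):
        codon = dna[i:i + 3].upper()
        amino_acid = CODON_TABLE.get(codon, "???")  # "???" = codon not in this subset
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


print(translate("ATGGCTGAATAA"))
```

The four codons ATG, GCT, GAA and TAA are read in frame, and translation stops at TAA, so the peptide contains three residues.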

The genetic variants that make us unique

Humans share 99.9% of their genome (comparatively, human and chimpanzee genomes share approximately 98.8%). The variants in the DNA code determine our unique properties, but can also predispose to disease¹³. DNA variation comes in different shapes and sizes, some more abundant or easier to study than others. The most common and most extensively studied variants are nucleotide substitutions: variants at one position in the nucleotide sequence, also called single nucleotide variants or SNVs. SNVs can be located either in non-coding regions (where they can alter, e.g., gene expression) or in coding regions. SNVs in coding regions are classified according to their effect on the translated peptide sequence. Variants that do not affect the sequence are termed synonymous, and SNVs that alter the amino acid, and thus the protein, are called non-synonymous variants. Non-synonymous variants are further categorized into missense mutations (which change an amino acid to another amino acid) and nonsense mutations (which change an amino acid to a stop codon, resulting in premature termination of translation). Other common types of DNA variation are insertions and deletions of nucleotides (indels), which, if their length is not divisible by three, result in a change in the reading frame of codons. These mutations are called frameshift mutations. Other types of variation include differences in the number of nucleotide repeats (variable number tandem repeats or VNTRs) and structural variants (SVs) spanning over 1000 base pairs in length, e.g., copy number variants (CNVs), translocations and inversions¹⁴. Observed variants at a locus are called alleles, and the frequency of the least common variant in a population is called the minor allele frequency, or MAF. Since meiotic crossing-over generally occurs at preferred sites (so-called recombination hotspots), certain combinations of alleles tend to be inherited together. Blocks of alleles that are inherited together from a single parent are called haplotype blocks.
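The classification of coding variants and the definition of MAF given above can be sketched in a few lines of Python. The codon table is again a small illustrative subset of the standard genetic code, the function names are hypothetical, and the MAF helper assumes a biallelic site with genotypes written as two-letter strings.

```python
from collections import Counter

# Assumption: small illustrative subset of the standard genetic code.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu", "GAT": "Asp", "AAA": "Lys",
    "TGG": "Trp", "TAA": "Stop", "TGA": "Stop",
}


def classify_coding_snv(ref_codon, alt_codon):
    """Classify a single-nucleotide change within a codon by its protein effect.

    Assumes the reference codon encodes an amino acid (not a stop codon).
    """
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "Stop":
        return "nonsense"
    return "missense"


def is_frameshift(indel_length):
    """An indel shifts the reading frame if its length is not divisible by three."""
    return indel_length % 3 != 0


def minor_allele_frequency(genotypes):
    """MAF at a biallelic site; genotypes are strings such as 'AG', one per person."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample: no minor allele observed
    return min(counts.values()) / sum(counts.values())


print(classify_coding_snv("GAA", "GAG"))  # third-position change, Glu stays Glu
print(classify_coding_snv("GAA", "GAT"))  # Glu becomes Asp
print(classify_coding_snv("GAA", "TAA"))  # Glu becomes a stop codon
print(minor_allele_frequency(["AA", "AG", "GG", "AA"]))
```

In the last call, four individuals carry eight alleles in total, five A and three G, so the minor allele G has a frequency of 3/8.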

Figure 1.1 The structure of DNA. Representation of the very basics of genetics, i.e., conformation of nuclear DNA from double helix to chromosome. Adapted from Thomas Splettstoesser, scientific illustrator. [Figure labels: chromosome; histone protein; 3D structure; DNA methylation (M); gene (exon, intron, exon); DNA double helix; nucleotides (thymine, adenine, cytosine, guanine).]

How technology and methodology shape what we can find

The discovery of recombinant DNA technologies of cloning and sequencing allowed scientists to unravel the DNA sequence of organisms. Nearly half a century passed between the discovery of the structure of DNA and the publication of the first draft sequence of the human genome in 2001 by The Human Genome Project¹⁵. The project took 13 years to complete and total costs approximated 2.7 billion dollars. Genetic technologies have improved dramatically since, with standard sequencing pipelines currently allowing researchers and clinicians to sequence a genome within days at a cost of around 1,000 euros¹⁶. The rapid technological developments guided new techniques to measure DNA variation and, subsequently, methods to test these variants for association with traits. The most common types of studies used to look for DNA variation associated with diseases are described below.

Genetic linkage analysis

In the years preceding The Human Genome Project, scientists used a method called linkage analysis to study highly penetrant variants. As a result, they were able to identify genes for monogenic (Mendelian) disorders, which typically harbor such variants¹⁷. The core concept revolves around linkage: the observation that two genetic loci are linked if they are transmitted together from parent to offspring more often than expected under independent inheritance. By tracking hundreds or thousands of genetic markers through pedigrees affected by a disease with a recognizable inheritance pattern, it is possible to infer a marker’s position relative to others in the genome¹⁸. For many diseases, including some neurodegenerative diseases, the first step in isolation of the causative variant was the identification of the chromosomal location of the disease gene through linkage analysis¹⁹.
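One common way to quantify "transmitted together more often than expected" is the LOD score, a standard statistic in linkage analysis that is not spelled out in this chapter. The sketch below assumes the simplest textbook setting: phase-known meioses in which each transmission can be scored as recombinant or non-recombinant. The function name and the example counts are illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Log10 likelihood ratio of linkage at recombination fraction theta
    versus free recombination (theta = 0.5), for phase-known meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_likelihood_linked = (recombinants * math.log10(theta)
                             + non_recombinants * math.log10(1 - theta))
    log_likelihood_unlinked = meioses * math.log10(0.5)
    return log_likelihood_linked - log_likelihood_unlinked


# Hypothetical pedigree data: 1 recombinant observed in 10 informative meioses.
print(round(lod_score(recombinants=1, meioses=10, theta=0.1), 2))
```

By convention, a LOD score of 3 or more (odds of 1000:1 in favor of linkage) is usually taken as evidence for linkage, which illustrates why multiple informative meioses across large pedigrees are needed.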

Candidate gene association studies

Candidate gene studies are association studies in which the genotypes of variants in prespecified genes of interest are determined in individuals with and without a disease. These polymorphisms are then tested for association with susceptibility to the trait²⁰. The hallmark of candidate gene association studies is that they are hypothesis-driven. They thus provide a focused analysis of genomic regions of interest that are assumed to be associated with the disease based on, for instance, previous (linkage) studies or involvement in disease-relevant pathways²¹. A major issue in this type of study has been the frequent lack of consistent replication (e.g., due to poor quality control, confounding variables, or inadequate significance thresholds), leading to many false-positive publications²²,²³.


Genome-wide association studies (GWAS)

The public availability of the first genome-wide database of human genetic variation through The International HapMap Project and the dropping costs of high-throughput genotyping technologies enabled the field of genetics to move forward to more detailed and hypothesis-free genome-wide studies²⁴,²⁵. GWAS are studies of common genetic variation across the entire human genome designed to identify genetic associations with observable traits²⁶. In a typical GWAS, cases affected by a (typically relatively common) disease and controls are genotyped for a large set of common SNVs. SNVs are termed ‘common’ if they surpass a certain frequency in the population, usually arbitrarily defined as ≥ 1%, and are then named single nucleotide polymorphisms (SNPs). Subsequently, frequencies of these SNPs are compared between cases and controls, to test for an association with disease susceptibility. Since the publication of the first GWAS in 2005²⁷, a total of 68,000 SNP-trait associations have been reported in over 5,000 studies, according to the NHGRI-EBI GWAS Catalog²⁸. The increasing public availability of (summary) data, for instance through the UK Biobank containing over 500,000 participants²⁹, allows groups to include individuals at an exponentially growing rate, and currently published GWAS consequently not uncommonly include hundreds of thousands of individuals³⁰-³³.
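The case-control comparison of allele frequencies at a single SNP can be illustrated with a basic allelic chi-square test. This is a simplified sketch, not the analysis pipeline used in this thesis: it ignores covariates, population stratification and genotype-level models, and the allele counts are made up for illustration.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns the test statistic, its p-value and the allelic odds ratio.
    """
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 500 cases and 500 controls, i.e. 1000 alleles per group.
chi2, p_value, odds_ratio = allelic_chi_square(300, 700, 200, 800)
print(round(chi2, 2), p_value, round(odds_ratio, 2))
```

Because a GWAS repeats such a test for a very large number of SNPs, the p-value threshold has to be far stricter than 0.05; a widely used convention, not taken from this chapter, is 5 × 10⁻⁸.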

Crucial in the interpretation of GWAS results is the concept of linkage disequilibrium (LD). This is the non-random association of alleles across haplotypes, which may or may not be due to linkage at population level. The genotyped disease-associated variant might therefore be the causal variant (direct association) or a variant associated with disease through a nearby disease-causing variant in LD with the genotyped variant (indirect association)³⁴. LD allows scientists to interrogate the genome beyond markers present on genotyping arrays, although pinpointing the disease-causing locus often proves challenging. The common use of haplotype-based imputation of variants not present on genotyping arrays increases power and the ability to resolve or fine-map the causal variant in GWAS analyses³⁵.
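The non-random association described above is usually summarized with the standard pairwise LD measures D, D′ and r². The following sketch computes them for two biallelic loci from one haplotype frequency and two allele frequencies; the function name and the example frequencies are illustrative assumptions.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Pairwise LD between allele A at locus 1 and allele B at locus 2.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    Returns (D, D', r^2).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independence
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime, r_squared


# Two alleles that always travel together on the same haplotype.
print(linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5))
# Two alleles that are statistically independent.
print(linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5))
```

An r² close to 1 means a genotyped SNP is a nearly perfect proxy for an ungenotyped neighbor, which is what makes indirect association and imputation possible and, at the same time, makes it hard to tell which of the correlated variants is causal.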

Whole-exome sequencing (WES) / whole-genome sequencing (WGS)

‘Next-generation’ sequencing (NGS) revolutionized the study of genomic variation between individuals. The term NGS refers to high-throughput sequencing technologies that are used to sequence entire genomes (whole-genome sequencing, or WGS) or specific areas of interest, which could range from a selection of individual genes to all ~20,000 protein-coding sequences (whole-exome sequencing, or WES)³⁶,³⁷. Although some types of case-control association analyses performed with WES or WGS are similar to those performed in GWAS, there are important differences pertaining to costs, density of coverage of variation in the genome and the MAF spectrum of the directly sequenced variants³⁸. Where GWAS analyses are mainly focused on testing common biallelic variants, NGS allows testing of rare variants (i.e., MAF < 1%) and of types of variation other than SNVs²⁵,³⁹. Although the cost of sequencing one human genome has dropped dramatically over the last decades, analyzing large amounts of data is still costly compared to GWAS arrays, both in terms of money and amount of required DNA. In addition, a strong data infrastructure is required in terms of analytical skills, computer capacity and available storage. Despite these challenges, many scientific groups have turned to next-generation sequencing and are publicly releasing their (summary) data. Today, the Genome Aggregation Database (gnomAD) contains 123,136 exomes and 15,496 genomes, providing the scientific community with a huge reference set of allele frequencies⁴⁰. Also, NGS has evolved into an indispensable tool in daily clinical practice, particularly for diagnosis of Mendelian and neurodevelopmental disorders in pediatric populations⁴¹.

Figure 2. Genetic method characteristics. Schematic representation of characteristics of different genetic methods. A. Fictive sequence of the reference genome and an individual’s sequence at those exact positions. Exons are depicted as blue bars, introns as grey bars. Variants are printed in bold orange. B. Explanation of different types of variation present in the individual sequence, depicted in orange. The other nucleotides are depicted in grey. C. Genetic method resolution compared to the true individual sequence. The FAM/SPOR circles represent whether the methods are suited to include patients with or without a positive family history. Sample size represents a gross estimate of included samples per study; family-based studies often include dozens of individuals, whereas GWAS can now include up to a million samples. Different sample sizes in WGS, WES and GWAS are determined by computational costs, analytical capacity and preferences. [Figure labels: reference genome; individual sequence; Gene A; Gene B; SNV in exon; SNV in intron; intergenic SNV; insertion; VNTR; linkage analysis; candidate gene studies; GWAS; WES; WGS; sample size; FAM; SPOR.]

How to ask meaningful questions in disease genetics?

Genetic studies, and scientific pursuit more broadly, often start with W’s:

Why

The primary ‘why’ of genetic research in disease is to better understand disease pathophysiology, under the assumption that a better understanding will lead to prevention or better treatment. A related question is why some people get sick and others do not. However, the road from variant-trait association to biology has often proven difficult³⁸. Increasingly, prediction tools and therapies that are purely gene-based are applied, bypassing the need for a complete and fundamental understanding of disease pathophysiology to improve risk prediction and treatment. Therefore, the goal of genetic research is threefold: 1) dissecting the underlying disease pathophysiology, 2) providing more accurate risk prediction models based on genetic profile and 3) tailoring treatments based on genetic data⁴².

What

The essence of the ‘what’ in genetics is to disentangle the genetic architecture of the studied trait. The genetic architecture of a disease is its underlying genetic basis, defined as the combination of the number, type, frequency, interrelationship and magnitude of effect of the genetic variants contributing to a trait⁴³. Knowledge of the genetic architecture of a disease helps guide new study designs and directly informs the clinical goals of diagnosis, prognosis and identification of therapeutic targets⁴⁴.

Who

In (genetic) epidemiology, one needs to think carefully about who the study population should be. Every trait with a presumed genetic component can be studied, although in disease genetics some conditions are easier to study than others. First, a clear phenotype helps determine a clear ‘who’ in complex disease genetics (i.e., diseases that originate from combined genetic and environmental factors). Availability of the study population is determined by factors such as age of onset, disease duration and willingness to report complaints⁴⁵. Additionally, to enable studies exploring rarer variation, large sample sizes are needed to achieve sufficient power. This can be challenging, especially for traits lying on the rarer end of the disease frequency spectrum. Therefore, universities across the globe are increasingly joining forces, forming large collaborative consortia that enable inclusion of large groups of geographically dispersed patients and controls (which, in itself, poses unique analytic challenges).
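Why rare variants demand large cohorts can be made concrete with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of allele frequencies between cases and controls; it is a simplification under stated assumptions (unpooled variance, contribution of the opposite tail ignored, a significance threshold of 5 × 10⁻⁸ chosen by convention) and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power to detect an allele frequency difference between
    cases and controls, given the control MAF and the allelic odds ratio."""
    # Derive the allele frequency in cases from the odds ratio.
    odds_controls = maf_controls / (1 - maf_controls)
    odds_cases = odds_controls * odds_ratio
    maf_cases = odds_cases / (1 + odds_cases)

    # Each diploid individual contributes two alleles.
    standard_error = math.sqrt(
        maf_cases * (1 - maf_cases) / (2 * n_cases)
        + maf_controls * (1 - maf_controls) / (2 * n_controls)
    )
    z_effect = abs(maf_cases - maf_controls) / standard_error
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z_effect - z_alpha)


# Hypothetical rare variant: MAF 1% in controls, odds ratio 1.5.
for n in (1_000, 10_000, 50_000):
    print(n, round(allelic_test_power(0.01, 1.5, n, n), 4))
```

Under these assumptions a study of 1,000 cases and 1,000 controls has essentially no power for such a variant, whereas tens of thousands of samples per group bring the power close to 1, which is the practical argument for international consortia.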

Unravelling the genetics of amyotrophic lateral sclerosis

In 1874, the same year that experiments revealed that the material inside cell nuclei consists of acid and protein (hence DNA, deoxyribonucleic acid), amyotrophic lateral sclerosis (ALS) was first described in the literature by Jean-Martin Charcot⁴⁶. He understood ALS to be a single neurodegenerative disease with a spectrum of presentations, on the basis of clinicopathological correlation⁴⁷. The core observations of the disease that were described then still hold true to this date. Although often grouped under neuromuscular diseases, ALS is indeed a neurodegenerative disease that primarily affects upper and lower motor neurons in the brain, brainstem and spinal cord. This leads to progressive muscle wasting, weakness and spasticity, eventually affecting nearly all voluntary muscles. Disease onset is focal, and symptoms spread gradually to other parts of the body. Diagnosis is challenging due to large phenotypic variability, and patients are often referred to tertiary care centers specialized in neuromuscular diseases for a final opinion⁴⁸. There is no definitive test available for the diagnosis of ALS; it is currently based on fulfillment of electrophysiological and clinical criteria as described in the (revised) El Escorial Criteria⁴⁹,⁵⁰ or the Awaji Criteria⁵¹. Central to the diagnosis of ALS are disease spread, presence of signs of upper and lower motor neuron degeneration in multiple regions (i.e., bulbar, cervical, thoracic or lumbosacral) and exclusion of other disease processes that could explain the presentation. Related motor neuron diseases include progressive muscular atrophy (PMA) and primary lateral sclerosis (PLS), where only the lower or upper motor neurons are affected, respectively. Most ALS patients die from respiratory failure, usually within three to five years of symptom onset, although survival time is highly variable⁵²,⁵³.


In European countries, incidence ranges between 2-3 per 100,000 person years⁵⁴. In the Netherlands each year approximately 500 people are diagnosed with ALS, and due to the very poor prognosis, the prevalence is not higher than 9-11 per 100,000 persons, corresponding to ~1,500 patients currently living with ALS in the Netherlands. The lifetime risk of developing ALS is 1 in 350 in men and 1 in 400 in women⁵²,⁵⁵. Although not rare in terms of incidence rates, under regulations of the European Union ALS is considered a rare disease (i.e., affecting fewer than 5 in 10,000 people), allowing orphan designation of drugs developed and tested for the disease. To date, only one drug, the glutamate inhibitor riluzole, has been proven irrefutably to increase survival, by approximately three months, through a mechanism that is not completely understood⁵⁶. Despite numerous trials, recent years have produced only one new drug that possibly slows disease progression⁵⁷. Additional studies are warranted and ongoing to prove definitively whether edaravone, a free radical scavenger, is effective for the entire patient population and after intake for prolonged periods of time.
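The relation between these incidence and prevalence figures can be checked with simple arithmetic: in a steady state, prevalence is approximately incidence multiplied by mean disease duration. The sketch below is a back-of-the-envelope check, not an epidemiological model; the Dutch population size of about 17 million and the mean duration of 3.5 years are assumptions chosen for illustration, and the function names are hypothetical.

```python
def steady_state_prevalence(incidence_per_100k, mean_duration_years):
    """Approximate prevalence per 100,000 as incidence x mean disease duration."""
    return incidence_per_100k * mean_duration_years


def expected_count(rate_per_100k, population):
    """Number of people corresponding to a rate expressed per 100,000."""
    return rate_per_100k * population / 100_000


DUTCH_POPULATION = 17_000_000  # assumption: rounded population size

# Incidence of 3 per 100,000 person years and a mean duration of 3.5 years.
print(steady_state_prevalence(3, 3.5))        # prevalence per 100,000
print(expected_count(3, DUTCH_POPULATION))    # new diagnoses per year
print(expected_count(9, DUTCH_POPULATION))    # patients living with ALS
```

With these inputs the estimates come out at roughly 10 per 100,000, about 500 new diagnoses per year and about 1,500 prevalent patients, in line with the numbers quoted above.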

The underlying mechanisms of the central neuropathological hallmark of ALS, aggregation and accumulation of ubiquitylated proteinaceous inclusions in motor neurons, are not yet fully elucidated⁵⁸. In contrast to the original beliefs of Charcot, ALS is considered a complex disease, where both environmental and genetic factors contribute to disease onset⁵⁹. Nevertheless, more than a century passed between Charcot’s ‘De la sclérose latérale amyotrophique’ and the first scientific article linking a gene to ALS⁶⁰. Twin studies have since estimated the heritability of the disease (the phenotypic variability that is attributable to genetic variation) at around 60%, albeit with large confidence intervals⁶¹. Historically, ALS has been divided into familial ALS (fALS) and sporadic ALS (sALS), distinguishing patients with a positive family history from patients without. Approximately 5-10% of patients describe a positive family history concordant with a Mendelian inheritance pattern. Nowadays, we know that the dichotomy between fALS and sALS is somewhat artificial. The definition of fALS is debated between physicians⁶², and the appearance of familial disease depends on multiple factors such as family size, accuracy of family history data and pleiotropy (i.e., genetic variants leading to multiple phenotypic traits). Consequently, mutations found in familial cases are also found in sporadic patients, albeit with lower prevalence⁴⁸,⁶³,⁶⁴.

In 1993, approximately 50 years after the first description of familial ALS, SOD1 was the first gene linked to ALS through linkage analysis in families with an autosomal dominant inheritance pattern⁶⁰. To date, dozens of variants in this gene have been associated with the disease, with considerable phenotypic variation⁶⁵. Linkage analysis again proved successful in 2008 and 2009 for identifying the next two major genes associated with ALS, TARDBP and FUS, respectively⁶⁶,⁶⁷. The studies were supported by the previous discovery of TAR DNA binding protein 43 (TDP-43) positive inclusions in motor neurons of ALS patients in 2006⁶⁸, and the subsequent search for TDP-43 homologues in the linkage region. These TDP-43 positive inclusions are now regarded as a pathological hallmark of ALS-related neurodegeneration⁶⁵. For years, it had been clear that there was a clinical overlap between ALS and frontotemporal dementia (FTD), and that they represent different ends of the same disease spectrum. TDP-43 inclusions found in affected neurons in both neurodegenerative disorders provided the pathological link⁶⁸. Although a region on the short arm of chromosome 9 had already been linked to the ALS-FTD spectrum, scientists had a hard time pinpointing the actual locus until the rise of GWAS. Through multiple studies, thousands of ALS or FTD cases and healthy controls were genotyped, and the results allowed narrowing of the C9 locus to a 232 kilobase (kb) block of linkage disequilibrium in the chromosome 9p21 region⁶⁹-⁷¹. In 2011, two groups simultaneously published the discovery of a large hexanucleotide repeat expansion between the first two exons of C9orf72 as the origin of these signals⁷²,⁷³. The pathologic repeat expansion has since proven to be the most frequent genetic cause of ALS and ALS/FTD in European populations, accounting for approximately 40% of Dutch ALS cases with a positive family history, and approximately 6% of those without⁷⁴.

In this thesis

Concurrent with the rapid developments within the field of genetics in general, the discoveries in ALS genetics have boomed since the dawn of the twenty-first century. By the start of the research described in this thesis in 2013, more than 20 genes had been implicated in ALS. Together, the identified genetic risk loci explained approximately 70% of ALS cases with a positive family history (mainly driven by SOD1, TARDBP, FUS and C9orf72), but only 11% of sporadic ALS cases in populations of European ancestry⁷⁵. Also, a fundamental understanding of the genetic architecture of the disease, including explanations for the observed pleiotropy of ALS genes (most markedly C9orf72), oligogenic inheritance patterns and phenotypic variability, was still lacking. By implementing various genetic methods described in this introduction, the goal of the work described in this thesis is therefore to further elucidate the genetic background of ALS, ultimately aiming to 1) help dissect the underlying disease pathophysiology, 2) guide more accurate risk prediction models based on genetic makeup and 3) provide additional genetic data to tailor personalized treatments. To this end, this thesis addresses the following questions:


• Do NIPA1 repeat expansions indeed confer an increased risk of ALS?
• Can we add to the evidence of an oligogenic model of ALS?
• Can we identify new genetic risk factors for ALS using various methods, including GWAS, WES and WGS?
• What do these studies tell us about the genetic architecture of ALS?

REFERENCES

1. Brumbaugh, R. S. Plato’s Genetic Theory. Journal of Heredity 45, 191–196 (1954).
2. Wilberding, J. Plato’s Embryology. Early Sci Med 20, 150–168 (2015).
3. Goy, I. Was Aristotle the ‘father’ of the epigenesis doctrine? History and Philosophy of the Life Sciences 40, 1–16 (2018).
4. Paweletz, N. Walther Flemming: pioneer of mitosis research. Nat. Rev. Mol. Cell Biol. 2, 72–75 (2001).
5. Lander, E. S. Introduction to Biology. (2004).
6. Satzinger, H. Theodor and Marcella Boveri: chromosomes and cytoplasm in heredity and development. Nat Rev Genet 9, 231–238 (2008).
7. Franklin, R. E. & Gosling, R. G. Molecular configuration in sodium thymonucleate. Nature 421, 400–401; discussion 396 (2003).
8. Wilkins, M. H. F., Stokes, A. R. & Wilson, H. R. Molecular structure of deoxypentose nucleic acids. Nature 171, 738–740 (1953).
9. Watson, J. D. & Crick, F. H. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738 (1953).
10. Corless, S. & Gilbert, N. Effects of DNA supercoiling on chromatin architecture. Biophysical Reviews 8, 1–14 (2016).
11. Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. Nat Rev Genet 19, 453–467 (2018).
12. Allis, C. D. & Jenuwein, T. The molecular hallmarks of epigenetic control. Nat Rev Genet 17, 487–500 (2016).
13. Miko, I. & LeJeune, L. (eds.) Essentials of Genetics. (NPG Education, 2009).
14. The 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
15. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
16. Payne, K., Gavan, S. P., Wright, S. J. & Thompson, A. J. Cost-effectiveness analyses of genetic and genomic diagnostic tests. Nat Rev Genet 19, 235–246 (2018).
17. Ott, J., Wang, J. & Leal, S. M. Genetic linkage analysis in the age of whole-genome sequencing. Nat Rev Genet 16, 275–284 (2015).
18. Dawn Teare, M. & Barrett, J. H. Genetic linkage studies. The Lancet 366, 1036–1044 (2005).
19. Pulst, S. M. Genetic Linkage Analysis. Arch. Neurol. 56, 667–672 (1999).
20. Patnala, R., Clements, J. & Batra, J. Candidate gene association studies: a comprehensive guide to useful in silico tools. BMC Genetics 14, 39 (2013).
21. Jorgensen, T. J. et al. Hypothesis-Driven Candidate Gene Association Studies: Practical Design and Analytical Considerations. American Journal of Epidemiology 170, 986–993 (2009).
22. Ioannidis, J. P., Ntzani, E. E., Trikalinos, T. A. & Contopoulos-Ioannidis, D. G. Replication validity of genetic association studies. Nat Genet 29, 306–309 (2001).
23. Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A comprehensive review of genetic association studies. Genet Med 4, 45–61 (2002).
24. International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).


25. Manolio, T. A. & Collins, F. S. The HapMap and Genome-Wide Association Studies in Diagnosis and Therapy. Annu. Rev. Med. 60, 443–456 (2009).
26. Policy for sharing of data obtained in NIH supported or conducted genome-wide association studies (GWAS). Federal Register 8/30/07. 2007. (2018). Available at: https://grants.nih.gov/grants/guide/notice-files/NOT-OD-07-088.html#policy. (Accessed: 14 August 2018)
27. Klein, R. J. et al. Complement factor H polymorphism in age-related macular degeneration. Science 308, 381–385 (2005).
28. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 45, D896–D901 (2017).
29. Sudlow, C. et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age. PLoS Med 12, e1001779–10 (2015).
30. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet 46, 1173–1186 (2014).
31. Zillikens, M. C. et al. Large meta-analysis of genome-wide association studies identifies five loci for lean body mass. Nature Communications 8, 1–12 (2017).
32. Bansal, V. et al. Genome-wide association study results for educational attainment aid in identifying genetic heterogeneity of schizophrenia. Nature Communications 9, 1–12 (2018).
33. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in 700000 individuals of European ancestry. Human Molecular Genetics 90, 7 (2018).
34. Palmer, L. J. & Cardon, L. R. Shaking the tree: mapping complex disease genes with linkage disequilibrium. The Lancet 366, 1223–1234 (2005).
35. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat Rev Genet 11, 499–511 (2010).
36. Ezkurdia, I. et al. Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes. Human Molecular Genetics 23, 5866–5878 (2014).
37. Lander, E. S. Initial impact of the sequencing of the human genome. Nature 470, 187–197 (2011).
38. Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. The American Journal of Human Genetics 101, 5–22 (2017).
39. Mardis, E. R. A decade’s perspective on DNA sequencing technology. Nature 470, 198–203 (2011).
40. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
41. Shendure, J. et al. DNA sequencing at 40: past, present and future. Nature 550, 345–353 (2017).
42. Bell, J. Predicting disease using genomics. Nature 429, 453–456 (2004).
43. Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 135, 1–10 (2018).
44. Timpson, N. J., Greenwood, C. M. T., Soranzo, N., Lawson, D. J. & Richards, J. B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 19, 110–124 (2017).
45. Burton, P. R., Tobin, M. D. & Hopper, J. L. Key concepts in genetic epidemiology. The Lancet 366, 941–951 (2005).
46. Rowland, L. P. How amyotrophic lateral sclerosis got its name: the clinical-pathologic genius of Jean-Martin Charcot. Archives of Neurology 58, 512–515 (2001).
47. Katz, J. S., Dimachkie, M. M. & Barohn, R. J. Amyotrophic Lateral Sclerosis: A Historical Perspective. Neurol Clin 33, 727–734 (2015).

20 48. van Es MD, M. A. et al. Amyotrophic lateral sclerosis. The Lancet 390, 2084–2098 (2017). 49. Brooks, B. R. El Escorial World Federation of Neurology criteria for the diagnosis of amyotrophic lateral 1 sclerosis. Subcommittee on Motor Neuron Diseases/Amyotrophic Lateral Sclerosis of the World Federation of Neurology Research Group on Neuromuscular Diseases and the El Escorial ‘Clinical limits of amyotrophic lateral sclerosis’ workshop contributors. Journal of the Neurological Sciences 124 Suppl, 96–107 (1994). 50. Brooks, B. R., Miller, R. G., Swash, M. & Munsat, T. L. El Escorial revisited: Revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Other Motor Neuron Disord. 1, 293–299 (2009). 51. Carvalho, M. D. & Swash, M. Awaji diagnostic algorithm increases sensitivity of El Escorial criteria for ALS diagnosis. Amyotroph Lateral Scler 10, 53–57 (2009). 52. Huisman, M. H. B. et al. Population based epidemiology of amyotrophic lateral sclerosis using capture- recapture methodology. Journal of Neurology, Neurosurgery & Psychiatry 82, 1165–1170 (2011). 53. Rooney, J. et al. Survival Analysis of Irish Amyotrophic Lateral Sclerosis Patients Diagnosed from 1995– 2010. PLoS ONE 8, e74733–10 (2013). 54. Logroscino, G. et al. Incidence of amyotrophic lateral sclerosis in Europe. Journal of Neurology, Neurosurgery & Psychiatry 81, 385–390 (2010). 55. Johnston, C. A. et al. Amyotrophic lateral sclerosis in an urban setting. J Neurol 253, 1642–1643 (2006). 56. Miller RG, M. J. M. D. Riluzole for amyotrophic lateral sclerosis (ALS)/motor neurondisease (MND). The Cochrane Library 1–36 (2012). 57. Abe, K. et al. Safety and efficacy of edaravone in well defined patients with amyotrophic lateral sclerosis: a randomised, double-blind, placebo- controlled trial. The Lancet Neurology 16, 505–512 (2017). 58. Hardiman, O. et al. Amyotrophic lateral sclerosis. Nat. Rev. Dis. Primers 3, 17071–19 (2017). 59. Al-Chalabi, A. & Hardiman, O. 
The epidemiology of ALS: a conspiracy of genes, environment and time. Nature Reviews Neurology 9, 617–628 (2013). 60. Rosen, D. R. et al. Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. 362, 59–62 (1993). 61. Al-Chalabi, A. et al. An estimate of amyotrophic lateral sclerosis heritability using twin data. Journal of Neurology, Neurosurgery & Psychiatry 81, 1324–1326 (2010). 62. Byrne, S., Elamin, M., Bede, P. & Hardiman, O. Absence of consensus in diagnostic criteria for familial neurodegenerative diseases. Journal of Neurology, Neurosurgery & Psychiatry 83, 365–367 (2012). 63. Finsterer, J. & Burgunder, J.-M. Recent progress in the genetics of motor neuron disease. European Journal of Medical Genetics 57, 103–112 (2014). 64. Al-Chalabi, A., van den Berg MD, P. L. H. & Veldink, J. Gene discovery in amyotrophic lateral sclerosis: implications for clinical management. Nature Reviews Neurology 13, 96–104 (2017). 65. Al-Chalabi, A. et al. The genetics and neuropathology of amyotrophic l ateral sclerosis. Acta Neuropathol 124, 339–352 (2012). 66. Sreedharan, J. et al. TDP-43 mutations in familial and sporadic amyotrophic lateral sclerosis. Science 319, 1668–1672 (2008). 67. Kwiatkowski, T. J. et al. Mutations in the FUS/TLS gene on chromosome 16 cause familial amyotrophic lateral sclerosis. Science 323, 1205–1208 (2009). 68. Neumann, M. et al. Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Science 314, 130–133 (2006). 69. van Es, M. A. et al. Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat Genet 41, 1083–1087 (2009).

21 Chapter 1 | General Introduction

70. Shatunov, A. et al. Chromosome 9p21 in sporadic amyotrophic lateral sclerosis in the UK and seven other countries: a genome-wide association study. The Lancet Neurology 9, 986–994 (2010). 71. Van Deerlin, V. M. et al. Common variants at 7p21 are associated with frontotemporal lobar degeneration with TDP-43 inclusions. Nat Genet 42, 234–239 (2010). 72. Renton, A. E. et al. A Hexanucleotide Repeat Expansion in C9orf72 Is the Cause of Chromosome 9p21- Linked ALS-FTD. Neuron 72, 257–268 (2011). 73. DeJesus-Hernandez, M. et al. Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9orf72 Causes Chromosome 9p-Linked FTD and ALS. Neuron 72, 245–256 (2011). 74. van Rheenen, W. et al. Hexanucleotide repeat expansions in C9orf72 in the spectrum of motor neuron diseases. Neurology 79, 878–882 (2012). 75. Renton, A. E., Chiò, A. & Traynor, B. J. State of play in amyotrophic lateral sclerosis genetics. Nat. Neurosci. 17, 17–23 (2014).


Chapter 2

Association of NIPA1 repeat expansions with amyotrophic lateral sclerosis in a large international cohort.

Gijs HP Tazelaar*, Annelot M Dekker*, Joke JFA van Vugt, Rick A van der Spek, Henk-Jan Westeneng, Lindy JBG Kool, Kevin P Kenna, Wouter van Rheenen, Sara L Pulit, Russell L McLaughlin, William Sproviero, Alfredo Iacoangeli, Annemarie Hübers, David Brenner, Karen E Morrison, Pamela J Shaw, Christopher E Shaw, Monica Povedano Panadés, Jesus S Mora Pardina, Jonathan D Glass, Orla Hardiman, Ammar Al-Chalabi, Philip van Damme, Wim Robberecht, John E Landers, Albert C Ludolph, Jochen H Weishaupt, Leonard H van den Berg, Jan H Veldink, Michael A van Es, on behalf of the Project MinE ALS Sequencing Consortium

Neurobiology of Aging 74, 234.e9–234.e15 (2019)

* These authors contributed equally to this manuscript.

Members and affiliations of the Project MinE ALS Sequencing Consortium are listed in Supplementary Information.

25 Chapter 2 | NIPA1 repeats in ALS in a large international cohort

ABSTRACT

NIPA1 (non-imprinted in Prader-Willi/Angelman syndrome 1) mutations are known to cause hereditary spastic paraplegia type 6, a neurodegenerative disease that phenotypically overlaps to some extent with amyotrophic lateral sclerosis (ALS). Previously, a genome-wide screen for copy number variants found an association between rare deletions in NIPA1 and ALS, and subsequent genetic analyses revealed that long (or expanded) polyalanine repeats in NIPA1 convey increased ALS susceptibility. We set out to perform a large-scale replication study to further investigate the role of NIPA1 polyalanine expansions in ALS, in which we characterized NIPA1 repeat size in an independent international cohort of 3,955 ALS patients and 2,276 unaffected controls and combined our results with previous reports. Meta-analysis on a total of 6,245 ALS patients and 5,051 controls showed an overall increased risk of ALS in those with expanded (>8) GCG-repeat length (odds ratio = 1.50, P = 3.8 × 10⁻⁵). Together with previous reports, these findings provide evidence for an association between an expanded polyalanine repeat in NIPA1 and ALS.

INTRODUCTION

Amyotrophic lateral sclerosis (ALS) is a rapidly progressive neurodegenerative disorder characterized by the loss of both upper and lower motor neurons, leading to progressive weakness, spasticity and ultimately respiratory failure¹,². The genetic architecture of ALS is complex: 5-15% of patients have a positive family history, in which case a single causal mutation is assumed³. However, even in the majority of seemingly sporadic patients a large genetic contribution is expected, and causal mutations have been reported despite a negative family history⁴,⁵. To date, mutations in more than 20 different genes have been implicated in ALS, one of the most prominent being an intronic repeat expansion in C9orf72⁴.

In addition to C9orf72, repeat expansions in other genes have been reported in ALS, including ATXN2 and NIPA1⁶,⁷. NIPA1 (non-imprinted in Prader-Willi/Angelman syndrome 1) mutations are known to cause hereditary spastic paraplegia (HSP) type 6, a neurodegenerative disease characterized by slowly progressive upper motor neuron signs (predominantly in the lower limbs) that to some extent overlaps phenotypically with ALS⁸. Interestingly, a genome-wide screen for copy number variants found an association between rare deletions in NIPA1 and ALS, and subsequent genetic analyses revealed that long (or expanded) polyalanine repeats in NIPA1 confer increased disease susceptibility⁷,⁹. In the majority of people (98%) the 5’-end of NIPA1 (NCBI: NM_144599.4) encodes a stretch of 12 or 13 residues, of which 7 or 8 are encoded by a (GCG)n trinucleotide repeat (TNR), although both shorter and longer GCG stretches have been reported in non-affected individuals¹⁰. In the previous study, an analysis of an international cohort of 2,292 ALS patients and 2,777 controls showed that 'long' repeats (>8) in NIPA1 were enriched in ALS cases compared to controls (5.5% vs. 3.6%; OR 1.71; P = 1.6 × 10⁻⁴)⁷.

Although such findings are interesting and potentially relevant, only a small fraction of initially positive results from candidate gene studies (such as that performed on NIPA1) replicate consistently¹¹. Therefore, additional steps, such as replication of the findings and imposing a proper significance threshold (such as exome-wide or genome-wide significance), are required before any claims of causality can be made¹².

We therefore set out to perform a large-scale replication study to further investigate the role of NIPA1 polyalanine expansions in ALS, in which we characterized NIPA1 repeat size in a large international cohort of ALS patients and unaffected controls and then meta-analyzed our results with previous reports.


MATERIAL AND METHODS

Subjects

All participants gave written informed consent and approval was obtained from the local, relevant ethical committees for medical research. Genotyping experiments were performed on 6,231 samples comprising 3,955 ALS patients and 2,276 healthy controls from 6 populations. All patients were diagnosed according to the revised El Escorial criteria. Control subjects were from ongoing population-based studies on risk factors in ALS. All related individuals were excluded from further analysis. Baseline characteristics for available samples are provided in Supplementary Table 1.

PCR, sequencing and genotyping

Dutch samples obtained from 753 ALS patients and 603 unaffected individuals were analyzed using PCR according to protocols described previously, and results were analyzed in a blinded and automated fashion with a call rate of 96.6% (Blauw, et al., 2012). Samples that failed genotyping were additionally analyzed with Sanger sequencing to assess possible bias. An additional cohort of 767 unaffected controls and 764 ALS samples was sequenced using Sanger sequencing and genotyped automatically with a call rate of 99.1%. Primers: 5’-GCCCCTCTTCCTGCTCCT-3’ (forward) and 5’-CGATGCCCTTCTTCTGTAGC-3’ (reverse). A total of 847 samples were analyzed using both methods (PCR and Sanger), with manual review of discordant genotypes (n = 35, 4.1%).

We analyzed NIPA1 repeat size in whole-genome sequencing (WGS) data of 3,344 samples (2,438 cases and 906 controls) from the HiSeq X Sequencing platform, available to us through Project MinE¹³, using the Illumina ExpansionHunter tool¹⁴. A total of 691 samples were genotyped using both ExpansionHunter and Sanger sequencing, showing 99% concordance (n = 684). Given this concordance between ExpansionHunter and Sanger results in the Dutch dataset, we did not perform additional validation experiments on the WGS samples and proceeded with the ExpansionHunter calls. C9orf72 status had been determined for 3,907 ALS samples from the PCR, Sanger and ExpansionHunter cohorts. Additionally, the presence of rare non-synonymous and loss-of-function variants in the established ALS-associated genes SOD1, FUS and TARDBP was known for 5,030 cases and controls from all cohorts, as described previously¹³,¹⁵.

Statistical analysis

All statistical procedures were carried out in R 3.3.0 (http://www.r-project.org). For association analyses we applied logistic regression to all subgroups, testing the effect of an expanded (>8) versus non-expanded polyalanine repeat length on disease status and adjusting for sex at birth, method of genotyping and country of origin. Missing values for sex at birth (n = 108, 1%) were imputed using multivariate multiple imputation with the ‘mice’ 2.46.0 package.

Subgroup effects were meta-analyzed using both fixed-effects and random-effects modelling with the ‘metafor’ 2.0 package. For the joint analysis on individual data, we used a generalized linear model (GLM) with fixed-effects covariates: sex, method of genotyping and country of origin. We additionally applied a generalized linear mixed model (GLMM) to non-imputed data to account for possible random effects.

The survival after onset and age at onset analyses were performed using multivariate Cox regression with sex at birth, site of onset, age at onset (for survival only) and C9orf72 status as covariates.

To assess whether the observed frequency of co-occurring genetic risk variants for ALS was in excess of what would be expected on the basis of chance, we used a method described previously by Dekker et al. (2016)¹⁵. The expected frequency of co-occurring variants was calculated using the following formula: (the observed number of patients carrying a variant / the total number of patients) * (the observed number of controls carrying a variant / the total number of controls). This formula was used in order to take into account the higher frequency of just one variant in ALS patients (= frequency of variants in patients), multiplied by the chance probability of a second variant (= frequency of variants in controls). Then, a binomial test was performed to compare the observed frequency of co-occurring variants in ALS patients with the calculated expected frequency.
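The expected-frequency formula and the binomial comparison described above can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names and all counts are hypothetical, and the use of a one-sided (upper-tail) binomial test is an assumption, since the text does not state the sidedness.

```python
import math


def expected_cooccurrence(carriers_patients, n_patients, carriers_controls, n_controls):
    """Expected co-occurrence frequency: carrier frequency in patients
    multiplied by carrier frequency in controls (formula from the text)."""
    return (carriers_patients / n_patients) * (carriers_controls / n_controls)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1.

    The probability mass is evaluated in log space so that large n
    does not overflow."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    below = 0.0
    for i in range(k):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)


# Hypothetical counts, for illustration only (not data from this study).
expected = expected_cooccurrence(
    carriers_patients=100, n_patients=2000,
    carriers_controls=40, n_controls=2000,
)
observed_double_carriers = 6  # hypothetical number of patients with both variants
p_value = binomial_upper_tail(observed_double_carriers, 2000, expected)
print(f"expected frequency = {expected:.4f}, one-sided P = {p_value:.3f}")
```

Computing the tail as one minus the lower sum keeps the loop short when the observed count is small relative to the cohort size, which is the typical situation for rare co-occurring variants.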

We specified a formal null model for an increase in repeat expansion, taking into account repeat-related confounding variables such as genomic frequency and repeat size. Previous studies have shown that a total of 878 genes in the genome contain a coding trinucleotide repeat (TNR) with a size of 6 repeats or greater, 90 of which contain a polyalanine tract¹⁶. We therefore set two thresholds for significance in this study: 1) a relatively loose threshold, in which we correct for the number of genes that contain a polyalanine tract of 6 or larger, resulting in P = 0.05/90 = 5.6 × 10⁻⁴, and 2) a more conservative threshold, in which we correct for the total number of genes in the genome that contain a coding TNR with a size of 6 or larger, which gives P = 0.05/878 = 5.7 × 10⁻⁵.
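The two thresholds are Bonferroni-style corrections of a 0.05 significance level for the number of candidate genes. A short sketch of the arithmetic, using the gene counts given in the text (the function and constant names are illustrative):

```python
ALPHA = 0.05
N_POLYALANINE_GENES = 90   # genes with a polyalanine tract of 6 or larger
N_CODING_TNR_GENES = 878   # genes with any coding TNR of 6 repeats or larger


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after correcting for n_tests."""
    return alpha / n_tests


loose = bonferroni_threshold(ALPHA, N_POLYALANINE_GENES)
conservative = bonferroni_threshold(ALPHA, N_CODING_TNR_GENES)
print(f"loose threshold:        {loose:.1e}")         # 5.6e-04
print(f"conservative threshold: {conservative:.1e}")  # 5.7e-05
```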

RESULTS

Replication

We first tried to replicate the initial findings in an independent Dutch cohort comprising 1,517 ALS cases and 1,370 unaffected controls by genotyping the GCG repeat length in NIPA1 using repeat PCR and/or Sanger sequencing. As reported previously, the most frequent alleles consisted of either 7 or 8 (GCG)n repeats (25% and 72%, respectively) (Figure 1). Our analysis showed a similar difference in the allele frequency of expanded or 'long' alleles (repeat length of 9 or longer) between ALS (n = 85, 2.80%) and controls (n = 51, 1.86%). The ALS and control groups each contained only a single individual with a homozygous expansion, so we used a dominant model for further analysis. This resulted in 84 individuals with ALS (5.54%) and 50 unaffected individuals (3.65%) being classified as carriers of an expanded NIPA1 polyalanine repeat. Logistic regression analysis, corrected for sex at birth and method of genotyping (PCR or Sanger), revealed an effect of expanded NIPA1 repeat length on disease susceptibility (OR = 1.54, P = 0.018).
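As a rough cross-check of the carrier counts above, an unadjusted odds ratio with a Wald 95% confidence interval can be computed directly from the 2×2 table. This is only a sketch: the reported OR of 1.54 comes from a logistic regression adjusted for sex and genotyping method, so the crude estimate below is expected to be close to it but not identical. The function name is illustrative.

```python
import math


def odds_ratio_wald(case_exp, case_non, ctrl_exp, ctrl_non, z=1.959964):
    """Crude odds ratio and Wald confidence interval from a 2x2 table."""
    odds_ratio = (case_exp * ctrl_non) / (case_non * ctrl_exp)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / case_exp + 1 / case_non + 1 / ctrl_exp + 1 / ctrl_non)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


# Dutch replication cohort: 84 of 1,517 cases and 50 of 1,370 controls
# carried an expanded repeat (counts from the text).
crude_or, ci_low, ci_high = odds_ratio_wald(84, 1517 - 84, 50, 1370 - 50)
print(f"crude OR = {crude_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```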

Project MinE

To further increase sample size and investigate cohorts other than the Dutch population, we then analyzed NIPA1 repeat expansion genotypes that were called using the Illumina ExpansionHunter tool in 2,438 independent ALS cases and 906 controls whole-genome sequenced as part of the Project MinE ALS Sequencing Consortium¹³. These multi-cohort WGS data showed a more equal distribution of expanded NIPA1 carriers in ALS (114/2,438, 4.67%) and controls (40/906, 4.42%). A logistic regression analysis, corrected for country of origin and sex, showed no significant difference.

Figure 1. NIPA1 polyalanine repeat length distribution. Proportion of total alleles grouped per NIPA1 polyalanine repeat size (x-axis: repeat size, 2 to 11; y-axis: proportion of alleles). Alleles displayed were observed multiple times in the Dutch replication cohort of 1,517 individuals affected with ALS (blue) and 1,370 unaffected controls (orange).

Meta-analysis

Finally, we sought to perform an analysis of all available NIPA1 polyalanine expansion data, combining our data with the original data published previously⁷. After exclusion of duplicate samples, individual-level data were available for a total of 5,065 samples (2,290 cases and 2,775 controls) in the discovery dataset published by Blauw et al. (2012). Our replication cohort (including results from PCR, Sanger and ExpansionHunter) comprised 3,955 cases and 2,276 controls. The final dataset included 6,245 ALS patients and 5,051 controls, reaching a final number of 11,296 unique individuals. We combined these data in a fixed-effects meta-analysis and found an overall effect of expanded NIPA1 repeat length on ALS risk (odds ratio (OR) = 1.50, P = 3.8 × 10⁻⁵) (Figure 2). Since individual-level data were available, we additionally performed a multivariate logistic regression analysis, using sex at birth, method of genotyping and country of origin as covariates in the pooled data, resulting in a similar effect and significance (OR = 1.48, P = 6.2 × 10⁻⁵). Other association models that account for random effects, such as random-effects meta-analysis and a generalized linear mixed model, gave similar results (data not shown). Repeating the analysis excluding the 322 C9orf72 repeat expansion carriers yielded a P value of 7.7 × 10⁻⁵ for the fixed-effects meta-analysis (OR = 1.49, 95% confidence interval (CI) = 1.22-1.81) and a P value of 1.0 × 10⁻⁴ for the multivariate logistic regression analysis (OR = 1.47, 95% CI = 1.21-1.78). Exclusion of an additional 171 samples (133 cases and 38 controls) carrying a non-synonymous or loss-of-function mutation in SOD1, FUS or TARDBP did not alter the results (fixed-effects meta-analysis P value = 7.5 × 10⁻⁵, OR = 1.49, 95% CI = 1.22-1.81) (Supplementary Figure 1).
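The fixed-effects pooling can be illustrated with a generic inverse-variance sketch that starts from the per-cohort odds ratios and 95% CIs printed in Figure 2. This is not the authors' ‘metafor’ analysis: standard errors are back-calculated from the rounded confidence limits (an assumption), so the output only approximates the reported values.

```python
import math

# (cohort, odds ratio, CI lower, CI upper), copied from the forest plot in Figure 2.
FIGURE2_COHORTS = [
    ("Discovery (Blauw et al.)", 1.65, 1.25, 2.18),
    ("Netherlands (PCR/Sanger)", 1.54, 1.08, 2.21),
    ("Belgium (WGS)", 0.74, 0.31, 1.77),
    ("Ireland (WGS)", 1.29, 0.40, 4.23),
    ("Netherlands (WGS)", 1.02, 0.34, 3.04),
    ("Spain (WGS)", 1.43, 0.44, 4.68),
    ("UK (WGS)", 1.43, 0.69, 2.98),
    ("USA (WGS)", 1.74, 0.40, 7.66),
]


def fixed_effect_meta(cohorts, z_crit=1.959964):
    """Inverse-variance fixed-effect pooling of log odds ratios."""
    log_ors, weights = [], []
    for _, odds_ratio, low, high in cohorts:
        # Back-calculate the standard error from the width of the 95% CI.
        se = (math.log(high) - math.log(low)) / (2 * z_crit)
        log_ors.append(math.log(odds_ratio))
        weights.append(1 / se ** 2)
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_weight
    se_pooled = math.sqrt(1 / total_weight)
    # Cochran's Q and I-squared as heterogeneity measures.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(cohorts) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    z = pooled / se_pooled
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z_crit * se_pooled),
        "ci_high": math.exp(pooled + z_crit * se_pooled),
        "q": q,
        "i_squared": i_squared,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P value
    }


result = fixed_effect_meta(FIGURE2_COHORTS)
print({key: round(value, 6) for key, value in result.items()})
```

With the rounded inputs from Figure 2, the pooled odds ratio comes out close to 1.50 with a 95% CI of roughly 1.24 to 1.82 and a Q statistic of about 3.6, in line with the reported meta-analysis; the P value differs slightly from the published one because of the rounding.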

Survival

Clinical and survival data were available for 1,954 of the 3,955 ALS patients from the combined replication cohorts (Supplementary Table 2). We used a Cox regression model in this mixed population, corrected for sex, age at onset, bulbar site of onset and C9orf72 status, to test whether NIPA1 expansions conferred any risk of shorter survival; we found no evidence for such an effect (hazard ratio (HR) = 1.16; 95% CI = 0.94-1.45; P = 0.16) (Supplementary Figure 2). There was also no significant association between NIPA1 repeat length and age at onset in this replication cohort after correction for sex, site of onset and the presence of a C9orf72 expansion (Supplementary Figure 3).

Co-occurrence with C9orf72 repeat expansion

Since a significant number of NIPA1 expansion carriers was reported in a subgroup of ALS patients that also carried a C9orf72 repeat expansion¹⁵, we evaluated this co-occurrence in 4,619 participants genotyped for both loci in all cohorts (n = 712 for the discovery cohort; n = 3,907 for the combined replication cohorts).

Although we did observe a higher than expected frequency of co-occurrence of the repeat expansions, our data did not robustly replicate the previously published finding (0.37% observed vs 0.26% expected; P = 0.06) (Supplementary Table 3).

DISCUSSION

In this study, we included a large international cohort and additionally meta-analyzed the NIPA1 expansion genotypes in a total of 6,245 ALS patients and 5,051 controls. Given that we were able to replicate our previous results in an independent cohort and observed an increase in significance in the overall meta-analysis, our data adds to the evidence that expanded NIPA1 repeats are a risk factor for sporadic ALS. Mutations in NIPA1 were already known to cause hereditary spastic paraplegia type 6, a neurodegenerative disease with motor-neuron involvement, whereas the 15q11.2 microdeletions are better known for low penetrant neurodevelopmental phenotypes, further adding to the complexity of the NIPA1 locus⁸,¹⁷. Interestingly, genetic pleiotropy between HSP and ALS appears to be more widespread, as recently it has been shown that mutations in different domains in KIF5A either cause HSP or ALS¹⁸,¹⁹.

Cohort                      Case Exp  Case Non-Exp  Control Exp  Control Non-Exp  Odds Ratio [95% CI]

Discovery (PCR)
  Blauw, et al.             124       2166          99           2676             1.65 [1.25, 2.18]

Replication (PCR/Sanger)
  Netherlands               84        1433          50           1320             1.54 [1.08, 2.21]

Replication (WGS)
  Belgium                   13        325           9            167              0.74 [0.31, 1.77]
  Ireland                   10        258           4            132              1.29 [0.40, 4.23]
  Netherlands               5         58            12           140              1.02 [0.34, 3.04]
  Spain                     11        210           4            97               1.43 [0.44, 4.68]
  UK                        52        1060          9            265              1.43 [0.69, 2.98]
  USA                       23        413           2            65               1.74 [0.40, 7.66]

WGS (Q = 1.82, df = 5, p = 0.87; I² = 0.0%)                                       1.18 [0.78, 1.78]
Meta-analysis (Q = 3.62, df = 7, p = 0.82; I² = 0.0%)           P = 3.833e-5      1.50 [1.24, 1.82]
Joint analysis                                                  P = 6.153e-5      1.48 [1.22, 1.79]

(x-axis: Odds Ratio, ticks at 0.2, 0.5, 1, 2, 5)

Figure 2. NIPA1 polyalanine repeat expansion meta-analysis. Forest plot for the fixed-effect meta-analysis and joint analysis on individual level data of the effect of expanded NIPA1 polyalanine (>8 GCG repeats) on ALS risk with the initial discovery reports (Blauw, et al., 2012) and current replication using PCR, Sanger or whole genome sequencing (WGS) grouped per cohort/country of origin. Weights depending on number of participants. Exp, expanded; Non-Exp, non-expanded; CI, confidence interval.


After C9orf72 and ATXN2, NIPA1 is the third reported expanded genomic repeat motif associated with an increased risk for ALS. Its initial discovery in ALS, through the identification of copy number variants in the chromosome 15q11.2 locus containing NIPA1, was followed by further genetic screening in a large international cohort consisting of Belgian, Dutch, and German subjects⁷,⁹. This subsequent study in 2,292 ALS patients and 2,777 controls revealed that, although NIPA1 deletions and missense mutations were identified in ALS patients, it was actually an expansion of the (GCG)n repeat motif in the 5’-end of NIPA1 that seemed to associate with ALS (OR = 1.71, P = 1.6 × 10⁻⁴). Knowing that positive results derived from candidate gene studies often fail to replicate, we sought to replicate the NIPA1 finding in ALS, particularly given the complex genotypic and phenotypic architecture of the NIPA1 locus¹¹.

Our results showed a very similar effect of NIPA1 polyalanine expansions on ALS susceptibility in a new Dutch cohort of 1,517 ALS cases and 1,370 unaffected controls tested via PCR or Sanger sequencing. Given the high concordance between Sanger/PCR results and the calls from the bioinformatic tool ExpansionHunter on WGS data, we were able to further increase the sample size of our study by including data from Project MinE¹³,¹⁴. This allowed us to additionally evaluate the role of NIPA1 repeat sizes in non-Dutch cohorts. This cohort contained a similar number of cases to the original discovery cohort, but fewer controls. In addition, the Project MinE dataset is more heterogeneous than the original discovery cohort. This is a possible explanation as to why the overall NIPA1 signal was not replicated in the WGS data. However, we did find a similar direction and effect size in 4 out of the 6 WGS cohorts (Ireland, Spain, the United States of America and the United Kingdom).

While empirical thresholds for genome-wide and exome-wide significance have been derived for studies assessing associations between phenotypes and single nucleotide variants, these thresholds are likely to be too stringent in the context of screening for coding repeat expansions, as the genome contains only ~900 genes with a coding TNR tract with a length of 6 or more, 90 of which code for a polyalanine tract¹⁶. We therefore set the significance threshold for associations with TNRs to be approximately P = 5.6 × 10⁻⁴, correcting for polyalanine tracts only, or (more conservatively) P = 5.7 × 10⁻⁵, correcting for all TNRs with a length of 6 repeats or more. The meta-analysis results are significant regardless of the threshold applied. Furthermore, exclusion of samples carrying a mutation in established ALS genes (C9orf72, SOD1, TARDBP and FUS) yielded somewhat higher P values (due to the loss of power that comes with a lower number of included samples) with a similar magnitude of effect, further supporting the role of NIPA1 as an independent risk factor for developing ALS.

Although we did see a higher than expected number of ALS cases carrying both NIPA1 and C9orf72 repeat expansions in this study (n = 17, P = 0.06), we did not robustly reproduce the co-occurrence of C9orf72 expansion carriers in the NIPA1 expanded cases described by Dekker et al. (2016). This might be attributed to the relatively small sample size in the original study (755 ALS patients), resulting in broad confidence intervals that overlap with our results (frequency = 0.004 [0.002-0.006] in the current study; frequency = 0.009 [0.004-0.019] in Dekker et al. (2016)¹⁵). Alternatively, the co-occurrence might be relevant in some, but not all, included populations. Additionally, we were unable to replicate the effect of NIPA1 expansions on ALS survival and age at onset⁷. These findings re-emphasize the necessity for replication and the importance of tracking clinical characteristics in large genetic databases. We were able to perform a survival analysis on just 50% of our replication set, and further evaluation in a larger and complete dataset is therefore recommended.
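The confidence interval quoted for the co-occurrence frequency in the current study can be approximated with a Wilson score interval. The text does not state which interval method was used, so the choice of Wilson here is an assumption, as is the denominator of 4,619 (the number of participants genotyped for both loci according to the Results). The function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# 17 carriers of both expansions among 4,619 genotyped participants
# (denominator assumed from the Results section).
freq = 17 / 4619
low, high = wilson_interval(17, 4619)
print(f"frequency = {freq:.3f} [{low:.3f}-{high:.3f}]")
```

Under these assumptions the interval rounds to 0.002-0.006, matching the range quoted for the current study.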

Interestingly, the increase in NIPA1 repeat size seems to be limited mostly to the addition of two GCG repeats. However, this seemingly small addition might well have effects on protein conformation, as has been shown in vitro: polyalanine stretches of between 7 and 15 residues transition from a monomeric form to a predominantly macromolecular beta sheet, which in turn may lead to stronger protein–protein interactions and aggregation²⁰. In addition, a patient with a mutation in NIPA1 suffering from a progressive motor neuron phenotype was shown to have TDP-43 inclusions, very similar to those seen in ALS and ALS-FTD cases²¹. These findings might explain how alterations in NIPA1 could increase ALS risk.

In conclusion, our data adds to the evidence for an association of NIPA1 expansions and ALS. Future investigations may provide further insights in the role of NIPA1 and polyalanine stretches in the development and possibly treatment of motor neuron disease.


REFERENCES

1. Hardiman, O., van den Berg, L. H. & Kiernan, M. C. Clinical diagnosis and management of amyotrophic lateral sclerosis. Nature Reviews Neurology 7, 639–649 (2011).
2. van Es, M. A. et al. Amyotrophic lateral sclerosis. The Lancet 0, (2017).
3. Andersen, P. M. & Al-Chalabi, A. Clinical genetics of amyotrophic lateral sclerosis: what do we really know? Nature Reviews Neurology 7, 603–615 (2011).
4. Al-Chalabi, A., van den Berg, L. H. & Veldink, J. Gene discovery in amyotrophic lateral sclerosis: implications for clinical management. Nature Reviews Neurology 13, 96–104 (2017).
5. McLaughlin, R. L., Vajda, A. & Hardiman, O. Heritability of Amyotrophic Lateral Sclerosis. JAMA Neurol 72, 857–2 (2015).
6. Elden, A. C. et al. Ataxin-2 intermediate-length polyglutamine expansions are associated with increased risk for ALS. Nature 466, 1069–1075 (2010).
7. Blauw, H. M. et al. NIPA1 polyalanine repeat expansions are associated with amyotrophic lateral sclerosis. Human Molecular Genetics 21, 2497–2502 (2012).
8. Rainier, S., Chai, J.-H., Tokarz, D., Nicholls, R. D. & Fink, J. K. NIPA1 Gene Mutations Cause Autosomal Dominant Hereditary Spastic Paraplegia (SPG6). The American Journal of Human Genetics 73, 967–971 (2003).
9. Blauw, H. M. et al. A large genome scan for rare CNVs in amyotrophic lateral sclerosis. Human Molecular Genetics 19, 4091–4099 (2010).
10. Chai, J.-H. et al. Identification of four highly conserved genes between breakpoint hotspots BP1 and BP2 of the Prader-Willi/Angelman syndromes deletion region that have undergone evolutionary transposition mediated by flanking duplicons. The American Journal of Human Genetics 73, 898–925 (2003).
11. Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A comprehensive review of genetic association studies. Genet Med 4, 45–61 (2002).
12. MacArthur, D. G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).
13. Project MinE ALS Sequencing Consortium. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. Eur. J. Hum. Genet. 7, 1–10 (2018).
14. Dolzhenko, E. et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Research 27, 1895–1903 (2017).
15. Dekker, A. M. et al. Large-scale screening in sporadic amyotrophic lateral sclerosis identifies genetic modifiers in C9orf72 repeat carriers. Neurobiology of Aging 1–7 (2016). doi:10.1016/j.neurobiolaging.2015.12.012
16. Kozlowski, P., de Mezer, M. & Krzyzosiak, W. J. Trinucleotide repeats in human genome and exome. Nucleic Acids Res 38, 4027–4039 (2010).
17. Butler, M. G. Clinical and genetic aspects of the 15q11.2 BP1-BP2 microdeletion disorder. Journal of Intellectual Disability Research 61, 568–579 (2017).
18. Brenner, D. et al. Hot-spot KIF5A mutations cause familial ALS. Brain 141, 688–697 (2018).
19. Nicolas, A. et al. Genome-wide Analyses Identify KIF5A as a Novel ALS Gene. Neuron 97, 1268–1282.e6 (2018).
20. Shinchuk, L. M. et al. Poly-(L-alanine) expansions form core β-sheets that nucleate amyloid assembly. Proteins 61, 579–589 (2005).
21. Martinez-Lage, M. et al. TDP-43 pathology in a case of hereditary spastic paraplegia with a NIPA1/SPG6 mutation. Acta Neuropathol 124, 285–291 (2012).

SUPPLEMENT

Supplementary List: Members of Project MinE ALS Sequencing Consortium*

Fulya Akçimen¹, Ahmad Al Khleifat², Ammar Al-Chalabi²,³, Peter Andersen⁴, A. Nazli Basak¹, Denis C Bauer⁵, Ian Blair⁶, William J Brands⁷, Ross P Byrne⁸, Andrea Calvo⁹, Yolanda Campos Gonzalez¹⁰, Adriano Chio⁹, Jonothan Cooper-Knock¹¹, Philippe Corcia¹², Philippe Couratier¹³, Mamede de Carvalho¹⁴,¹⁵, Annelot M Dekker⁷, Vivian E Drory¹⁶, Chen Eitan¹⁷, Alberto Garcia Redondo¹⁸, Cinzia Gellera¹⁹, Jonathan D Glass²⁰,²¹, Marc Gotkine²², Orla Hardiman²³,²⁴, Eran Hornstein¹⁷, Alfredo Iacoangeli²⁵, Kevin P Kenna⁷, Brendan Kenna⁷, Matthew C Kiernan²⁶,²⁷, Cemile Kocoglu¹, Maarten Kooyman²⁸, John E Landers²⁹, Victoria López Alonso³⁰, Russell L McLaughlin⁸, Bas Middelkoop⁷, Jonathan Mill³¹, Miguel Mitne-Neto³², Matthieu Moisse³³,³⁴, Jesus S Mora Pardina³⁵, Karen E Morrison³⁶, Susana C Pinto¹⁴,¹⁵, Monica Povedano Panadés³⁷,³⁸, Sara L Pulit⁷, Antonia Ratti³⁹,⁴⁰, Wim Robberecht³³,³⁴,⁴¹, Raymond D Schellevis⁷, Aleksey Shatunov², Christopher E Shaw¹¹, Pamela J Shaw¹¹, Vincenzo Silani³⁹,⁴⁰, William Sproviero², Christine Staiger²⁸, Gijs HP Tazelaar⁷, Nicola Ticozzi³⁹,⁴⁰, Ceren Tunca¹, Nathalie A Twine⁵, Philip van Damme³³,³⁴,⁴¹, Leonard H van den Berg⁷, Rick A van der Spek⁷, Perry TC van Doormaal⁷, Kristel R van Eijk⁷, Michael A van Es⁷, Wouter van Rheenen⁷, Joke JFA van Vugt⁷, Jan H Veldink⁷, Peter M. Visscher⁴²,⁴³, Patrick Vourc’h⁴⁴, Markus Weber⁴⁵, Kelly L Williams⁶, Naomi Wray⁴², Jian Yang⁴², Mayana Zatz³², Katharine Zhang⁶

1. Bogazici University, Suna and Inan Kirac Foundation, NDAL, Istanbul, Turkey.
2. Maurice Wohl Clinical Neuroscience Institute and United Kingdom Dementia Research Institute, Department of Basic and Clinical Neuroscience, King’s College London, London, UK.
3. Department of Neurology, King’s College Hospital, London, UK.
4. Department of Pharmacology and Clinical Neuroscience, Umea University, Umea, Sweden.
5. Commonwealth Scientific and Industrial Research Organization, Sydney, New South Wales, Australia.
6. Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Macquarie University, Sydney, New South Wales, Australia.
7. Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, The Netherlands.
8. Population Genetics Laboratory, Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Republic of Ireland.
9. ‘Rita Levi Montalcini’ Department of Neuroscience, ALS Centre, University of Torino, Turin, Italy.
10. Mitochondrial Pathology Unit, Instituto de Salud Carlos III, Madrid, Spain.
11. Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK.
12. Centre SLA, CHRU de Tours, Tours, France.
13. Federation des Centres SLA Tours and Limoges, LITORALS, Tours, France.
14. Institute of Physiology, Institute of Molecular Medicine, Faculty of Medicine, University of Lisbon, Lisbon, Portugal.
15. Department of Neurosciences, Hospital de Santa Maria-CHLN, Lisbon, Portugal.
16. Department of Neurology, Tel-Aviv Sourasky Medical Centre, Israel.
17. Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
18. Hospital Carlos III, Madrid, Spain.
19. Unit of Genetics of Neurodegenerative and Metabolic Diseases, Fondazione IRCCS Istituto Neurologico ‘Carlo Besta’, Milan, Italy.
20. Department of Neurology, Emory University School of Medicine, Atlanta, Georgia, USA.
21. Emory ALS Center, Emory University School of Medicine, Atlanta, Georgia, USA.
22. Department of Neurology, The Agnes Ginges Center for Human Neurogenetics, Hadassah-Hebrew University Medical Center, Jerusalem, Israel.
23. Academic Unit of Neurology, Trinity College Dublin, Trinity Biomedical Sciences Institute, Dublin, Republic of Ireland.
24. Department of Neurology, Beaumont Hospital, Dublin, Republic of Ireland.
25. Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK.
26. Brain and Mind Centre, Sydney Medical School, The University of Sydney, Sydney, Australia.
27. Memory and Cognition Clinic, Institute of Clinical Neurosciences, Royal Prince Alfred Hospital, Sydney, Australia.
28. SURFsara, Amsterdam, the Netherlands.
29. Department of Neurology, University of Massachusetts Medical School, Worcester, Massachusetts, USA.
30. Computational Biology Unit, Instituto de Salud Carlos III, Madrid, Spain.
31. University of Exeter Medical School, Exeter University, St Luke’s Campus, Exeter, UK.
32. Universidade de São Paulo, Brazil.
33. KU Leuven - University of Leuven, Department of Neurosciences, Experimental Neurology and Leuven Research Institute for Neuroscience and Disease (LIND), B-3000 Leuven, Belgium.
34. VIB, Vesalius Research Center, Laboratory of Neurobiology, Leuven, Belgium.
35. ALS Unit, Hospital San Rafael, Madrid, Spain.
36. Faculty of Medicine, University of Southampton, Southampton, UK.
37. Biomedical Network Research Center on Neurodegenerative Diseases (CIBERNED), Institute Carlos III, Hospitalet de Llobregat, Spain.
38. Functional Unit of Amyotrophic Lateral Sclerosis (UFELA), Service of Neurology, Bellvitge University Hospital, Hospitalet de Llobregat, Spain.
39. Department of Neurology and Laboratory of Neuroscience, IRCCS Istituto Auxologico Italiano, Milan, Italy.
40. Department of Pathophysiology and Transplantation, ‘Dino Ferrari’ Center, Università degli Studi di Milano, Milan, Italy.
41. University Hospitals Leuven, Department of Neurology, Leuven, Belgium.
42. Queensland Brain Institute, The University of Queensland, Brisbane, Queensland, Australia.
43. Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, Queensland, Australia.
44. Département de Biochimie, CHU Bretonneau, Tours, France.
45. Neuromuscular Diseases Unit/ALS Clinic, Kantonsspital St Gallen, 9007, St Gallen, Switzerland.

* Authors are listed in alphabetical order


Supplementary Table 1. Cohort characteristics

                                          ALS             Control
Discovery cohort (Blauw et al., 2012)¹
  Country
    Netherlands                           924             1,729
    Belgium                               360             414
    Germany                               1,006           632
  Sex, male (%)                           1,339 (58.5)    1,474 (53.1)
  C9orf72 status available (%)            712 (31.1)      NA
Replication cohort: PCR                   753             603
  Country
    Netherlands                           753             603
  Sex, male (%)                           433 (57.5)      347 (57.5)
  C9orf72 status available (%)            753 (100)       NA
  Clinical data available (%)²            748 (99.3)      NA
Replication cohort: Sanger sequencing     764             767
  Country
    Netherlands                           764             767
  Sex, male (%)                           478 (62.6)      469 (61.1)
  C9orf72 status available (%)            759 (99.3)      NA
  Clinical data available (%)²            688 (90.1)      NA
Replication cohort: ExpansionHunter       2,438           906
  Country
    Belgium                               338             176
    Ireland                               268             136
    Netherlands                           63              152
    Spain                                 221             101
    United Kingdom                        1,112           274
    United States of America              436             67
  Sex, male (%)                           1,511 (62.0)    451 (49.8)
  C9orf72 status available (%)            2,395 (98.2)    NA
  Clinical data available (%)²            518 (21.1)      NA
Total replication cohorts:
  PCR + Sanger + ExpansionHunter          3,955           2,276
Total discovery + replication             6,245           5,051

¹ Data previously published in Blauw et al. (2012); duplicate samples were removed.
² Clinical data provided in Supplementary Table 2.

Supplementary Table 2. Clinical characteristics replication cohort

ALS patients with clinical data             1,954
Testing method
  PCR (%)                                   748 (38.3)
  Sanger (%)                                688 (35.2)
  ExpansionHunter (%)                       518 (26.5)
Country
  Ireland (%)                               107 (5.5)
  Netherlands (%)                           1,499 (76.7)
  United Kingdom (%)                        348 (17.8)
Sex, male (%)                               1,173 (60.0)
Mean age at onset, years (SD)               62.9 ± 10.9
Site of onset, bulbar (%)                   669 (34.2)
C9orf72 expansion (%)                       145 (7.4)
Median survival after onset, months (IQR)   32.0 [21.0-50.7]

Supplementary Table 3. Co-occurrence of NIPA1 and C9orf72 repeat expansions

Genotype          ALS patients      C9orf72 carriers   Expected         Binomial P
                  (n = 4619, %)     (n = 322, %)       frequency (%)
NIPA1 + C9orf72   17 (0.37)         17 (5.27)          12.0 (0.26)      0.06

Co-occurrence of NIPA1 exonic trinucleotide ([GCG] > 8) and C9orf72 intronic hexanucleotide ([GGGGCC] ≥ 30) repeat expansion in 4619 ALS patients with known genotypes for both genes. Expected frequency was calculated from the total number of C9orf72 expansion carriers (322) using the known frequency of NIPA1 expansion in controls (189/5051, 3.74%).

Cohort                      Case              Control           Odds Ratio [95% CI]
                            Exp    Non-Exp    Exp    Non-Exp

Previous reports (PCR)
  Netherlands1              116    2113       99     2661       1.57 [1.18, 2.08]

Replication (PCR/Sanger)
  Netherlands2              79     1299       50     1312       1.59 [1.10, 2.28]

Replication (WGS)
  Belgium                   10     281        9      164        0.64 [0.25, 1.61]
  Ireland                   10     225        4      129        1.45 [0.44, 4.76]
  Netherlands               5      53         11     137        1.22 [0.40, 3.70]
  Spain                     9      177        4      95         1.35 [0.40, 4.61]
  UK                        50     972        9      263        1.52 [0.73, 3.17]
  USA                       22     369        2      64         1.87 [0.42, 8.25]

WGS (Q = 2.64, df = 5, p = 0.76; I² = 0.0%)                     1.23 [0.81, 1.87]
Meta analysis (Q = 3.70, df = 7, p = 0.81; I² = 0.0%)
  P = 0.000074987                                               1.49 [1.22, 1.81]
Joint analysis
  P = 0.000100562                                               1.47 [1.21, 1.78]

Supplementary Figure 1. NIPA1 polyalanine repeat expansion meta-analysis excluding known mutant ALS gene carriers. Forest plot for the fixed-effect meta-analysis and joint analysis on individual level data of the effect of expanded NIPA1 polyalanine (>8 GCG repeats) on ALS risk without carriers of the pathogenic repeat expansion in C9orf72 or nonsynonymous/loss-of-function variants in SOD1, FUS or TARDBP.
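The I² values shown with the Q statistics in this figure can be related to Q and its degrees of freedom. The following is a minimal Python sketch, assuming the commonly used definition I² = max(0, (Q − df) / Q) × 100%. The thesis does not state which software or definition produced the plotted values, and the function name `i_squared` is purely illustrative.

```python
def i_squared(q: float, df: int) -> float:
    """Percentage of variability attributed to heterogeneity.

    Assumes I^2 = max(0, (Q - df) / Q) * 100; this definition is an
    assumption, not taken from the thesis.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q and df as reported in the forest plot.
wgs = i_squared(2.64, 5)    # WGS cohorts only
meta = i_squared(3.70, 7)   # all cohorts
print(f"I^2 (WGS) = {wgs:.1f}%, I^2 (meta-analysis) = {meta:.1f}%")
```

Because Q is smaller than df in both cases, this definition truncates I² at zero, in line with the 0.0% shown in the plot.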


Supplementary Figure 2. Effect of NIPA1 polyalanine repeat expansion on survival. Kaplan-Meier plot showing the effect of NIPA1 polyalanine repeat expansion on survival after onset of ALS. Cox-regression was corrected for sex, age at onset, site of onset and the presence of a C9orf72 repeat expansion in the entire cohort of 1954 Dutch ALS patients.


Supplementary Figure 3. Effect of NIPA1 repeat expansion on age at onset of ALS. Plot of the effect of NIPA1 GCG repeat expansion on age at onset corrected for sex, site of onset and the presence of a C9orf72 repeat expansion in the replication cohort of 1954 ALS patients.

Chapter 3

Large scale screening in sporadic ALS identifies genetic modifiers in C9orf72 repeat carriers.

Annelot M. Dekker*, Meinie Seelen*, Perry T.C. van Doormaal, Wouter van Rheenen, Reinoud J.P. Bothof, Tim van Riessen, William J. Brands, Anneke J. van der Kooi, Marianne de Visser, Nicol C. Voermans, R. Jeroen Pasterkamp, Jan H. Veldink**, Leonard H. van den Berg**, Michael A. van Es**

Neurobiology of Aging 39, 220.e9–15 (2016)

* These authors contributed equally to this manuscript. ** These authors jointly supervised this work.

47 Chapter 3 | Genetic modifiers of C9orf72 repeat carriers

ABSTRACT

Sporadic ALS is considered to be a complex disease with multiple genetic risk factors contributing to the pathogenesis. Identification of genetic risk factors that co-occur frequently could provide relevant insight into underlying mechanisms of motor neuron degeneration. To dissect the genetic architecture of sporadic ALS we undertook a large sequencing study in 755 apparently sporadic ALS cases and 959 controls, analyzing ten ALS genes: SOD1, C9orf72, TARDBP, FUS, ANG, CHMP2B, ATXN2, NIPA1, SMN1 and UNC13A. We observed multiple genetic risk variants in 4.1% of sporadic cases compared to 1.4% of controls. The overall difference was not in excess of what is to be expected by chance (binomial test, P = 0.59). We did, however, observe a higher frequency than expected of C9orf72 repeat carriers with co-occurring susceptibility variants (ATXN2, NIPA1, SMN1; P = 0.001), which is mainly due to the co-occurrence of NIPA1 repeats in 15% of C9orf72 repeat carriers (P = 0.006).

INTRODUCTION

Amyotrophic lateral sclerosis (ALS) is a fatal neurological disorder characterized by motor neuron degeneration in the primary motor cortex, brainstem and spinal cord. ALS patients develop progressive weakness and spasticity, eventually resulting in respiratory failure and death. Survival is approximately 3 years from symptom onset¹,². To date, there is only one drug, riluzole, that can moderately slow disease progression³.

In about 5-10% of patients the disease is familial and the mode of transmission is mostly autosomal dominant⁴. In the majority of cases there is no apparent family history and these cases are considered sporadic. Although the distinction between familial and sporadic seems straightforward, there is no clear definition of familial ALS and there is poor consensus amongst experts⁴. Many familial ALS pedigrees demonstrate incomplete penetrance. Genetic modeling studies have shown that considering non-penetrance and small family size, familial cases may well present as ‘apparently’ sporadic⁵. Even further complicating the matter is that there have been reports of ALS pedigrees with mutations in more than one ALS gene, suggesting that oligogenic inheritance may also occur⁶-¹¹. Therefore, it has been argued that the distinction between familial and sporadic ALS is rather arbitrary⁵.

In fact, it has been proposed that this distinction might be artificial in all diseases. In an influential paper, Manolio et al. propose a model in which genetic risk factors lie on a sliding scale: at one extreme are rare mutations (<1%) with very large effect (perhaps directly pathogenic) involved in Mendelian disorders, and at the other extreme are common polymorphisms (>5%) with small effect (odds ratios < 1.5) likely to be involved in complex / sporadic diseases¹². They argue that the majority of genetic risk factors are likely to be variants with intermediate frequency (1-5%) and larger effect (odds ratios >1.5). This model seems to translate well to ALS genetics.

Over the last few years great progress has been made in ALS genetics. There are over 20 familial ALS genes, and several risk factors for sporadic ALS have now been identified by GWAS and candidate gene studies¹³,¹⁴. In an attempt to dissect the genetic architecture of sporadic ALS we undertook a large sequencing study in which we analyzed 10 ALS genes: high-risk genes (rare mutations with large effect): SOD1, C9orf72, TARDBP, FUS; susceptibility genes with intermediate effect: ANG, CHMP2B, ATXN2, NIPA1, SMN1; and polymorphisms with small effect: a risk SNP in UNC13A. We hypothesized that we would find: 1) a low percentage of sporadic cases with mutations in high-risk genes (familial cases presenting as sporadic) and 2) a significant number of sporadic cases with variants in multiple susceptibility genes and/or mutations in high-risk genes combined with susceptibility genes (perhaps contributing to non-penetrance and phenotypic variability). The identification of genetic risk factors that co-occur frequently could provide relevant insight into the underlying mechanisms of motor neuron degeneration, which formed the rationale for this study.

MATERIAL AND METHODS

Subjects
Sporadic ALS patients and control subjects were recruited as part of the Prospective ALS study The Netherlands, a population-based case-control study¹. All ALS patients included in this study fulfilled the revised El Escorial criteria for definite or probable ALS. ALS mimics and subjects with a known family history of ALS were excluded. Genomic DNA was extracted from whole blood by means of salting-out or magnetic beads procedures. All material was obtained with the ethical approval of the institutional review board of the University Medical Center Utrecht. All subjects provided written informed consent.

Gene selection
To date, over 100 genes have been implicated in ALS¹⁵. The level of supporting evidence for each gene or gene variant varies from small to overwhelming, and is in some cases contradictory¹⁶. Therefore, most authors assign ALS genes to different categories of certainty, although there is no consistent nomenclature. In this study, we divided genetic variants into three categories: 1) High-risk variants, 2) Susceptibility variants and 3) Risk SNPs. Variants were grouped according to consensus in literature (directly causal / Mendelian, or less certain: risk factor / putative ALS gene)¹³,¹⁴,¹⁷, minor allele frequency (MAF) and effect size. We considered rare variants (MAF <0.1% in controls) with large effect (causal / Mendelian) as high-risk variants, and variants with low MAF (0.1 - 5.0% in controls) and intermediate effect (OR between 1.5 and 10.0) as susceptibility variants. Variants with MAF > 5.0% in controls and OR < 1.5 were termed risk SNPs.
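The numeric part of this three-way grouping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_variant`, the `mendelian` flag (standing in for the literature consensus, which cannot be reduced to a number) and the example inputs are hypothetical and do not come from the study.

```python
def classify_variant(maf_controls: float, odds_ratio: float,
                     mendelian: bool = False) -> str:
    """Assign a variant to one of the three categories described above.

    maf_controls is a fraction (0.001 means 0.1%). The mendelian flag is a
    hypothetical stand-in for "causal / Mendelian according to literature".
    """
    if mendelian and maf_controls < 0.001:
        return "high-risk variant"
    if 0.001 <= maf_controls <= 0.05 and 1.5 <= odds_ratio <= 10.0:
        return "susceptibility variant"
    if maf_controls > 0.05 and odds_ratio < 1.5:
        return "risk SNP"
    # Anything outside the stated thresholds is left unclassified here.
    return "unclassified"


# Hypothetical inputs, chosen only to exercise each branch.
print(classify_variant(0.0, 50.0, mendelian=True))   # high-risk variant
print(classify_variant(0.039, 1.6))                  # susceptibility variant
print(classify_variant(0.11, 1.3))                   # risk SNP
```

In the study itself the grouping also relied on consensus in the literature, so a purely threshold-based rule like this one should be read as a simplification.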

In total 10 genes were selected for analysis in this study. They were categorized as follows: 1) High-risk genes: SOD1, TARDBP, FUS, C9orf72; 2) Susceptibility variants: ANG, CHMP2B, NIPA1, SMN1, ATXN2; and 3) Risk SNP: UNC13A. These genes were selected based on the following criteria: 1) the presence of variation in the gene in the Dutch population in previous studies. For instance, UBQLN2 mutations are a well-established, but rare cause of ALS and mutations in this gene are not found in The Netherlands¹⁸. Therefore, UBQLN2 was not analysed in this study. For the same reason, genes such as OPTN, VCP, hnRNPA1 and hnRNPA2B1 were also not included in the study. And 2) the genes needed to be implicated in ALS by multiple or very large studies.

Genetic analyses
To complete the entire set of genes for each subject, we used previously obtained data and subsequently performed additional genetic analyses. Detailed information on the genetic analyses can be found in the Supplementary Material (Methods).

In short, subjects were screened for mutations in SOD1 (NM_000454, exons 1-5), FUS (NM_004960, exons 5, 6, 14, 15), TARDBP (NM_007375, exon 6), ANG (NM_001145, exon 2) and CHMP2B (NM_014043, exons 1-6) by means of Sanger sequencing as described previously¹⁹-²³. These exons were selected since the vast majority of known pathogenic variants lie within these regions. Additional screening of FUS, TARDBP, ANG and CHMP2B was performed by multiplexed targeted resequencing, carried out on a MiSeq high-throughput, next-generation sequencing platform (Illumina). Bar-coded paired-end sequencing libraries were prepared using a TruSeq Custom Amplicon kit (Illumina). Sequence Analysis Viewer software (Illumina) was used to monitor the quality of the sequencing runs. Sequencing reads were mapped to the human genome reference build GRCh37 using Burrows-Wheeler Aligner (BWA v0.6.1). Subsequent depth of coverage, quality filters, variant calling and variant annotation were performed using SAMtools v0.1.19, GATK v3.2 and the 1000 Genomes Project. The impact of each mutation on the structure and function of the protein was predicted with PolyPhen-2 (v2.2.2; http://genetics.bwh.harvard.edu/pph2/) and PMut (http://mmb2.pcb.ub.es:8080/PMut/).

We performed fragment-length analyses of repeats in C9orf72 (NM_018325, long repeat = (GGGGCC) ≥30), ATXN2 (NM_002973, intermediate repeat = (CAG) ≥29) and NIPA1 (NM_144599, long repeat = (GCG) >8), as described previously²⁴-²⁶. Copy number variations in SMN1 (NM_000344, i.e. >2 copies) were determined using multiplexed ligation-dependent probe amplification, as described previously²⁷. Lastly, a previously ALS-associated SNP in UNC13A (rs12608932, recessive model CC) was determined by use of a TaqMan allelic discrimination assay²⁸.
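The carrier definitions in this paragraph amount to one threshold rule per gene. A minimal Python sketch follows, assuming each subject is summarised by a single value per gene (for the repeats, the length of the longest allele, which is an assumption not spelled out in the text). The names `CARRIER_RULES` and `carried_variants` and the example subject are hypothetical.

```python
# One rule per gene, using the thresholds quoted in the text.
CARRIER_RULES = {
    "C9orf72": lambda repeats: repeats >= 30,    # (GGGGCC) >= 30
    "ATXN2": lambda repeats: repeats >= 29,      # (CAG) >= 29
    "NIPA1": lambda repeats: repeats > 8,        # (GCG) > 8
    "SMN1": lambda copies: copies > 2,           # more than 2 copies
    "UNC13A": lambda genotype: genotype == "CC", # rs12608932, recessive model
}


def carried_variants(genotypes: dict) -> list:
    """Return the genes (sorted by name) for which a subject meets the rule."""
    return sorted(
        gene for gene, value in genotypes.items()
        if gene in CARRIER_RULES and CARRIER_RULES[gene](value)
    )


# Hypothetical subject, for illustration only.
subject = {"C9orf72": 35, "ATXN2": 22, "NIPA1": 8, "SMN1": 3, "UNC13A": "AC"}
print(carried_variants(subject))
```

Note that the NIPA1 rule uses a strict inequality, so a subject with exactly 8 GCG repeats is not counted as a carrier, whereas the C9orf72 and ATXN2 rules include the threshold value itself.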

Statistical analysis
Binomial tests were performed to assess whether the observed frequency of co-occurring variants was in excess of what would be expected on the basis of chance. To calculate the expected frequency of co-occurring variants, we used the following formula: (the observed number of patients carrying a variant / the total number of patients) * (the observed number of controls carrying a variant / the total number of controls). This formula was used in order to take into account the higher frequency of just one variant in ALS patients (= frequency of variants in patients), multiplied by the chance probability of a second variant (= frequency of variants in controls). To perform the binomial test, we then used the following formula in R: pbinom([observed number of patients with multiple variants], [total number of patients], [expected frequency], lower.tail = FALSE, log.p = FALSE). A binomial P value below 0.05 was taken to indicate that the observed frequency of co-occurring variants is higher than expected by chance.
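The calculation described above can be sketched in Python using only the standard library. The function `binom_upper_tail` is a hypothetical helper written to mirror R's `pbinom(k, n, p, lower.tail = FALSE)`, which returns P(X > k); the counts are the C9orf72 and NIPA1 figures reported in Table 2 and Table 4 of this chapter. This is an illustration of the approach, not the authors' script.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X > k) for X ~ Binomial(n, p), like pbinom(k, n, p, lower.tail = FALSE)."""
    # Sum P(X = 0..k) and subtract from 1; adequate for the small k used here.
    lower = sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))
    return max(0.0, 1.0 - lower)


n_patients = 755
c9_in_patients = 46 / 755      # frequency of the first variant in patients
nipa1_in_controls = 37 / 956   # chance frequency of the second variant
observed = 7                   # patients carrying both expansions

expected_freq = c9_in_patients * nipa1_in_controls
expected_count = expected_freq * n_patients
p_value = binom_upper_tail(observed, n_patients, expected_freq)
print(f"expected {expected_count:.1f} patients, binomial P = {p_value:.2e}")
```

With unrounded frequencies the result should be of the same order as the P value reported for this combination; small differences can arise if the expected frequency is rounded before the test is run.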

Gene burden testing was not performed, since such analyses could not be carried out for 5 of the 10 genes. Certain types of genetic variation (repeat expansions in ATXN2, C9orf72 and NIPA1, duplications in SMN1 and intronic polymorphisms in UNC13A) only permit straightforward association testing.

We examined phenotypic associations for the combination of C9orf72 repeat expansions with susceptibility variants for age at onset, site of onset and survival. Statistical analysis was performed on SPSS software version 21 and R v3.0.2 (CRAN: http://www.r-project.org/).

RESULTS

Mutational analyses
A total of 755 sporadic ALS patients and 959 control subjects from The Netherlands were included in this study. Baseline characteristics are shown in Table 1.

An overview of the genotyping results by gene is provided in Table 2. Approximately 7% of sporadic cases were found to carry variants in high-risk ALS genes (SOD1, TARDBP, FUS and C9orf72). Repeat expansions in C9orf72 were the most common (6.1%). We identified a non-synonymous mutation in FUS in a single control. The frequency of variants in high-risk ALS genes in controls was 0.1%. This difference between patients and controls was significant (P = 2.20 × 10⁻¹⁶). Susceptibility variants (variants in ANG and CHMP2B, SMN1 duplications, and repeat expansions in ATXN2 and NIPA1) were found in 14.9% of patients and in 8.4% of controls (P = 2.85 × 10⁻⁵). Lastly, 120 (15.9%) patients and 104 (10.8%) controls were homozygous for the UNC13A risk SNP (P = 0.002, recessive model).

Table 1. Baseline characteristics of study population.

                                 Sporadic ALS patients   Control subjects
Subjects, No.                    755                     959
Female gender (%)                334 (44)                455 (47)
Age (yrs.), median (IQR)         61 (53-69)              63 (56-70)
Bulbar site of onset, n (%)      254 (33.6)
Survival (mo.), median (IQR)     31 (21-45)

Key: ALS, amyotrophic lateral sclerosis; IQR, interquartile range.

Co-occurring ALS gene variants
In 31 (4.1%) patients and in 13 (1.4%) controls we identified more than one variant in ALS-associated genes (Table 3). The higher frequency of ALS patients with multiple variants is statistically significant when applying a simple Fisher exact test (P = 7.39 × 10⁻⁴). When we subsequently performed a binomial test, in order to control for co-occurrence of multiple variants by chance (and taking the higher rate of having one ALS mutation in patients into account), this difference did not remain significant (P = 0.59).
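For readers who want to check the simple comparison of 31/755 patients against 13/959 controls, a Fisher exact test can be sketched with the Python standard library. The text does not say whether the reported test was one- or two-sided; the two-sided convention used below (summing all tables that are no more likely than the observed one) is an assumption, and `fisher_exact_two_sided` is a hypothetical helper name.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the row1 subjects.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# 31 of 755 patients and 13 of 959 controls carried more than one variant.
p_value = fisher_exact_two_sided(31, 755 - 31, 13, 959 - 13)
print(f"two-sided Fisher exact P = {p_value:.2e}")
```

This test only asks whether multiple variants are more frequent in patients than in controls; unlike the binomial test that follows in the text, it does not account for the higher frequency of single variants in patients.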

When we looked at subgroups, no co-occurring high risk variants (SOD1, FUS, TARDBP, C9orf72) were found in sporadic ALS patients (Table 3a). The frequency of high-risk variants combined with susceptibility variants was significantly higher than would be expected on the basis of chance (P = 0.001, Table 3b), which is probably due to combinations with C9orf72 repeat expansions (11 out of 12 co-occurring variants). Interestingly, we observed a significantly lower rate of controls (P = 0.009) with variants in multiple high risk and susceptibility genes (without UNC13A SNP, expected 7 (0.7%) versus actually observed 1 (0.1%)).

Clinical characteristics (i.e. gender, age at onset, site of onset, survival) of patients with multiple genetic variants are described in the Supplementary Material (Table S1).

C9orf72 repeat carriers
Considering that the most prominent finding was the co-occurrence of C9orf72 repeat expansions with susceptibility variants (ATXN2, NIPA1, SMN1) and the UNC13A SNP, we performed additional analyses. The difference was mainly explained by the combination of C9orf72 and NIPA1 repeat expansions (binomial test, P = 5.73 × 10⁻⁴, Table 4). No significant difference in observed versus expected frequency was found for C9orf72 in combination with ATXN2, SMN1 or UNC13A (P = 0.06, P = 0.08 and P = 0.13, respectively).


Considering the relatively high frequency of co-occurring C9orf72 and NIPA1 repeat expansions (15% of C9orf72 cases), we questioned whether the initial association of NIPA1 with ALS might be driven by a high frequency of coincidental C9orf72 repeat expansions, in which case the association with NIPA1 could be a false positive. Therefore, the association analysis on the NIPA1 data set was repeated after excluding C9orf72 positive cases. The association remained, suggesting that there is a significant co-occurrence of 2 independently ALS-associated repeat expansions.

There were two cases with a C9orf72 repeat expansion combined with variants in two other genes (C9orf72, ATXN2 and UNC13A; and C9orf72, SMN1 and UNC13A). For both cases statistical analysis suggested that these combinations were not likely to be a chance finding, with P = 6.60 × 10⁻⁴ and P = 0.01.

Phenotypic characteristics of all C9orf72 repeat carriers with concurrent possible genetic modifiers are shown in Table S2 (i.e. age at onset, site of onset, survival and co-morbid frontotemporal dementia). Unfortunately, no detailed family history on cognitive impairment was available, since most patients were included many years ago. C9orf72 repeat carriers with a NIPA1 repeat expansion had an earlier mean age at onset (52 vs. 60 years, P = 0.03) and more often a spinal onset (86 vs. 53%, P = 0.21) compared to C9orf72 repeat carriers without a NIPA1 repeat expansion. C9orf72 repeat carriers with a concomitant UNC13A SNP more often had a bulbar onset (86 vs. 33%, P = 0.01) and a shorter survival (26.3 vs. 33.3 months, P = 0.48).

DISCUSSION

In this study we attempted to dissect the genetics of sporadic ALS by analyzing 10 ALS genes in a large cohort of population-based cases and controls. We did not find an overall increased risk of co-occurring variants in sporadic ALS patients. We do, however, present compelling statistical evidence for an excess of concomitant mutations in C9orf72 repeat carriers.

Approximately 7% of our sporadic cases were found to carry a variant in a high-risk ALS gene. We did not observe sporadic cases with simultaneous mutations in more than one high-risk ALS gene. Previous studies in familial ALS have shown pedigrees with variants in multiple high-risk ALS genes, suggesting that in a percentage of familial ALS there is oligogenic inheritance⁶,⁸,¹⁰,²⁹,³⁰. The fact that these double mutations do occur in familial ALS, but not in sporadic ALS, may suggest that the co-occurrence of mutations in 2 high-risk genes results in a familial rather than a sporadic presentation.

Table 2. Variants found by large scale genetic screening in sporadic ALS patients and controls

Gene      Variant               Exon   ALS patients      Controls          Previous reports
SOD1      I99V                  4      1                 0                 ALS⁴³
          E132K                 5      1                 0                 Novel
          Total (%)                    2/755 (0.3)       0/959 (0.0)
FUS       S115N                 5      1                 0                 ALS⁴⁶
          S142N                 5      0                 1                 Novel
          R495X                 14     1                 0                 ALS⁴⁶,⁵¹
          Total (%)                    2/753 (0.3)       1/943 (0.1)
TARDBP    N352S                 6      1                 0                 ALS⁴⁶,⁵²
          I383V                 6      1                 0                 ALS⁴⁵,⁵³
          Total (%)                    2/753 (0.3)       0/959 (0.0)
C9orf72   Long repeat                  46                0                 ALS/FTD²⁰,⁴⁷,⁵⁴
          Total (%)                    46/755 (6.1)      0/959 (0.0)
ANG       K17I                  2      4                 1                 ALS/PD/CON¹⁹,⁵⁵
          I46V                  2      0                 1                 ALS/PD/CON¹⁹,⁵⁵
          T80S                  2      1                 0                 ALS¹⁹
          F100I                 2      1                 0                 ALS¹⁹
          Total (%)                    6/707 (0.9)       2/948 (0.2)
CHMP2B    R22Q                  2      1                 0                 ALS⁴⁶
          S103C                 3      1                 0                 Novel
          S194L                 6      0                 1                 CON⁴⁶
          E201Q                 6      1                 0                 Novel
          Total (%)                    3/738 (0.4)       1/928 (0.1)
ATXN2     Intermediate repeat          12                7                 ALS⁴⁸,⁵⁶
          Total (%)                    12/755 (1.6)      7/951 (0.7)
NIPA1     Long repeat                  41                37                ALS⁴⁹
          Total (%)                    41/740 (5.5)      37/956 (3.9)
SMN1      Duplications                 50                33                ALS⁵⁰,⁵⁷
          Total (%)                    50/755 (6.6)      33/959 (3.4)
UNC13A    rs12608932 (CC)              120               104               ALS⁵⁸
          Total (%)                    120/754 (15.9)    104/958 (10.8)

SOD1 (NM_000454, exons 1-5), FUS (NM_004960, exons 5, 6, 14, 15), TARDBP (NM_007375, exon 6), C9orf72 (NM_018325, long repeat = (GGGGCC) ≥30), ANG (NM_001145, exon 2), CHMP2B (NM_014043, exons 1-6), ATXN2 (NM_002973, intermediate repeat = (CAG) ≥29) and NIPA1 (NM_144599, long repeat = (GCG) >8), SMN1 (NM_000344, >2 copies), UNC13A (NM_001080421, homozygous SNP).
Key: ALS, amyotrophic lateral sclerosis; FTD, frontotemporal dementia; PD, Parkinson’s disease; CON, controls.

Table 3. The observed and expected co-occurring variants in sporadic ALS patients and control subjects

Variant 1                     Variant 2                     Variant 3             ALS patients,   Expected frequency   Binomial   Controls,
                                                                                  n (%)           ALS, n (%)ᵃ          Pᵇ         n (%)

A) Combination of high risk variants
-                             -                             -                     0 (0.0)         0 (0.03)             -          0 (0.0)

B) Combination of a high risk variant with susceptibility variant
SOD1 (I99V)                   SMN1 (duplications)                                 1                                               0
C9orf72 (long repeat)         ATXN2 (intermediate repeat)                         1                                               0
C9orf72 (long repeat)         NIPA1 (long repeat)                                 7                                               0
C9orf72 (long repeat)         SMN1 (duplications)                                 3                                               0
Total                                                                             12 (1.6)        4 (0.6)              0.001      0 (0.0)

C) Combination of any high risk and susceptibility variants
SOD1 (I99V)                   SMN1 (duplications)                                 1                                               0
ANG (K17I)                    ATXN2 (intermediate repeat)   SMN1 (duplications)   1                                               0
C9orf72 (long repeat)         ATXN2 (intermediate repeat)                         1                                               0
C9orf72 (long repeat)         NIPA1 (long repeat)                                 7                                               0
C9orf72 (long repeat)         SMN1 (duplications)                                 3                                               0
ATXN2 (intermediate repeat)   SMN1 (duplications)                                 1                                               0
ATXN2 (intermediate repeat)   NIPA1 (long repeat)                                 0                                               1
Total                                                                             14 (1.9)        12 (1.7)             0.27       1 (0.1)

D) Combination of high risk variants, susceptibility variants and risk SNP
SOD1 (I99V)                   SMN1 (duplications)                                 1                                               0
TARDBP (N352S)                UNC13A (rs12608932)                                 1                                               0
ANG (K17I)                    ATXN2 (intermediate repeat)   SMN1 (duplications)   1                                               0
CHMP2B (E201Q)                UNC13A (rs12608932)                                 1                                               0
C9orf72 (long repeat)         NIPA1 (long repeat)                                 7                                               0
C9orf72 (long repeat)         SMN1 (duplications)           UNC13A (rs12608932)   1                                               0
C9orf72 (long repeat)         SMN1 (duplications)                                 2                                               0
C9orf72 (long repeat)         ATXN2 (intermediate repeat)   UNC13A (rs12608932)   1                                               0
C9orf72 (long repeat)         UNC13A (rs12608932)                                 5                                               0
ATXN2 (intermediate repeat)   SMN1 (duplications)                                 1                                               0
ATXN2 (intermediate repeat)   NIPA1 (long repeat)                                 0                                               1
ATXN2 (intermediate repeat)   UNC13A (rs12608932)                                 0                                               2
NIPA1 (long repeat)           UNC13A (rs12608932)                                 4                                               6
SMN1 (duplications)           UNC13A (rs12608932)                                 6                                               4
Total                                                                             31 (4.1)        32 (4.2)             0.59       13 (1.4)

Four different groups of co-occurring variants are shown, adding new possible combinations of risk variants in each step in a sliding scale of certainty.
ᵃ To calculate the expected frequency of co-occurring variants, we used the following formula: (the observed number of patients carrying one variant / the total number of patients) * (the observed number of controls carrying one variant / the total number of controls). This formula was used in order to take into account the higher frequency of just one variant in ALS patients (= frequency of variants in patients) times the chance probability of a second variant (= frequency of variants in controls).
ᵇ A binomial test was performed to compare the observed frequency of co-occurring variants in sporadic ALS patients with the calculated expected frequency.
SOD1 (NM_000454, exons 1-5), C9orf72 (NM_018325, long repeat = (GGGGCC) ≥30), ANG (NM_001145, exon 2), CHMP2B (NM_014043, exons 1-6), ATXN2 (NM_002973, intermediate repeat = (CAG) ≥29) and NIPA1 (NM_144599, long repeat = (GCG) >8), SMN1 (NM_000344, >2 copies), UNC13A (NM_001080421, homozygous SNP).
Key: ALS, amyotrophic lateral sclerosis.

Table 4. Frequency of C9orf72 repeat expansions with co-occurring variants (ATXN2, NIPA1, SMN1, UNC13A)

Genes               Sporadic ALS     C9orf72 carriers   Expected         Binomial
                    (n=755) (%)      (n=46) (%)         frequency (%)    Pᵃ
C9orf72 + ATXN2     1 (0.13)         1 (2.2)            0.4 (0.05)       0.06
C9orf72 + NIPA1     7 (0.93)         7 (15.2)           1.8 (0.24)       5.73 × 10⁻⁴
C9orf72 + SMN1      3 (0.40)         3 (6.5)            1.6 (0.21)       0.08
C9orf72 + UNC13A    7 (0.93)         7 (15.2)           5.0 (0.66)       0.13

ALS, amyotrophic lateral sclerosis.
ᵃ P-values result from binomial test, comparing the observed frequency in sporadic ALS versus the expected frequency (calculated for all sporadic ALS patients).

Sporadic ALS is considered to be a complex disease with multiple genetic risk factors contributing to the disease. We therefore expected to find sporadic cases with multiple genetic risk variants. Indeed, we observed double mutations in 4.1% of patients compared to 1.4% of controls. Despite the 3-fold higher frequency in cases, this difference was not statistically significant when we applied a binomial test, taking the higher rate of having one ALS mutation in patients into account. With this test we compared the frequency of observed double mutations in cases to the expected number of double mutations in cases (using the frequency of mutations in controls). Although correcting for chance co-occurrence is necessary, our approach may be too strict considering that we also correct for combinations that have never been observed, such as homozygous NIPA1 expansions. Likewise, in a meta-analysis on ANG mutations in ALS and Parkinson’s disease, in which data from over 15,000 individuals were available, no homozygous ANG mutations were seen²³. Hence, we may be overcorrecting for merely theoretical possibilities. This seems to be reinforced by the fact that the observed frequency of double mutations in controls was significantly lower than expected. We therefore hypothesize that certain combinations of mutations may be more relevant than we can statistically demonstrate.

In this study we present compelling statistical evidence for an excess of concomitant mutations in combination with C9orf72 repeat expansions. Although C9orf72 was initially discovered in ALS-FTD, the gene has now been implicated in many different neurodegenerative and psychiatric diseases including Alzheimer’s disease, parkinsonism, Huntington’s disease phenocopies, schizophrenia and bipolar disorder³¹-³⁴. This very large phenotypic variability associated with C9orf72 repeat expansions is poorly understood. One of the hypotheses is that additional genetic variants determine phenotype in C9orf72 carriers. Indeed, there are multiple case reports of C9orf72 cases with additional mutations in other ALS- or FTD-associated genes (i.e. ANG, TARDBP, FUS, SOD1, VAPB, OPTN, UBQLN2, MAPT, GRN, DAO)⁶-⁸,¹⁰,¹¹,²⁹,³⁵-³⁹. However, only a few studies have systematically analyzed multiple genes in cohorts.

A French study on the role of ATXN2 expansions in neurodegeneration found C9orf72 expansions in 5.5% of sporadic ALS cases with ATXN2 expansions, and ATXN2 expansions in 1.8% of C9orf72-positive sporadic ALS cases, with even higher frequencies in familial ALS and FTD cases⁴⁰. Another recent study by van Blitterswijk et al. also provided evidence for ATXN2 as a disease modifier of C9orf72⁴¹. In our data set we did not observe a significant co-occurrence of ATXN2 in C9orf72 repeat carriers (n = 1). However, the frequency of co-occurring ATXN2 and C9orf72 repeat expansions in our study is comparable to that of the other studies. Therefore, our findings do not contradict the previous studies.

Several studies have demonstrated that ALS cases homozygous for the UNC13A risk variant have a shorter survival²⁸,⁴². A recent study demonstrated that multiple genetic factors influence phenotypic features in C9orf72 ALS⁴³, and found that UNC13A negatively influences survival in C9orf72-ALS. In this study we also observed a shorter survival for C9orf72 – UNC13A ALS cases (26.3 vs. 33.3 months).

In the current study, we observed a high frequency of NIPA1 repeat expansions in C9orf72-positive sporadic ALS cases (15.2% compared to 5.5% in all other sporadic ALS cases and only 3.9% in controls). This difference was significant and identifies NIPA1 as a novel modifier of the C9orf72 phenotype. The NIPA1 expansion is highly interesting as it is solely made up of alanines, the majority being encoded by a polymorphic (GCG)n repeat (most frequently (GCG)₇ and (GCG)₈)⁴⁴. A shared feature of all polyalanine disease pathogenesis is prominent protein aggregation⁴⁵. In vitro experiments have shown that peptides containing 7–15 alanine repeats undergo variable levels of conformational transition from a monomeric α-helix to a predominant β-sheet. However, when the repeat size is >15, there is complete conversion from monomer to β-sheet⁴⁶. In vivo experiments show that expanded polyalanine tracts have a pronounced tendency to adopt β-sheet complexes that promote strong protein–protein interactions, leading to insoluble protein assemblies. The level of protein aggregation in polyalanine diseases correlates with the size of the polyalanine expansion tract⁴⁵. There is also considerable evidence suggesting that misfolded polyalanine-containing proteins are targeted for degradation by the ubiquitin-proteasome system⁴⁷.

59 Chapter 3 | Genetic modifiers of C9orf72 repeat carriers

Allelic mutations in NIPA1 cause spastic paraplegia 6 (SPG6), which is an upper motor neuron syndrome⁴⁸. Copy number variations in NIPA1 are associated with ALS⁴⁹. Moreover, our data show that NIPA1 repeat expansions are a risk factor for ALS independent of C9orf72. There is thus considerable evidence that variation in NIPA1 affects the motor system in various ways. It therefore seems plausible that a NIPA1 repeat expansion in the context of a C9orf72 repeat expansion would drive towards a motor neuron disease phenotype. Indeed, there seems to be a relatively consistent phenotype in C9orf72 – NIPA1 ALS cases (relatively young, spinal onset).

To our knowledge, this is the first report on ALS cases carrying three different ALS risk factors. Two of these three cases are C9orf72 positive, which again suggests that additional genetic factors determine phenotype in C9orf72 carriers.

CONCLUSION

In summary, there is an increasing number of reports on both familial and sporadic ALS patients with mutations in multiple ALS (risk) genes. To date, very few studies have systematically addressed this phenomenon, but it seems that larger, collaborative studies will be able to identify frequently co-occurring variants, which in turn could provide novel inroads for creating disease models and perhaps therapeutic strategies. Although the high phenotypic variability associated with C9orf72 repeat expansions is not well understood, there is mounting evidence that additional genetic factors determine phenotype. Several modifiers have been implicated to date, including UNC13A and ATXN2 repeat expansions. In this study we identified a novel modifier, NIPA1.

REFERENCES

1. Huisman, M. H. B. et al. Population based epidemiology of amyotrophic lateral sclerosis using capture-recapture methodology. Journal of Neurology, Neurosurgery & Psychiatry 82, 1165–1170 (2011).
2. Rooney, J. et al. Survival Analysis of Irish Amyotrophic Lateral Sclerosis Patients Diagnosed from 1995–2010. PLoS ONE 8, e74733–10 (2013).
3. Miller RG, M. J. M. D. Riluzole for amyotrophic lateral sclerosis (ALS)/motor neuron disease (MND). The Cochrane Library 1–36 (2012).
4. Byrne, S. et al. Rate of familial amyotrophic lateral sclerosis: a systematic review and meta-analysis. Journal of Neurology, Neurosurgery & Psychiatry 82, 623–627 (2011).
5. Al-Chalabi, A. & Lewis, C. M. Modelling the effects of penetrance and family size on rates of sporadic and familial disease. Hum Hered 71, 281–288 (2011).
6. Chiò, A. et al. ALS/FTD phenotype in two Sardinian families carrying both C9orf72 and TARDBP mutations. Journal of Neurology, Neurosurgery & Psychiatry 83, 730–733 (2012).
7. King, A. et al. Mixed tau, TDP-43 and p62 pathology in FTLD associated with a C9orf72 repeat expansion and p.Ala239Thr MAPT (tau) variant. Acta Neuropathol 125, 303–310 (2012).
8. Millecamps, S. et al. Phenotype difference between ALS patients with expanded repeats in C9orf72 and patients with mutations in other ALS-related genes. J Med Genet 49, 258–263 (2012).
9. Testi, S., Tamburin, S., Zanette, G. & Fabrizi, G. M. Co-occurrence of the C9orf72 expansion and a novel GRN mutation in a family with alternative expression of frontotemporal dementia and amyotrophic lateral sclerosis. J. Alzheimers Dis. 44, 49–56 (2015).
10. van Blitterswijk, M. et al. Evidence for an oligogenic basis of amyotrophic lateral sclerosis. Human Molecular Genetics 21, 3776–3784 (2012).
11. van Blitterswijk, M. et al. VAPB and C9orf72 mutations in 1 familial amyotrophic lateral sclerosis patient. Neurobiology of Aging 33, 2950.e1–2950.e4 (2012).
12. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
13. Renton, A. E., Chiò, A. & Traynor, B. J. State of play in amyotrophic lateral sclerosis genetics. Nat. Neurosci. 17, 17–23 (2014).
14. Leblond, C. S., Kaneb, H. M., Dion, P. A. & Rouleau, G. A. Dissection of genetic factors associated with amyotrophic lateral sclerosis. Experimental Neurology 262, 91–101 (2014).
15. Lill, C. M., Abel, O., Bertram, L. & Al-Chalabi, A. Keeping up with genetic discoveries in amyotrophic lateral sclerosis: The ALSoD and ALSGene databases. Amyotroph Lateral Scler 12, 238–249 (2011).
16. Abel, O., Powell, J. F., Andersen, P. M. & Al-Chalabi, A. Credibility Analysis of Putative Disease-Causing Genes Using Bioinformatics. PLoS ONE 8, e64899–6 (2013).
17. Ajroud-Driss, S. & Siddique, T. Sporadic and hereditary amyotrophic lateral sclerosis (ALS). Biochim. Biophys. Acta 1852, 679–684 (2015).
18. van Doormaal, P. T. C. et al. UBQLN2 in familial amyotrophic lateral sclerosis in the Netherlands. Neurobiology of Aging 33, 2233.e7–2233.e8 (2012).
19. Groen, E. J. N. et al. FUS mutations in familial amyotrophic lateral sclerosis in the Netherlands. Arch. Neurol. 67, 224–230 (2010).
20. Ticozzi, N. et al. Mutational analysis reveals the FUS homolog TAF15 as a candidate gene for familial amyotrophic lateral sclerosis. Am. J. Med. Genet. 156, 285–290 (2011).


21. van Blitterswijk, M. et al. Genetic Overlap between Apparently Sporadic Motor Neuron Diseases. PLoS ONE 7, e48983–6 (2012).
22. van Es, M. A. et al. Large-scale SOD1 mutation screening provides evidence for genetic heterogeneity in amyotrophic lateral sclerosis. Journal of Neurology, Neurosurgery & Psychiatry 81, 562–566 (2010).
23. van Es, M. A. et al. Angiogenin variants in Parkinson disease and amyotrophic lateral sclerosis. Ann Neurol. 70, 964–973 (2011).
24. Blauw, H. M. et al. NIPA1 polyalanine repeat expansions are associated with amyotrophic lateral sclerosis. Human Molecular Genetics 21, 2497–2502 (2012).
25. van Rheenen, W. et al. Hexanucleotide repeat expansions in C9orf72 in the spectrum of motor neuron diseases. Neurology 79, 878–882 (2012).
26. Van Damme, P. et al. Expanded ATXN2 CAG repeat size in ALS identifies genetic overlap between ALS and SCA2. Neurology 76, 2066–2072 (2011).
27. Blauw, H. M. et al. SMN1 gene duplications are associated with sporadic ALS. Neurology 78, 776–780 (2012).
28. Diekstra, F. P. et al. UNC13A is a modifier of survival in amyotrophic lateral sclerosis. Neurobiology of Aging 33, 630.e3–630.e8 (2012).
29. Kenna, K. P. et al. Delineating the genetic heterogeneity of ALS using targeted high-throughput sequencing. J Med Genet 50, 776–783 (2013).
30. Cady, J. et al. Amyotrophic lateral sclerosis onset is influenced by the burden of rare variants in known amyotrophic lateral sclerosis genes. Ann Neurol. 77, 100–113 (2014).
31. Beck, J. et al. Large C9orf72 Hexanucleotide Repeat Expansions Are Seen in Multiple Neurodegenerative Syndromes and Are More Frequent Than Expected in the UK Population. The American Journal of Human Genetics 92, 345–353 (2013).
32. DeJesus-Hernandez, M. et al. Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9orf72 Causes Chromosome 9p-Linked FTD and ALS. Neuron 72, 245–256 (2011).
33. Lesage, S. et al. C9orf72 repeat expansions are a rare genetic cause of parkinsonism. Brain 136, 385–391 (2013).
34. Meisler, M. H. et al. C9orf72 expansion in a family with bipolar disorder. Bipolar Disord 15, 326–332 (2013).
35. Lattante, S. et al. Contribution of major amyotrophic lateral sclerosis genes to the etiology of sporadic disease. Neurology 79, 66–72 (2012).
36. Kaivorinne, A.-L. et al. Novel TARDBP sequence variant and C9orf72 repeat expansion in a family with frontotemporal dementia. Alzheimer Dis Assoc Disord 28, 190–193 (2014).
37. van Blitterswijk, M. et al. C9orf72 repeat expansions in cases with previously identified pathogenic mutations. Neurology 81, 1332–1341 (2013).
38. Ferrari, R. et al. Screening for C9orf72 repeat expansion in FTLD. Neurobiology of Aging 33, 1850.e1–1850.e11 (2012).
39. Lashley, T. et al. A pathogenic progranulin mutation and C9orf72 repeat expansion in a family with frontotemporal dementia. Neuropathol Appl Neurobiol 40, 502–513 (2014).
40. Lattante, S. et al. Contribution of ATXN2 intermediary polyQ expansions in a spectrum of neurodegenerative disorders. Neurology 83, 990–995 (2014).
41. van Blitterswijk, M. et al. Ataxin-2 as potential disease modifier in C9orf72 expansion carriers. Neurobiology of Aging 35, 2421.e13–2421.e17 (2014).

42. Chiò, A. et al. UNC13A influences survival in Italian amyotrophic lateral sclerosis patients: a population-based study. Neurobiology of Aging 34, 357.e1–5 (2013).
43. van Blitterswijk, M. et al. Genetic modifiers in carriers of repeat expansions in the C9orf72 gene. Mol Neurodegeneration 9, 38–10 (2014).
44. Chai, J.-H. et al. Identification of four highly conserved genes between breakpoint hotspots BP1 and BP2 of the Prader-Willi/Angelman syndromes deletion region that have undergone evolutionary transposition mediated by flanking duplicons. The American Journal of Human Genetics 73, 898–925 (2003).
45. Messaed, C. & Rouleau, G. A. Molecular mechanisms underlying polyalanine diseases. Neurobiology of Disease 34, 397–405 (2009).
46. Shinchuk, L. M. et al. Poly-(L-alanine) expansions form core β-sheets that nucleate amyloid assembly. Proteins 61, 579–589 (2005).
47. Abu-Baker, A. Involvement of the ubiquitin-proteasome pathway and molecular chaperones in oculopharyngeal muscular dystrophy. Human Molecular Genetics 12, 2609–2623 (2003).
48. Rainier, S., Chai, J.-H., Tokarz, D., Nicholls, R. D. & Fink, J. K. NIPA1 Gene Mutations Cause Autosomal Dominant Hereditary Spastic Paraplegia (SPG6). The American Journal of Human Genetics 73, 967–971 (2003).
49. Blauw, H. M. et al. A large genome scan for rare CNVs in amyotrophic lateral sclerosis. Human Molecular Genetics 19, 4091–4099 (2010).


SUPPLEMENTARY METHODS – GENETIC ANALYSES

Sanger Sequencing

Venous blood samples were drawn using 10-mL EDTA tubes, and genomic DNA was extracted from whole blood using a standard salting-out procedure.

All five exons of SOD1 (NM_000454) were sequenced using BigDye Terminator 3.1 technology, after initial touchdown PCR amplification, as described previously¹. The following primers were used: SOD1-1-F, CGTCGTAGTCTCCTGCAGCG, and SOD1-1-R, GCGGGCGACCCGCTCCTAGC; SOD1-2-F, GGGTAAAGGTAAATCAGCTG, and SOD1-2-R, ATCTAACTAGGGTGAACAAG; SOD1-3-F, CCCAGAAGTCGTGATGCAGG, and SOD1-3-R, CCATATGAACTCCAGAAAGC; SOD1-4-F, TGCAAATTTGTGTCTACTCAGTC, and SOD1-4-R, CCGCGACTAACAATCAAAGTC; SOD1-5-F, GGTAGTGATTACTTGACAGC, and SOD1-5-R, CAGGTACTTTAAAGCAACTC. PCR products were sequenced on an ABI3730xl sequencer (Applied Biosystems). Each mutation was confirmed on genomic DNA.

Sequencing was performed on FUS (NM_004960, exons 5, 6, 14, 15), using a 96-capillary DNA Analyzer 3730XL and a BigDye Terminator 3.1 sequencing kit (Applied Biosystems, Foster City, California) as described previously². The following primers were used in this study for exons 5, 6, 14 and 15 respectively: FUS-5-F, CACGACGTTGTAAAACGACTGGACTCCACTAAAAGTGAAAGG, and FUS-5-R, GGATAACAATTTCACACAGGAAAATGGGCTGCAGACAAAG; FUS-6-F, GAGGGTTCCTGTCTTGTTTC, and FUS-6-R, CCTCACAGATCCCTAGACAAC; FUS-14-F, CACGACGTTGTAAAACGACGAGCTGGGACCAAAGAATCC, and FUS-14-R, GGATAACAATTTCACACAGGCCCCTGAGTTAATTTTCCTTCC; FUS-15-F, CACGACGTTGTAAAACGACGGTAGGAGGGGCAGATAGG, and FUS-15-R, GGATAACAATTTCACACAGGCTTGGGTGATCAGGAATT. All mutations were confirmed in independent experiments on genomic DNA. Sequence data were analyzed in PolyPhred.

Mutational screening of exon 6 of TARDBP (NM_007375) was performed by touchdown PCR using the following primers: TDP43-6-F, AGTAAAACGACGGCCAGTTGAATCAGTGGTTTAATCTTCTTTG; and TDP43-6-R, GCAGGAAACAGCTATGACCAAAATTTGAATTCCCACCATTC, as described previously³. These primers anneal to adjacent intronic and 3’UTR regions of exon 6 and contain 5’ tails encoding M13 forward and reverse. PCR products were subsequently purified by incubation with Exonuclease I and Shrimp Alkaline Phosphatase, sequenced with M13 primers using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) and then resolved by capillary electrophoresis on an ABI 3730XL DNA Analyzer (Applied Biosystems). Sequence analysis was performed using the PHRED/PHRAP/Consed software suite (http://www.phrap.org) and variations in the sequences were identified with the Polyphred v6.15 software.

Sequencing was performed on the single coding exon of ANG (NM_001097577, exon 2), using a 96-capillary DNA Analyzer 3730XL (Applied Biosystems, Foster City, CA) and BigDye Terminator 3.1 chemistry as described previously⁴. The following primers were used in this study: ANG-1-F, GTTCTTGGGTCTACCACACC and ANG-1-R, AATGGAAGGCAAGGACAGC. The sequences were aligned using the PHRED/PHRAP/Consed package (http://www.phrap.org), and variants were identified using the software application PolyPhred. When a variant was identified, it was confirmed by independent experiments using newly prepared samples from stock DNA.

Coding regions of all six exons of CHMP2B (NM_014043.3) were screened for mutations using touchdown PCR, as described previously⁵. The following primers were used: CHMP2B-1-F, CCGCAGACGTGAGGAAAG, and CHMP2B-1-R, CTCCAGGGACAGTAGGCAGA; CHMP2B-2-F, GCGCCCAGCCAATATAAGAT, and CHMP2B-2-R, GCCATGTGCCTTCTTCCTAGT; CHMP2B-3-F, CTTCATGATCGGGGACAAAG, and CHMP2B-3-R, CAGGAGGTGCTTTTAAATCTGC; CHMP2B-4-F, TTTGATGTGTTCCCTTTTGACTT, and CHMP2B-4-R, TCATCATTTCTGCCTTCGTG; CHMP2B-5-F, TTCACTGAGTTTGCCTTCTGT, and CHMP2B-5-R, CGTGCATTAGGAAACATTTGG; CHMP2B-6-F, GGAGGTGCATGGTTTTTATTTC, and CHMP2B-6-R, TTGGCAGCTGTAACCACCTA (for PCR), GAAATCTGCACTGTGCTTGG (for sequencing). Sanger sequencing and data analysis were performed with the BigDye Terminator 3.1 sequencing kit (Applied Biosystems, Foster City, California), DNA Analyzer 3730XL (Applied Biosystems) and PolyPhred. Each mutation was confirmed on genomic DNA.

High-Throughput Next-Generation Sequencing

Additional screening for non-synonymous mutations in FUS (exons 5, 6, 14, 15), TARDBP (exon 6), ANG (exon 2) and CHMP2B (exons 1–6) was carried out on a MiSeq high-throughput next-generation sequencing platform (Illumina). MyIllumina software was used to create a Truseq Custom Amplicon kit, which involved the use of specific DNA oligo stretches to sequence all exons and flanking regions of the abovementioned genes of interest. The Truseq Custom Amplicon Library Preparation protocol was followed to create genetic libraries containing all regions of interest of our samples. By barcoding each sample with indexed primers, ligated to the target regions and amplified by means of standard PCR, we multiplexed 95 samples of patients and control subjects per sequencing run. We chose a paired-end read with a 2 × 250 read length for the approximately 95 amplicons representing our regions of interest, which ensured excellent coverage of over 500 resequencing reactions for each amplicon.


Sequence Analysis Viewer software (Illumina) was used to monitor the quality of every sequencing run, and secondary analyses were performed in MiSeq Reporter software to call variants for every sample. Quality filters were applied to all variants in each sample, and only those variants that passed all filters were taken into account. Next, using the command line, variants that MiSeq Reporter had not characterised as silent, intronic or non-coding were added to the final dataset, after manual cross-referencing of a selected sample against the UCSC genome browser and IGV (Integrative Genomics Viewer) to check the accuracy of variant calls.

Fragment-length Analysis

A repeat-primed PCR was performed to assess the GGGGCC repeat in C9orf72 (NM_018325), as described elsewhere⁶. In short, a repeat-primed PCR protocol was used on 100 ng genomic DNA with the following primer sequences: forward primer 5’ – 6FAM-AGTCGCTAGAGGCGAAAGC – 3’, reverse primer 5’ – TACGCATCCCAGTTTGAGACGGGGGCCGGGGCCGGGGCCGGGG – 3’, and anchor primer 5’ – TACGCATCCCAGTTTGAGACG – 3’. Fragment analysis was performed on an ABI3730xl sequencer and fragment sizes were analysed with GeneMapper software version 3.7. All samples were genotyped at the repeat sites at least two times. Alleles with 30 or more GGGGCC hexanucleotide repeats were defined as expanded.
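The expansion definition above amounts to a simple threshold rule, which the following Python sketch illustrates. The sample names and repeat counts are hypothetical, and the sketch assumes that a repeat count per allele is already available from the fragment analysis.

```python
EXPANSION_THRESHOLD = 30  # alleles with >= 30 GGGGCC repeats count as expanded


def is_expanded(repeat_count: int, threshold: int = EXPANSION_THRESHOLD) -> bool:
    """Return True if a single allele meets the expansion definition."""
    return repeat_count >= threshold


def carries_expansion(allele_repeats: tuple) -> bool:
    """Return True if at least one of the two alleles is expanded."""
    return any(is_expanded(count) for count in allele_repeats)


# Hypothetical genotypes: repeat counts for both alleles of each sample.
samples = {"sample_A": (2, 8), "sample_B": (5, 30), "sample_C": (2, 45)}
carriers = [name for name, alleles in samples.items() if carries_expansion(alleles)]
print(carriers)  # → ['sample_B', 'sample_C']
```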

The CAG repeat of ATXN2 (NM_002973) was amplified using the following primers: forward primer 5’ – 6FAM-GGGCCCCTCACCATGTCG – 3’ and reverse primer 5’ – CGGGCTTGCGGACATTGG – 3’, as described previously⁷. The GCG repeat in exon 1 of NIPA1 (NM_144599) was genotyped with the following primer sequences: a fluorescently labelled forward primer 5’ – 6FAM-GCCCCTCTTCCTGCTCCT – 3’ and a reverse primer with sequence 5’ – CGATGCCCTTCTTCTGTAGC – 3’, as described previously⁸. Repeat lengths were determined on an ABI3130xl sequencer. Fragment lengths were determined with Peak Scanner software version 1.0 (Applied Biosystems).

Multiplexed Ligation-dependent Probe Amplification

Copy number variation in SMN1 (NM_000344) was identified by multiplexed ligation-dependent probe amplification (MLPA). Assays were run using standard protocols (www.mlpa.com), as described previously⁹. We used the SALSA P060 MLPA kit (MRC Holland, the Netherlands), containing 2 probes specifically targeted to SMN1, and control probes targeted to other chromosomal loci for normalization and assay quality control. A total of 50–100 ng of genomic DNA was used in each MLPA assay. Data normalization and analysis were performed with GeneMarker software (SoftGenetics, State College, PA) using standard parameters.

Taqman Allelic Discrimination Assay

A Taqman allelic discrimination assay was used to genotype the single nucleotide polymorphism (SNP) rs12608932 in UNC13A, as previously described¹⁰. The following primer and probe sequences were used [VIC/FAM]: ATCCATCCACCCATCAATTTATCCA[A/C]CCATCCATTTTTCGTCTGTCCACCA. Allelic PCR products were analysed using specific probes on the ABI Prism 7900HT Sequence Detection System. SDS software version 2.3 (Applied Biosystems) was used to analyze allelic variant calls for each sample.
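Table S2 scores the UNC13A SNP under a recessive model. As a minimal sketch of how genotype calls could be coded under such a model, the Python example below flags only homozygous carriers. Treating C as the risk allele is an assumption of this example (the assay text only gives the [A/C] alleles), and the function name and genotype strings are hypothetical.

```python
RISK_ALLELE = "C"  # assumption: C is taken as the risk allele of rs12608932


def recessive_carrier(genotype: str, risk_allele: str = RISK_ALLELE) -> bool:
    """Return True only when both alleles are the risk allele (recessive model)."""
    alleles = genotype.upper()
    return len(alleles) == 2 and all(allele == risk_allele for allele in alleles)


# Hypothetical genotype calls from an allelic discrimination run.
calls = {"sample_A": "AA", "sample_B": "AC", "sample_C": "CC"}
coded = {name: int(recessive_carrier(genotype)) for name, genotype in calls.items()}
print(coded)  # → {'sample_A': 0, 'sample_B': 0, 'sample_C': 1}
```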


Table S1. Clinical information on ALS patients with co-occurring variants

Gene 1            Gene 2    Gene 3    Gender   Age at Onset (y)   Site of Onset   Survival (m)
SOD1 (I99V)       SMN1      -         M        60                 Bulbar          51
TARDBP (N352S)    UNC13A    -         M        42                 Cervical        58
ANG (K17I)        ATXN2     SMN1      F        77                 Bulbar          21
CHMP2B (E201Q)    UNC13A    -         M        75                 Spinal          4
C9orf72           ATXN2     UNC13A    M        62                 Bulbar          22
C9orf72           NIPA1     -         F        50                 Lumbosacral     32
C9orf72           NIPA1     -         M        52                 Bulbar          22
C9orf72           NIPA1     -         F        59                 Lumbosacral     6
C9orf72           NIPA1     -         M        57                 Lumbosacral     30
C9orf72           NIPA1     -         M        52                 Lumbosacral     14
C9orf72           NIPA1     -         F        62                 Lumbosacral     75
C9orf72           NIPA1     -         M        37                 Cervical        87
C9orf72           SMN1      UNC13A    F        63                 Bulbar          30
C9orf72           SMN1      -         M        63                 Lumbosacral     55
C9orf72           SMN1      -         F        66                 Lumbosacral     61
C9orf72           UNC13A    -         M        54                 Bulbar          26
C9orf72           UNC13A    -         F        56                 Bulbar          23
C9orf72           UNC13A    -         M        60                 Bulbar          33
C9orf72           UNC13A    -         M        53                 Lumbosacral     14
C9orf72           UNC13A    -         F        56                 Bulbar          36
ATXN2             SMN1      -         F        51                 Cervical        67
NIPA1             UNC13A    -         M        50                 Cervical        48
NIPA1             UNC13A    -         F        77                 Lumbosacral     13
NIPA1             UNC13A    -         M        74                 Bulbar          16
NIPA1             UNC13A    -         M        60                 Bulbar          30
SMN1              UNC13A    -         M        55                 Lumbosacral     34
SMN1              UNC13A    -         F        62                 Bulbar          19
SMN1              UNC13A    -         M        70                 Cervical        29
SMN1              UNC13A    -         F        71                 Cervical        45
SMN1              UNC13A    -         F        69                 Thoracic        37
SMN1              UNC13A    -         F        73                 Thoracic        14

M, male; F, female; y, years; m, months

Table S2. Possible genetic modifiers of C9orf72 repeat carriers

            n     Age at Onset,   Bulbar onset,   Survival, yrs.,
                  mean (SD)       n (%)           mean (SD)
NIPA1 -     39    60 (7.9)        18 (47)         2.6 (1.1)
NIPA1 +     7     52 (8.2)        1 (14)          3.2 (2.6)
ATXN2 -     45    59 (8.4)        18 (40)         2.7 (1.4)
ATXN2 +     1     62 (-)          1 (100)         1.8 (-)
SMN1 -      43    59 (8.5)        18 (42)         2.6 (1.4)
SMN1 +      3     64 (1.7)        1 (33)          3.5 (1.5)
UNC13A -    39    60 (8.9)        13 (33)         2.8 (1.5)
UNC13A +    7     58 (3.9)        6 (86)          2.2 (0.6)

Patients with (NIPA1+) and without (NIPA1-) a NIPA1 long repeat; patients with (ATXN2+) and without (ATXN2-) an ATXN2 intermediate repeat; patients with (SMN1+) and without (SMN1-) SMN1 duplications; patients with (UNC13A+) and without (UNC13A-) the UNC13A SNP (rs12608932) genotype under the recessive model.
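To show how the group summaries in Table S2 relate to the per-patient data in Table S1, the following Python sketch recomputes the mean survival of C9orf72 carriers with a NIPA1 or UNC13A co-occurring variant. The survival values are transcribed from Table S1; the data structure and function name are assumptions of this example, and only these two groups are recomputed here.

```python
from statistics import mean

# (co-occurring genes, survival in months) for C9orf72 carriers, from Table S1.
c9_cases = [
    ({"ATXN2", "UNC13A"}, 22),
    ({"NIPA1"}, 32), ({"NIPA1"}, 22), ({"NIPA1"}, 6), ({"NIPA1"}, 30),
    ({"NIPA1"}, 14), ({"NIPA1"}, 75), ({"NIPA1"}, 87),
    ({"SMN1", "UNC13A"}, 30),
    ({"SMN1"}, 55), ({"SMN1"}, 61),
    ({"UNC13A"}, 26), ({"UNC13A"}, 23), ({"UNC13A"}, 33),
    ({"UNC13A"}, 14), ({"UNC13A"}, 36),
]


def survival_months(modifier: str) -> list:
    """Survival (months) of C9orf72 carriers who also carry `modifier`."""
    return [months for genes, months in c9_cases if modifier in genes]


for modifier in ("NIPA1", "UNC13A"):
    values = survival_months(modifier)
    avg = mean(values)
    print(f"{modifier}+: n={len(values)}, mean survival {avg:.1f} months ({avg / 12:.1f} years)")
# → NIPA1+: n=7, mean survival 38.0 months (3.2 years)
# → UNC13A+: n=7, mean survival 26.3 months (2.2 years)
```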


SUPPLEMENTARY REFERENCES

1. van Es, M. A. et al. Large-scale SOD1 mutation screening provides evidence for genetic heterogeneity in amyotrophic lateral sclerosis. Journal of Neurology, Neurosurgery & Psychiatry 81, 562–566 (2010).
2. Groen, E. J. N. et al. FUS mutations in familial amyotrophic lateral sclerosis in the Netherlands. Arch. Neurol. 67, 224–230 (2010).
3. Ticozzi, N. et al. Mutational analysis of TARDBP in neurodegenerative diseases. Neurobiology of Aging 32, 2096–2099 (2011).
4. van Es, M. A. et al. Angiogenin variants in Parkinson disease and amyotrophic lateral sclerosis. Ann Neurol. 70, 964–973 (2011).
5. van Blitterswijk, M. et al. Genetic Overlap between Apparently Sporadic Motor Neuron Diseases. PLoS ONE 7, e48983–6 (2012).
6. van Rheenen, W. et al. Hexanucleotide repeat expansions in C9orf72 in the spectrum of motor neuron diseases. Neurology 79, 878–882 (2012).
7. Van Damme, P. et al. Expanded ATXN2 CAG repeat size in ALS identifies genetic overlap between ALS and SCA2. Neurology 76, 2066–2072 (2011).
8. Blauw, H. M. et al. NIPA1 polyalanine repeat expansions are associated with amyotrophic lateral sclerosis. Human Molecular Genetics 21, 2497–2502 (2012).
9. Blauw, H. M. et al. SMN1 gene duplications are associated with sporadic ALS. Neurology 78, 776–780 (2012).
10. van Es, M. A. et al. Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat Genet 41, 1083–1087 (2009).

Chapter 4

Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis.

Wouter van Rheenen*, Aleksey Shatunov*, Annelot M. Dekker et al.

Nature Genetics 48, 1043–1048 (2016)

* These authors contributed equally to this manuscript. The complete author list is provided at the end of this manuscript.

73 Chapter 4 | GWAS identifies new risk variants and genetic architecture of ALS

ABSTRACT

To elucidate the genetic architecture of amyotrophic lateral sclerosis (ALS) and find associated loci, we assembled a custom imputation reference panel from whole-genome-sequenced ALS patients and matched controls (N = 1,861). Through imputation and mixed-model association analysis in 12,577 cases and 23,475 controls, combined with 2,579 cases and 2,767 controls in an independent replication cohort, we fine-mapped a novel locus on chromosome 21 and identified C21orf2 as an ALS risk gene. In addition, we identified MOBP and SCFD1 as novel associated risk loci. We established evidence for ALS being a complex genetic trait with a polygenic architecture. Furthermore, we estimated the SNP-based heritability at 8.5%, with a distinct and important role for low-frequency (1–10%) variants. This study motivates the interrogation of larger sample sizes with full genome coverage to identify rare causal variants that underpin ALS risk.

ALS is a fatal neurodegenerative disease that affects 1 in 400 people, with death occurring within three to five years¹. Twin-based studies estimate heritability to be around 65%, and 5–10% of ALS patients have a positive family history¹,². Both are indicative of an important genetic component in ALS etiology. Following the initial discovery of the C9orf72 locus in GWASs³-⁵, the identification of the pathogenic hexanucleotide repeat expansion in this locus revolutionized the field of ALS genetics and biology⁶,⁷. The majority of ALS heritability, however, remains unexplained and only two additional risk loci have been identified robustly since³,⁸.

To discover new genetic risk loci and elucidate the genetic architecture of ALS, we genotyped 7,763 new cases and 4,669 controls and additionally collected genotype data of published GWAS in ALS. In total, we analyzed 14,791 cases and 26,898 controls from 41 cohorts (Supplementary Table 1, Supplementary Methods). We combined these cohorts based on genotyping platform and nationality to form 27 case-control strata. In total 12,577 cases and 23,475 controls passed quality control (Online methods, Supplementary Tables 2–5).

For imputation purposes, we obtained high-coverage (~43.7X) whole genome sequencing data from 1,246 ALS patients and 615 controls from The Netherlands (Online methods, Supplementary Fig. 1). After quality control, we constructed a reference panel including 18,741,510 single nucleotide variants. Imputing this custom reference panel into Dutch ALS cases increased imputation accuracy of low-frequency variants (minor allele frequency, MAF 0.5–10%) considerably compared to commonly used reference panels: the 1000 Genomes Project phase 1 (1000GP)⁹ and Genome of The Netherlands (GoNL)¹⁰ (Fig. 1a). The improvement was also observed when imputing into ALS cases from the UK (Fig. 1b). To benefit from the global diversity of haplotypes, the custom and 1000GP panels were combined, which further improved imputation. Given these results, we used the merged reference panel to impute all strata in our study.

In total we imputed 8,697,640 variants passing quality control in the 27 strata and separately tested these for association with ALS risk by logistic regression. Results were then included in an inverse-variance weighted fixed effects meta-analysis, which revealed 4 loci at genome-wide significance (p < 5 × 10⁻⁸) (Fig. 2a). The previously reported C9orf72 (rs3849943)³⁻⁵,⁸, UNC13A (rs12608932)³,⁵ and SARM1 (rs35714695)⁸ loci all reached genome-wide significance, as did a novel association for a non-synonymous variant in C21orf2 (rs75087725, p = 8.7 × 10⁻¹¹, Supplementary Tables 6–10).
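The inverse-variance weighted fixed effects meta-analysis mentioned above can be sketched in a few lines of Python. This is a generic textbook version, not the analysis pipeline used in the study; the per-stratum effect sizes and standard errors below are hypothetical.

```python
from math import erfc, exp, log, sqrt


def fixed_effects_meta(betas, standard_errors):
    """Pool per-stratum log odds ratios with inverse-variance weights."""
    weights = [1.0 / se**2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_two_sided = erfc(abs(z) / sqrt(2.0))  # two-sided normal P-value
    return pooled_beta, pooled_se, z, p_two_sided


# Hypothetical log odds ratios and standard errors for three strata.
stratum_betas = [log(1.40), log(1.55), log(1.35)]
stratum_ses = [0.12, 0.15, 0.10]

pooled_beta, pooled_se, z_score, p_meta = fixed_effects_meta(stratum_betas, stratum_ses)
print(f"pooled OR = {exp(pooled_beta):.2f}, SE = {pooled_se:.3f}, P = {p_meta:.2g}")
```

Strata with smaller standard errors (typically the larger strata) receive more weight, so the pooled estimate is dominated by the most precise per-stratum results.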

[Figure 1: panels (a) Dutch ALS cases and (b) UK ALS cases; x-axis: allele frequency (0.005–1.000); y-axis: aggregate r² (0.0–1.0); reference panels: 1000GP (n = 1,092), GoNL (n = 499), ALS enriched panel (n = 1,861), 1000GP + ALS (n = 2,953)]

Figure 1. Imputation accuracy comparison. The aggregate r² value between imputed and sequenced genotypes on chromosome 20 using different reference panels for imputation. Allele frequencies are calculated from the Dutch samples included in the Genome of the Netherlands cohort. The highest imputation accuracy was achieved when imputing from the merged custom and 1000GP panels. This difference is most pronounced for low-frequency (0.5–10%) alleles in ALS cases from both The Netherlands (a) and the United Kingdom (b).
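As a rough illustration of the accuracy metric in Figure 1, the Python sketch below computes the squared Pearson correlation between imputed allele dosages and sequenced genotypes. How the study aggregated r² across variants within a frequency bin is not specified here, so pooling the values of one variant across samples is an assumption of this example, and the input values are hypothetical.

```python
def squared_correlation(imputed_dosages, sequenced_genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    n = len(imputed_dosages)
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(sequenced_genotypes) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, sequenced_genotypes))
    var_x = sum((x - mean_x) ** 2 for x in imputed_dosages)
    var_y = sum((y - mean_y) ** 2 for y in sequenced_genotypes)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either vector has no variation
    return cov**2 / (var_x * var_y)


# Hypothetical data: imputed dosages (0-2) and sequenced genotypes (0, 1 or 2).
dosages = [0.1, 0.9, 1.8, 0.2]
genotypes = [0, 1, 2, 0]
print(f"r2 = {squared_correlation(dosages, genotypes):.3f}")
```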

Table 1. Discovery and replication of novel genome-wide significant loci.

             Discovery                                                      Replication                                   Combined
SNP          MAFcases   MAFcontrols   OR     Pmeta           PLMM           MAFcases   MAFcontrols   OR     P              Pcombined       I²
rs75087725   0.02       0.01          1.45   8.65 × 10⁻¹¹    2.65 × 10⁻⁹    0.02       0.01          1.65   3.89 × 10⁻³    3.08 × 10⁻¹⁰    0.00*
rs616147     0.30       0.28          1.10   4.14 × 10⁻⁵     1.43 × 10⁻⁸    0.31       0.28          1.13   2.35 × 10⁻³    4.19 × 10⁻¹⁰    0.00*
rs10139154   0.34       0.31          1.09   1.92 × 10⁻⁵     4.95 × 10⁻⁸    0.33       0.31          1.06   9.55 × 10⁻²    3.45 × 10⁻⁸     0.05*
rs7813314    0.09       0.10          0.87   7.46 × 10⁻⁷     3.14 × 10⁻⁸    0.12       0.10          1.17   7.75 × 10⁻³    1.05 × 10⁻⁵     0.80**

Genome-wide significant loci from the discovery phase including 12,577 cases and 23,475 controls were directly genotyped and tested for association in the replication phase including 2,579 cases and 2,767 controls. The three top associated SNPs in the MOBP (rs616147), SCFD1 (rs10139154) and C21orf2 (rs75087725) loci replicated with associations in identical directions as in the discovery phase and an association in the combined analysis that exceeded the discovery phase. Chr = chromosome; SNP = single nucleotide polymorphism, MAF = minor allele frequency, OR = odds ratio, Pmeta = meta-analysis p-value, PLMM = linear mixed model p-value, Pcombined = meta-analysis of discovery linear mixed model and associations from replication phase. * Cochran’s Q test: p > 0.1, ** Cochran’s Q test: p = 4.0 × 10⁻⁶

Interestingly, this variant was present on only 10 haplotypes in the 1000GP reference panel (MAF = 1.3%), compared to 62 haplotypes in our custom reference panel (MAF = 1.7%). As a result, more strata passed quality control for this variant by passing the allele frequency threshold of 1% (Supplementary Table 11). This demonstrates the benefit of the merged reference panel with ALS-specific content, which improved imputation and resulted in a genome-wide significant association.

Linear mixed models (LMM) can improve power while controlling for sample structure¹¹, particularly in our study that included a large number of imperfectly balanced strata. Even though LMM for ascertained case-control data has a potential small loss of power¹¹, we judged the advantage of combining all strata while controlling the false positive rate to be more important and therefore jointly analyzed all strata in a LMM to identify additional risk loci. There was no overall inflation of the linear mixed model’s test statistic compared to the meta-analysis (Supplementary Fig. 2). We observed modest inflation in the QQ-plot (λGC = 1.12, λ1000 = 1.01, Supplementary Fig. 3). LD score regression yielded an intercept of 1.10 (standard error 7.8 × 10⁻³). While the LD score regression intercept can indicate residual population stratification, which is fully corrected for in a LMM, the intercept can also reflect a distinct genetic architecture where most causal variants are rare, or a non-infinitesimal architecture¹². The linear mixed model identified all four genome-wide significant associations from the meta-analysis. Furthermore, three additional loci that included the MOBP gene on 3p22.1 (rs616147), SCFD1 on 14q12 (rs10139154) and a long non-coding RNA on 8p23.2 (rs7813314) were associated at genome-wide significance (Fig. 2b, Table 1, Supplementary Tables 12–14). Interestingly, the SNPs in the MOBP locus have been reported in a GWAS on progressive supranuclear palsy (PSP)¹³ and as a modifier for survival in frontotemporal dementia (FTD)¹⁴. The putative pleiotropic effect of variants within this locus suggests a shared neurodegenerative pathway between ALS, FTD and PSP. We also found rs74654358 at 12q14.2 in the TBK1 gene approximating genome-wide significance (MAF = 4.9%, OR = 1.21 for A allele, p = 6.6 × 10⁻⁸). This gene was recently identified as an ALS risk gene through exome sequencing¹⁵,¹⁶.
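The relation between the two inflation factors quoted above can be illustrated with the commonly used rescaling of λGC to a study of 1,000 cases and 1,000 controls. The text does not state which formula was applied, so the Python sketch below is an assumption; with the discovery sample sizes it reproduces the reported λ1000 after rounding.

```python
def lambda_1000(lambda_gc: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lambda_gc - 1) * scale


# Values from the text: lambda_GC = 1.12 in 12,577 cases and 23,475 controls.
print(round(lambda_1000(1.12, 12577, 23475), 2))  # → 1.01
```

The rescaling reflects that, under polygenicity, λGC grows with sample size, so a standardized value makes inflation comparable between studies of different sizes.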

In the replication phase, we genotyped the newly discovered associated SNPs in nine independent replication cohorts, totaling 2,579 cases and 2,767 controls. In these cohorts we replicated the signals for the C21orf2, MOBP and SCFD1 loci, with lower p-values in the combined analysis than the discovery phase (combined p-value = 3.08 × 10⁻¹⁰, p = 4.19 × 10⁻¹⁰ and p = 3.45 × 10⁻⁸ for rs75087725, rs616147 and rs10139154 respectively, Table 1, Supplementary Fig. 4)¹⁷. The combined signal for rs7813314 was less significant due to an opposite effect between the discovery and replication phase, indicating non-replication. Although replication yielded similar effect estimates for rs10139154 compared to the discovery phase, this was not statistically significant (p = 0.09) in the replication phase alone. This reflects the limited sample size of our replication phase, which is inherent to the low prevalence of ALS and warrants even larger sample sizes to replicate this signal robustly. There was no evidence for residual association within each locus after conditioning on the top SNP, indicating that all risk loci are independent signals. Apart from the C9orf72, UNC13A and SARM1 loci, we found no evidence for associations previously described in smaller GWAS (Supplementary Table 15).

[Figure 2: two Manhattan plots, panel (a) "Meta-analysis" and panel (b) "Linear mixed model". x-axis: chromosomes 1–22; y-axis: −log10(P value); horizontal line at the genome-wide significance threshold P = 5 × 10⁻⁸.]

Figure 2. Meta-analysis and linear mixed model associations. (a) Manhattan plot for meta-analysis results. This yielded four genome-wide significant associations highlighted with names indicating the closest gene. The associated SNP in C21orf2 is a non-synonymous variant not found in previous GWAS. (b) Manhattan plot for linear mixed model results. This association analysis yielded three additional loci reaching genome-wide significance (MOBP, LOC101927815 and SCFD1). SNPs in the previously identified ALS risk gene TBK1 approached genome-wide significance (p = 6.6 × 10⁻⁸). Since the C21orf2 SNP was removed from a Swedish stratum because of a MAF < 1%, this SNP was tested separately, but is presented here together with all other SNPs with a MAF > 1% in every stratum. Here, LOC101927815 is colored grey because the association for this locus could not be replicated.

[Figure 3: panel (a) "heritability by chromosome": heritability against chromosome length (Mb), r² = 0.46; panel (b) "heritability by MAF": heritability (proportion) per minor allele frequency bin (0.01–0.1, 0.1–0.2, 0.2–0.3, 0.3–0.4, 0.4–0.5) for ALS (merged panel), ALS (HM3) and SCZ (HM3).]

Figure 3. Partitioned heritability. (a) The heritability estimates per chromosome were strongly correlated with chromosome length (p = 4.9 × 10⁻⁴). (b) For ALS there was a clear trend where more heritability was explained within the lower allele frequency bins. This effect was still observed when, for a fair comparison between ALS and a previous study partitioning heritability for schizophrenia (SCZ) using identical methods²², SNPs present in HapMap3 (HM3) were included. The pattern for ALS resembles that observed in a rare variant model simulation performed in this study. Error bars reflect standard errors.

The associated low-frequency non-synonymous SNP in C21orf2 suggested that this gene could directly be involved in ALS risk. Indeed, we found no evidence that linkage disequilibrium of sequenced variants beyond C21orf2 explained the association within this locus (Supplementary Fig. 5). In addition, we investigated the burden of rare coding mutations in a set of whole genome sequenced cases (N = 2,562) and controls (N = 1,138). After quality control these variants were tested using a pooled association test for rare variants corrected for population structure (T5 and T1 for 5% and 1% allele frequency, Supplementary Note). This revealed an excess of non-synonymous and loss-of-function mutations in C21orf2 among ALS cases that persists after conditioning on rs75087725 (pT5 = 9.2 × 10⁻⁵, pT1 = 0.01, Supplementary Fig. 6), which further supports that C21orf2 contributes to ALS risk.

In an effort to fine-map the other loci to susceptibility genes, we searched for SNPs in these loci with cis-eQTL effects observed in brain and other tissues (Supplementary Note, Supplementary Table 16)¹⁸. There was overlap with previously identified brain cis-eQTLs for five regions (Supplementary Fig. 7, Supplementary Table 17, Supplementary Data Set 1). Interestingly, within the C9orf72 locus we found that proxies of rs3849943 (LD r² = 0.21–0.56) had a brain cis-eQTL effect on C9orf72 only (minimal p = 5.27 × 10⁻⁷), which harbors the hexanucleotide repeat expansion that drives this GWAS signal. Additionally, we found that rs12608932 and its proxies within the UNC13A locus had exon-level cis-eQTL effect on KCNN1 in frontal cortex (p = 1.15 × 10⁻³)¹⁹. Another overlap was observed in the SARM1 locus where rs35714695 and its proxies had the strongest exon-level cis-eQTL effect on POLDIP2 in multiple brain tissues (p = 2.32 × 10⁻³). Within the SCFD1 locus rs10139154 and proxies had a cis-eQTL effect on SCFD1 in cerebellar tissue (p = 7.71 × 10⁻⁴). For the MOBP locus, rs1768208 and proxies had a cis-eQTL effect on RPSA (p = 7.71 × 10⁻⁴).

To describe the genetic architecture of ALS, we calculated polygenic scores that can be used to predict phenotypes for traits with a polygenic architecture²⁰. We calculated the SNP effects using a linear mixed model in 18 of the 27 strata and subsequently assessed their predictive ability in the other 9 independent strata. This revealed that a significant, albeit modest, proportion of the phenotypic variance could be explained by all SNPs (Nagelkerke r² = 0.44%, r² = 0.15% on the liability scale, p = 2.7 × 10⁻¹⁰, Supplementary Fig. 8). This finding adds to the existing evidence that ALS is a complex genetic trait with a polygenic architecture. To further quantify the contribution of common SNPs to ALS risk, we estimated the SNP-based heritability using three approaches, all assuming a population baseline risk of 0.25%²¹. GCTA-REML estimated the SNP-based heritability at 8.5% (SE 0.5%). Haseman-Elston regression yielded a very similar 7.9% and LD score regression estimated the SNP-based heritability at 8.2% (SE 0.5%). The heritability estimates per chromosome were strongly correlated with chromosome length (p = 4.9 × 10⁻⁴, r² = 0.46, Fig. 3a), which again is indicative of the polygenic architecture of ALS.

We found that the genome-wide significant loci only explained 0.2% of the heritability and thus the bulk of the heritability (8.3%, SE 0.3%) was captured in SNPs below genome-wide significance. This implies that many genetic risk variants have yet to be discovered. Understanding where these unidentified risk variants reside across the allele frequency spectrum will inform the design of future studies to identify these variants. We therefore estimated heritability partitioned by minor allele frequency. Furthermore, we contrasted this to common polygenic traits studied in GWASs such as schizophrenia. We observed a clear trend that indicated that most variance is explained by low-frequency SNPs (Fig. 3b). Exclusion of the C9orf72 locus, which harbors the rare pathogenic repeat expansion, and the other genome-wide significant loci did not affect this trend (Supplementary Fig. 9). This architecture is different from that expected for common polygenic traits and reflects a polygenic rare-variant architecture observed in simulations²².

To gain better insight into the biological pathways that explain the associated loci found in this study we looked for enriched pathways using DEPICT²³. This revealed SNAP receptor (SNARE) activity as the only enriched category (FDR < 0.05, Supplementary Fig. 10). SNARE complexes play a central role in neurotransmitter release and synaptic function²⁴, which are both perturbed in ALS²⁵.

Although the biological role of C21orf2, a conserved leucine-rich repeat protein, remains poorly characterized, it is part of the ciliome and is required for the formation and/or maintenance of primary cilia²⁶. Defects in primary cilia are associated with various neurological disorders and cilia numbers are decreased in G93A SOD1 mice, a well-characterized ALS model²⁷. C21orf2 has also been localized to mitochondria in immune cells²⁸ and is part of the interactome of the protein product of NEK1, which has previously been associated with ALS¹⁵. Both proteins appear to be involved in DNA repair mechanisms²⁹. Although future studies are needed to dissect the function of C21orf2 in ALS pathophysiology, it is tempting to speculate that defects in C21orf2 lead to primary cilium and/or mitochondrial dysfunction or inefficient DNA repair and thereby adult onset disease. The other associated loci will require more extensive studies to fine-map causal variants. The SARM1 gene has been suggested as a susceptibility gene for ALS, mainly because of its role in Wallerian degeneration and interaction with UNC13A⁸,³⁰. Although these are indeed interesting observations, the brain cis-eQTL effect on POLDIP2 suggests that POLDIP2 and not SARM1 could in fact be the causal gene within this locus. Similarly, KCNN1, which encodes a neuronal potassium channel involved in neuronal excitability, could be the causal gene either through a direct eQTL effect or rare variants in LD with the associated SNP in UNC13A.

In conclusion, we identified a key role for rare variation in ALS and discovered SNPs in novel complex loci. Our study therefore informs future study design in ALS genetics: larger sample sizes, full genome coverage and targeted genome editing experiments can be leveraged together to fine-map novel loci, identify rare causal variants and thereby elucidate the biology of ALS.


ONLINE METHODS

Software packages used, their versions, web sources and references are described in Supplementary Table 18.

GWAS discovery phase and quality control

Details on the acquired genotype data from previously published GWAS are described in Supplementary Table 1. Methods for case and control ascertainment for each cohort are described in the Supplementary Note. All cases and controls gave written informed consent and the relevant institutional review boards approved this study. To obtain genotype data for newly genotyped individuals, genomic DNA was hybridized to the Illumina OmniExpress array according to the manufacturer’s protocol. Subsequent quality control included:

1. Removing low quality SNPs and individuals from each cohort.
2. Combining unbalanced cohorts based on nationality and genotyping platform to form case-control strata.
3. Removing low quality SNPs, related individuals and population outliers per stratum.
4. Calculating genomic inflation factors per stratum.

More details are described in the Supplementary Note and Supplementary Fig. 11. The number of SNPs and individuals failing each QC step per cohort and stratum are displayed in Supplementary Tables 2–5.

Whole genome sequencing (custom reference panel)

Individuals were whole genome sequenced on the Illumina HiSeq 2500 platform using PCR-free library preparation and 100 bp paired-end sequencing, yielding a minimum of 35X coverage. Reads were aligned to the hg19 human genome build and after variant calling (Isaac variant caller) additional SNV and sample quality control was performed (Supplementary Note and Supplementary Fig. 12). Individuals in our custom reference panel were also included in the GWAS in strata sNL2, sNL3 and sNL4.

Merging reference panels

All high quality calls in the custom reference panel were phased using SHAPEIT2 software. After checking strand and allele inconsistencies, both the 1000 Genomes Project (1000GP) reference panel (release 05-21-2011)³¹ and custom reference panel were imputed up to the union of their variants as described previously³². Those variants with inconsistent allele frequencies between the two panels were removed.

Imputation accuracy performance

To assess the imputation accuracy between different reference panels, 109 unrelated ALS cases of Dutch ancestry sequenced by Complete Genomics and 67 ALS cases from the UK sequenced by Illumina were selected as a test panel. All variants not present on the Illumina Omni1M array were masked and the SNVs on chromosome 20 were subsequently imputed back using four different reference panels (1000GP, GoNL, custom panel and merged panel). Concordance between the imputed alleles and sequenced alleles was assessed within each allele frequency bin, where allele frequencies were calculated from the Dutch samples included in the Genome of the Netherlands cohort.

GWAS imputation

Pre-phasing was performed per stratum using SHAPEIT2 with the 1000GP phase 1 (release 05-21-2011) haplotypes³¹ as a reference panel. Subsequently, strata were imputed up to the merged reference panel in 5 megabase chunks using IMPUTE2. Imputed variants with a MAF < 1% or INFO score < 0.3 were excluded from further analysis. Variants with allele frequency differences between strata, defined as deviating > 10 SD from the normalized mean allele frequency difference between those strata and an absolute difference > 5%, were excluded, since they are likely to represent sequencing or genotyping artifacts. Imputation concordance scores for cases and controls were compared to assess biases in imputation accuracy (Supplementary Table 19).
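The post-imputation filters described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, and the between-strata rule is interpreted here as flagging a variant when its allele frequency difference lies more than 10 SD from the mean difference across all variants and its absolute difference exceeds 5%.

```python
from statistics import mean, stdev


def passes_basic_filters(maf, info, min_maf=0.01, min_info=0.3):
    """Keep an imputed variant only if MAF >= 1% and INFO >= 0.3."""
    return maf >= min_maf and info >= min_info


def flag_frequency_outliers(freq_a, freq_b, n_sd=10.0, min_abs_diff=0.05):
    """Flag variants whose allele frequency differs suspiciously between two strata.

    freq_a and freq_b hold allele frequencies of the same variants, in the
    same order, in stratum A and stratum B. A variant is flagged when both
    conditions hold (assumed reading of the rule in the text).
    """
    diffs = [a - b for a, b in zip(freq_a, freq_b)]
    mu, sd = mean(diffs), stdev(diffs)
    return [
        abs(d - mu) > n_sd * sd and abs(d) > min_abs_diff
        for d in diffs
    ]
```

Requiring both conditions keeps the filter from removing common variants with small but statistically extreme differences, and from removing variants in noisy strata where a 5% difference is unremarkable.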

Meta-analysis

Logistic regression was performed on imputed genotype dosages under an additive model using SNPTEST software. Based on scree plots, 1 to 4 principal components were included per stratum. These results were then combined in an inverse-variance weighted fixed effect meta-analysis using METAL. No marked heterogeneity across strata was observed as the Cochran’s Q test statistics did not deviate from the null distribution (λ = 0.96). Therefore, no SNPs were removed due to excessive heterogeneity. The genomic inflation factor was calculated and the quantile-quantile plot is provided in Supplementary Fig. 3a.
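For a single SNP, the inverse-variance weighted fixed effect estimate and Cochran’s Q can be written out directly. The sketch below is a generic illustration of these two statistics and is not METAL’s implementation; the function name and return layout are assumptions.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed effect meta-analysis for one SNP.

    betas: per-stratum log odds ratios; ses: their standard errors.
    Returns (beta, se, z, p, q), where p is the two-sided normal p-value
    and q is Cochran's Q (chi-square with len(betas) - 1 df under homogeneity).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q
```

Strata with smaller standard errors receive proportionally more weight, so large strata dominate the pooled estimate, while Q grows when per-stratum effects scatter more than their standard errors predict.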


Linear mixed model

All strata were combined including SNPs that passed quality control in every stratum. Subsequently the genetic relationship matrices (GRM) were calculated per chromosome including all SNPs using the Genome-wide Complex Trait Analysis (GCTA) software package. Each SNP was then tested in a linear mixed model including a GRM composed of all chromosomes excluding the target chromosome (leave one chromosome out, LOCO). The genomic inflation factor was calculated and the quantile-quantile plot is provided as Supplementary Fig. 3b.
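The leave-one-chromosome-out construction can be illustrated with a small pure-Python sketch. It assumes the usual standardized GRM, A = ZZᵀ/m with Z the allele-frequency-standardized dosages, and builds the LOCO matrix for each chromosome by subtracting that chromosome’s cross-products from the genome-wide total. Function names are hypothetical, and GCTA’s actual implementation is far more efficient.

```python
import math


def cross_products(dosages):
    """Return (Z Z^T, m) for one chromosome.

    dosages: list of rows, one per individual, with SNP dosages in [0, 2].
    Monomorphic SNPs are dropped; m is the number of SNPs retained.
    """
    n = len(dosages)
    cols = []
    for j in range(len(dosages[0])):
        col = [row[j] for row in dosages]
        p = sum(col) / (2 * n)  # allele frequency
        if 0 < p < 1:
            scale = math.sqrt(2 * p * (1 - p))
            cols.append([(x - 2 * p) / scale for x in col])
    zzt = [[sum(c[i] * c[k] for c in cols) for k in range(n)] for i in range(n)]
    return zzt, len(cols)


def loco_grms(dosages_by_chrom):
    """GRM per target chromosome, built from all other chromosomes.

    Needs at least two chromosomes with polymorphic SNPs; otherwise the
    denominator (SNPs outside the target chromosome) would be zero.
    """
    parts = {c: cross_products(d) for c, d in dosages_by_chrom.items()}
    n = len(next(iter(parts.values()))[0])
    total_m = sum(m for _, m in parts.values())
    total = [
        [sum(zzt[i][k] for zzt, _ in parts.values()) for k in range(n)]
        for i in range(n)
    ]
    return {
        c: [[(total[i][k] - zzt[i][k]) / (total_m - m) for k in range(n)]
            for i in range(n)]
        for c, (zzt, m) in parts.items()
    }
```

Excluding the target chromosome from the GRM avoids fitting the tested SNP (and SNPs in LD with it) twice, once as a fixed effect and once in the random effect, which would otherwise cost power.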

Replication

For the replication phase independent ALS cases and controls from Australia, Belgium, France, Germany, Ireland, Italy, The Netherlands and Turkey that were not used in the discovery phase were included. A pre-designed TaqMan genotyping assay was used to replicate rs75087725 and rs616147. Sanger sequencing was performed to replicate rs10139154 and rs7813314 (Supplementary Note and Supplementary Table 20). All genotypes were tested in a logistic regression per country and subsequently meta-analyzed.

Rare variant analysis in C21orf2

The burden of non-synonymous rare variants in C21orf2 was assessed in whole genome sequencing data obtained from ALS cases and controls from The Netherlands, Belgium, Ireland, United Kingdom and the United States. After quality control the burden of non-synonymous and loss-of-function mutations in C21orf2 was tested for association per country and subsequently meta-analyzed. More details are provided in the Supplementary Note and Supplementary Fig. 13.

Polygenic risk scores

To assess the predictive accuracy of polygenic risk scores in an independent dataset, SNP weights were assigned based on the linear mixed model (GCTA-LOCO) analysis in 18/27 strata. SNPs in high LD (r² > 0.5) within a 250 kb window were clumped. Subsequently, polygenic risk scores for cases and controls in the 9 independent strata were calculated based on their genotype dosages using PLINK v1.9. To obtain the Nagelkerke r² and corresponding p-values, these scores were then regressed on their true phenotype in a logistic regression where (based on scree plots) the first three PCs, sex and stratum were included as covariates.
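The two steps above, clumping and scoring, can be sketched as below. This is a simplified greedy illustration and not PLINK’s exact procedure: SNPs are visited in order of increasing p-value and dropped when they lie within the window of, and are in LD above the threshold with, an already retained SNP. The `r2` callable and the SNP record layout are hypothetical.

```python
def clump(snps, r2, window=250_000, r2_max=0.5):
    """Greedy LD clumping.

    snps: list of dicts with keys 'id', 'chrom', 'pos' and 'p'.
    r2:   callable (id_a, id_b) -> LD r-squared between two SNPs.
    Returns the retained index SNPs, most significant first.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        in_ld_with_kept = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window
            and r2(k["id"], snp["id"]) > r2_max
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(snp)
    return kept


def polygenic_score(dosages, weights):
    """Weighted sum of one individual's genotype dosages over clumped SNPs."""
    return sum(d * w for d, w in zip(dosages, weights))
```

Clumping before scoring prevents a single association signal from being counted many times through correlated SNPs.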

SNP-based heritability estimates

GCTA-REML

GRMs were calculated using GCTA software including genotype dosages passing quality control in all strata. Based on the diagonal of the GRM, individuals representing subpopulations that contain an abundance of rare alleles (diagonal values beyond mean ± 2 SD) were removed (Supplementary Fig. 14a). Pairs where relatedness (off-diagonal) exceeded 0.05 were removed as well (Supplementary Fig. 14b). The eigenvectors for the first 10 PCs were included as fixed effects to account for more subtle population structure. The prevalence of ALS was defined as the life-time morbid risk for ALS (i.e. 1/400)²¹. To estimate the SNP-based heritability for all non-genome-wide significant SNPs, genotypes for the SNPs reaching genome-wide significance were modeled as fixed effects. The variance explained by the GRM therefore reflects the SNP-based heritability of all non-genome-wide significant SNPs. SNP-based heritability partitioned by chromosome or MAF was calculated by including multiple GRMs, calculated on SNPs from each chromosome or within the respective frequency bin, in one model.
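Because cases are heavily over-represented relative to a life-time risk of 1/400, a heritability estimated on the observed 0/1 scale has to be converted to the liability scale. A commonly used conversion for ascertained case-control data is h²_l = h²_o · K²(1−K)² / (z² · P(1−P)), where K is the population prevalence, P the proportion of cases in the sample and z the height of the standard normal density at the liability threshold. The sketch below implements that formula as an illustration; the function name is an assumption, and the case proportion has to be supplied by the user.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_proportion):
    """Convert observed-scale heritability to the liability scale.

    prevalence:      population life-time risk K (e.g. 1 / 400).
    case_proportion: fraction P of cases in the analyzed sample.
    """
    normal = NormalDist()
    threshold = normal.inv_cdf(1 - prevalence)  # liability threshold
    z = normal.pdf(threshold)                   # density at the threshold
    k, p = prevalence, case_proportion
    factor = (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))
    return h2_observed * factor
```

When the sample is not ascertained (P = K), the factor reduces to K(1−K)/z², the standard observed-to-liability transformation for a population sample.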

Haseman-Elston regression

The Phenotype correlation - Genotype correlation (PCGC) regression software package was used to calculate heritability based on the Haseman-Elston regression including the eigenvectors for the first 10 PCs as covariates. The prevalence was again defined as the life-time morbid risk (1/400).

LD score regression

Summary statistics from GCTA-LOCO and LD scores calculated from European individuals in 1000GP were used for LD score regression. Strongly associated SNPs (p < 5 × 10⁻⁸) and variants not in HapMap3 were excluded. Considering adequate correction for population structure and distant relatedness in the linear mixed model, the intercept was constrained to 1.0¹².
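With the intercept fixed, LD score regression reduces to a regression through the origin: under the model E[χ²ⱼ] = N·h²·ℓⱼ/M + intercept, the slope of (χ²ⱼ − intercept) on the LD score ℓⱼ equals N·h²/M. The sketch below uses ordinary unweighted least squares to make this relationship concrete. It is a deliberate simplification: the published method uses weighted regression and block jackknife standard errors, and it returns an observed-scale estimate. The function name is hypothetical.

```python
def ldsc_constrained_h2(chi2, ld_scores, n_samples, n_snps, intercept=1.0):
    """Observed-scale SNP heritability from LD score regression with a fixed intercept.

    chi2:      per-SNP association chi-square statistics.
    ld_scores: per-SNP LD scores, in the same order.
    n_samples: GWAS sample size N; n_snps: number of SNPs M behind the LD scores.
    Uses unweighted least squares through the origin (simplifying assumption).
    """
    numerator = sum(l * (c - intercept) for l, c in zip(ld_scores, chi2))
    denominator = sum(l * l for l in ld_scores)
    slope = numerator / denominator  # estimates N * h2 / M
    return slope * n_snps / n_samples
```

Constraining the intercept removes one free parameter and therefore tightens the standard error of the heritability estimate, at the price of bias if residual confounding is in fact present.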

Biological pathway analysis (DEPICT)

Functional interpretation of associated GWAS loci was carried out using DEPICT, using locus definition based on 1000GP phase 1 data. This method prioritizes genes in the affected loci, predicts involved pathways, biological processes and tissues, using gene co-regulation data from 77,840 expression arrays. Three separate analyses were performed for GWAS loci reaching p = 10⁻⁴, p = 10⁻⁵ or p = 10⁻⁶. One thousand permutations were used for adjusting the nominal enrichment p-values for biases and additionally 200 permutations were used for FDR calculation.


REFERENCES

1. Hardiman, O., van den Berg, L. H. & Kiernan, M. C. Clinical diagnosis and management of amyotrophic lateral sclerosis. Nat. Rev. Neurol. 7, 639–649 (2011).
2. Al-Chalabi, A. et al. An estimate of amyotrophic lateral sclerosis heritability using twin data. J. Neurol. Neurosurg. Psychiatry 81, 1324–1326 (2010).
3. van Es, M. A. et al. Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat. Genet. 41, 1083–1087 (2009).
4. Laaksovirta, H. et al. Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet Neurol. 9, 978–985 (2010).
5. Shatunov, A. et al. Chromosome 9p21 in sporadic amyotrophic lateral sclerosis in the UK and seven other countries: a genome-wide association study. Lancet Neurol. 9, 986–994 (2010).
6. DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of C9orf72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256 (2011).
7. Renton, A. E. et al. A hexanucleotide repeat expansion in C9orf72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72, 257–268 (2011).
8. Fogh, I. et al. A genome-wide association meta-analysis identifies a novel locus at 17q11.2 associated with sporadic amyotrophic lateral sclerosis. Hum. Mol. Genet. 23, 2220–2231 (2014).
9. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
10. Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat. Genet. 46, 818–825 (2014).
11. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).
12. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
13. Höglinger, G. U. et al. Identification of common variants influencing risk of the tauopathy progressive supranuclear palsy. Nat. Genet. (2011).
14. Irwin, D. J. et al. Myelin oligodendrocyte basic protein and prognosis in behavioral-variant frontotemporal dementia. Neurology 83, 502–509 (2014).
15. Cirulli, E. T. et al. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways. Science 347, 1436–1441 (2015).
16. Freischmidt, A. et al. Haploinsufficiency of TBK1 causes familial ALS and fronto-temporal dementia. Nat. Neurosci. 18, 631–636 (2015).
17. Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat. Genet. 38, 209–213 (2006).
18. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, (2010).
19. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
20. Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).
21. Johnston, C. A. et al. Amyotrophic lateral sclerosis in an urban setting: a population based study of inner city London. J. Neurol. 253, 1642–1643 (2006).
22. Lee, S. H. et al. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat. Genet. 44, 247–250 (2012).
23. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890–20 (2015).
24. Ramakrishnan, N. A., Drescher, M. J. & Drescher, D. G. The SNARE complex in neuronal and sensory cells. Mol. Cell. Neurosci. 50, 58–69 (2012).
25. Ferraiuolo, L., Kirby, J., Grierson, A. J., Sendtner, M. & Shaw, P. J. Molecular pathways of motor neuron injury in amyotrophic lateral sclerosis. Nat. Rev. Neurol. 7, 616–630 (2011).
26. Lai, C. K. et al. Functional characterization of putative cilia genes by high-content analysis. Mol. Biol. Cell 22, 1104–1119 (2011).
27. Ma, X., Peterson, R. & Turnbull, J. Adenylyl cyclase type 3, a marker of primary cilia, is reduced in primary cell culture and in lumbar spinal cord in situ in G93A SOD1 mice. BMC Neurosci. 12, 71 (2011).
28. Krohn, K., Ovod, V., Vilja, P., Heino, M. & Scott, H. Immunochemical characterization of a novel mitochondrially located protein encoded by a nuclear gene within the DFNB8/10 critical region on 21q22.3. Biochem. Biophys. Res. Commun. (1997).
29. Fang, X. et al. The NEK1 interactor, C21orf2, is required for efficient DNA damage repair. Acta Biochim. Biophys. Sin. 47, 834–841 (2015).
30. Vérièpe, J., Fossouo, L. & Parker, J. A. Neurodegeneration in C. elegans models of ALS requires TIR-1/SARM1 immune pathway activation in neurons. Nat. Commun. 6, 7319 (2015).
31. Delaneau, O., Marchini, J. & 1000 Genomes Project Consortium. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat. Commun. 5, 3934 (2014).
32. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).


FULL AUTHOR LIST

Wouter van Rheenen*, Aleksey Shatunov*, Annelot M. Dekker, Russell L. McLaughlin, Frank P. Diekstra, Sara L. Pulit, Rick A.A. van der Spek, Urmo Võsa, Simone de Jong, Matthew R. Robinson, Jian Yang, Isabella Fogh, Perry T.C. van Doormaal, Gijs H.P. Tazelaar, Max Koppers, Anna M. Blokhuis, William Sproviero, Ashley R. Jones, Kevin P. Kenna, Kristel R. van Eijk, Oliver Harschnitz, Raymond D. Schellevis, William J. Brands, Jelena Medic, Androniki Menelaou, Alice Vajda, Nicola Ticozzi, Kuang Lin, Boris Rogelj, Katarina Vrabec, Metka Ravnik-Glavač, Blaž Koritnik, Janez Zidar, Lea Leonardis, Leja Dolenc Grošelj, Stéphanie Millecamps, François Salachas, Vincent Meininger, Mamede de Carvalho, Susana Pinto, Jesus S. Mora, Ricardo Rojas-García, Meraida Polak, Siddharthan Chandran, Shuna Colville, Robert Swingler, Karen E. Morrison, Pamela J. Shaw, John Hardy, Richard W. Orrell, Alan Pittman, Katie Sidle, Pietro Fratta, Andrea Malaspina, Simon Topp, Susanne Petri, Susanne Abdulla, Carsten Drepper, Michael Sendtner, Thomas Meyer, Roel A. Ophoff, Kim A. Staats, Martina Wiedau-Pazos, Catherine Lomen-Hoerth, Vivianna M. Van Deerlin, John Q. Trojanowski, Lauren Elman, Leo McCluskey, A. Nazli Basak, Ceren Tunca, Hamid Hamzeiy, Yesim Parman, Thomas Meitinger, Peter Lichtner, Milena Radivojkov-Blagojevic, Christian R. Andres, Cindy Maurel, Gilbert Bensimon, Bernhard Landwehrmeyer, Alexis Brice, Christine A.M. Payan, Safaa Saker-Delye, Alexandra Dürr, Nicholas W. Wood, Lukas Tittmann, Wolfgang Lieb, Andre Franke, Marcella Rietschel, Sven Cichon, Markus M. Nöthen, Philippe Amouyel, Christophe Tzourio, Jean-François Dartigues, Andre G. Uitterlinden, Fernando Rivadeneira, Karol Estrada, Albert Hofman, Charles Curtis, Hylke M. Blauw, Anneke J. van der Kooi, Marianne de Visser, An Goris, Markus Weber, Christopher E. Shaw, Bradley N. Smith, Orietta Pansarasa, Cristina Cereda, Roberto Del Bo, Giacomo P. Comi, Sandra D’Alfonso, Cinzia Bertolin, Gianni Sorarù, Letizia Mazzini, Viviana Pensato, Cinzia Gellera, Cinzia Tiloca, Antonia Ratti, Andrea Calvo, Cristina Moglia, Maura Brunetti, Simona Arcuti, Rosa Capozzo, Chiara Zecca, Christian Lunetta, Silvana Penco, Nilo Riva, Alessandro Padovani, Massimiliano Filosto, Bernard Muller, Robbert Jan Stuit, PARALS registry, SLALOM group, SLAP registry, FALS Sequencing Consortium, SLAGEN Consortium, NNIPPS Study Group, Ian Blair, Katharine Zhang, Emily P. McCann, Jennifer A. Fifita, Garth A. Nicholson, Dominic B. Rowe, Roger Pamphlett, Matthew C. Kiernan, Julian Grosskreutz, Otto W. Witte, Thomas Ringer, Tino Prell, Beatrice Stubendorff, Ingo Kurth, Christian A. Hübner, P. Nigel Leigh, Federico Casale, Adriano Chio, Ettore Beghi, Elisabetta Pupillo, Rosanna Tortelli, Giancarlo Logroscino, John Powell, Albert C. Ludolph, Jochen H. Weishaupt, Wim Robberecht, Philip Van Damme, Lude Franke, Tune H. Pers, Robert H. Brown, Jonathan D. Glass, John E. Landers, Orla Hardiman, Peter M. Andersen, Philippe Corcia, Patrick Vourc’h, Vincenzo Silani, Naomi R. Wray, Peter M. Visscher, Paul I.W. de Bakker, Michael A. van Es, R. Jeroen Pasterkamp, Cathryn M. Lewis, Gerome Breen, Ammar Al-Chalabi**, Leonard H. van den Berg**, Jan H. Veldink**

* These authors contributed equally
** These authors jointly directed this work


SUPPLEMENTARY METHODS

Custom reference panel sequencing and quality control

Whole genome sequencing

ALS cases from The Netherlands and population-based controls matched for sex, age and geographical region were selected for whole genome sequencing, as well as the affected individuals of 11 identical discordant twin pairs. All DNA samples were sequenced with Illumina’s FastTrack services (San Diego, USA) using PCR-free library preparation and paired-end (100 bp) sequencing on the HiSeq 2500 platform (Illumina®, San Diego, USA) to yield 35X coverage at minimum. Reads were aligned to the hg19 human genome build using BWA alignment software and the Isaac variant caller was used to call single nucleotide variants. Quality scores for reference calls were retrieved from genome VCFs (gVCF) and variants not passing Isaac’s quality filter were set to missing.

SNV quality control

In total, 75,818,355 SNVs were called. Quality control was performed using PLINK v1.9. First, multi-allelic and non-autosomal variants were excluded (n = 12,430,295). Subsequently, SNVs with a call-rate < 0.98 (for reference and variant alleles) and those deviating from Hardy-Weinberg equilibrium (p ≤ 1.0 × 10⁻⁴) were removed (n = 23,480,030). Finally, singletons (n = 21,166,529), which are notoriously hard to phase, were excluded. Ultimately, 18,741,501 SNVs were included in our ALS-enriched reference panel.

Sample quality control

Samples discordant for reported and genetically determined gender were excluded (n = 5). Comparing genotypes derived from whole genome sequencing and the Illumina2.5M SNP array revealed high concordance between both methods (mean 99.3%, SD 2.1%) and very few discordant calls (mean 0.03%, SD 0.2%), meaning that most differences were caused by missing calls in the sequencing data or SNP-array (mean 0.6%, SD 2.0%). Five individuals were removed due to higher sequencing versus SNP-array discordance (> 1.0%). For the remaining 1,861 individuals over 98% of the 18,741,501 SNVs were successfully called (mean 99.8%). To assess population substructures all individuals were projected along the first two principal components calculated on the HapMap3 individuals (Supplementary Figure 11). Since population diversity will improve imputation accuracy, no population outliers were removed. Finally, 1,246 ALS cases and 615 healthy controls passed quality control and were included in the reference panel.

GWAS sample ascertainment

A full description of the GWAS cohorts is provided with the online supplementary information at: http://www.nature.com/ng/journal/v48/n9/full/ng.3622.html

GWAS quality control
Quality control (QC) was first performed per cohort to remove low-quality SNPs and individuals using PLINK 1.9. SNPs were first annotated according to dbSNP137 and mapped to the hg19 reference genome. Subsequently, multi-allelic and A/T or C/G SNPs were removed, as were SNPs with a call-rate < 98%, SNPs with < 10 minor allele observations per cohort, SNPs with biased missingness as determined by haplotype, and non-autosomal SNPs. Individuals with gender mismatches or an excessive number of heterozygous SNPs (F < -0.2) were removed. To check for strand inconsistencies or annotation errors, allele frequencies in each cohort were compared to those observed in the European population represented in 1000GP. The number of SNPs and individuals failing each QC step within each cohort are described in Supplementary Table 2. Considering the low number of SNPs overlapping between all genotyping platforms (n = 48,229) and the presence of cohorts with cases only or controls only, cohorts were combined into strata based on reported nationality and genotyping platform. Quality control per stratum included removal of SNPs that deviated from Hardy-Weinberg equilibrium (p < 1 × 10⁻⁵ in controls and p < 1 × 10⁻⁹ in cases) and those with biased missingness between cases and controls (p < 1 × 10⁻³). Subsequently, related and duplicate individuals across all strata (pi-hat > 0.1) were removed. Individuals were projected along the first four principal components (PCs) calculated on HapMap3 individuals using EIGENSTRAT. Population outliers, defined as deviating > 10 SD from the HapMap CEU population mean on PC1-4 or > 4 SD from their stratum mean on PC1-2, were removed. After removing population outliers, PCs were recalculated on an LD-pruned set of SNPs for each stratum and outliers (> 5 SD on PC1-4) were again removed. The procedure for removing outliers using PCs, and its results, are summarized in Supplementary Figure 12. After removing all outliers, PCs were again recalculated.
Based on scree plots per stratum, the eigenvectors for the first one to four PCs were included as covariates in the logistic regression. To calculate genomic inflation factors per stratum, the test statistic's empirical quantiles were obtained by applying logistic regression in an additive model. The number of SNPs and individuals failing each QC step across all strata are displayed in Supplementary Table 3.
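The SD-based outlier rule described above can be illustrated with a short Python sketch. This is not the EIGENSTRAT procedure itself: the function names and the toy data are hypothetical, and the reference mean and SD are simply computed from the stratum being filtered (the "> 4 SD from its stratum mean" case).

```python
from statistics import mean, stdev

def column_stats(pcs):
    """Per-PC mean and sample SD for a list of samples (each a list of PC values)."""
    columns = list(zip(*pcs))
    return [mean(c) for c in columns], [stdev(c) for c in columns]

def flag_outliers(pcs, ref_mean, ref_sd, n_sd):
    """Indices of samples lying more than n_sd reference SDs from the reference mean on any PC."""
    return [
        i for i, sample in enumerate(pcs)
        if any(abs(x - m) > n_sd * s for x, m, s in zip(sample, ref_mean, ref_sd))
    ]

# Hypothetical stratum: 29 clustered samples plus one distant sample (index 29).
stratum = [[0.01 if i % 2 == 0 else -0.01, 0.02 if i % 3 == 0 else -0.01] for i in range(29)]
stratum.append([1.0, 0.0])

stratum_mean, stratum_sd = column_stats(stratum)
outliers = flag_outliers(stratum, stratum_mean, stratum_sd, n_sd=4)
print(outliers)
```

The same function could be called with the HapMap CEU mean and SD and n_sd=10 for the first filter; recalculating PCs between rounds, as the text describes, is outside the scope of this sketch.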

91 Chapter 4 | GWAS identifies new risk variants and genetic architecture of ALS

Genotyping experiments in replication phase
TaqMan assay (rs75087725 and rs616147)
The reaction mix included 1.0 μl of genomic DNA (10 ng/μl), 0.25 μl TaqMan genotyping assay 20X (Life Technologies), 2.5 μl MasterMix 2X (Life Technologies) and 1.25 μl MilliQ. The thermocycler program included 30 sec at 60°C and 10 min at 95°C, followed by 40 cycles of 15 sec at 95°C and 1 min at 60°C, and a final step of 30 sec at 60°C. Fluorescent signals were analyzed on a QuantStudio 6 Flex Real-Time PCR System and genotypes were determined by allelic discrimination using QuantStudio 6 Flex Real-Time PCR System Software (Life Technologies).
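As a worked example of the arithmetic behind the TaqMan reaction mix, the following Python sketch scales the per-reaction volumes quoted above to a plate. The overage fraction and the function name are assumptions made for illustration; the text does not state how master mixes were prepared.

```python
# Per-reaction volumes (μl) quoted in the text; genomic DNA is added per well.
PER_REACTION_UL = {
    "TaqMan genotyping assay 20X": 0.25,
    "MasterMix 2X": 2.5,
    "MilliQ": 1.25,
}
DNA_UL = 1.0

def master_mix(n_reactions, overage=0.10):
    """Volumes (μl) of each shared component for n_reactions, with a hypothetical pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: volume * factor for name, volume in PER_REACTION_UL.items()}

# Total volume per well implied by the quoted components.
total_per_well = DNA_UL + sum(PER_REACTION_UL.values())
print(total_per_well)

# Example: a 96-well plate without overage.
print(master_mix(96, overage=0.0))
```

The quoted components add up to 5.0 μl per well, so the 2X MasterMix and the 20X assay both end up at 1X.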

Sanger sequencing (rs10139154 and rs7813314)
The oligonucleotide primers used for amplification are listed in Supplementary Table 7. PCR reactions consisted of 1.0 μL genomic DNA (50 ng/μL), 1.0 μL 10× NH4 reaction buffer, 0.2 μL deoxynucleotide triphosphates (dNTP; 10 mM each), 0.3 μL MgCl2 (50 mM), 0.1 μL Biotaq (5 U/μL), 7.0 μL milli-Q and 0.2 μL of each primer (10 μM) in a total volume of 10 μL. The thermocycler program included initialization of 3 min at 96°C followed by 35 cycles of 30 sec at 96°C, 45 sec at 57°C and 1 min at 72°C, and a final step of 5 min at 72°C. The PCR products were electrophoresed in 1.2-1.5% agarose gel containing 0.02‰ ethidium bromide and visualized using a Proxima AQ 4.2 Imager. Based on initial sequencing results, the reverse primer was used for analysis of both loci. The thermocycler program for sequencing consisted of 1 min at 96°C followed by 25 cycles of 10 sec at 96°C, 5 sec at 50°C and 4 min at 60°C. Sequencing was performed on an Applied Biosystems 3730 DNA-Analyzer using the BigDye Terminator 3.1 sequencing kit (Applied Biosystems).
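The PCR recipe above can be checked with a few lines of Python: the component volumes should add up to the stated 10 μL, and final concentrations follow from C1·V1 = C2·V2. This is a minimal sketch; it assumes that "each primer" means one forward and one reverse primer, and the variable names are illustrative.

```python
TOTAL_UL = 10.0

# Volumes (μL) as quoted in the text; two primers (forward, reverse) are assumed.
components_ul = {
    "genomic DNA": 1.0,
    "10x NH4 reaction buffer": 1.0,
    "dNTP": 0.2,
    "MgCl2": 0.3,
    "Biotaq": 0.1,
    "milli-Q": 7.0,
    "forward primer": 0.2,
    "reverse primer": 0.2,
}

def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Final concentration after dilution, in the same unit as the stock (C1*V1 = C2*V2)."""
    return stock * volume_ul / total_ul

volume_sum = sum(components_ul.values())
dntp_mM = final_concentration(10.0, components_ul["dNTP"])           # each dNTP, mM
mgcl2_mM = final_concentration(50.0, components_ul["MgCl2"])         # mM
primer_uM = final_concentration(10.0, components_ul["forward primer"])  # μM
dna_ng_per_ul = final_concentration(50.0, components_ul["genomic DNA"])  # ng/μL

print(volume_sum, dntp_mM, mgcl2_mM, primer_uM, dna_ng_per_ul)
```

Under these assumptions the mix contains 0.2 mM of each dNTP, 1.5 mM MgCl2 (from the added stock only), 0.2 μM of each primer and 5 ng/μL genomic DNA.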

Cases and controls were randomly assigned on plates and experimenters were blinded for case-control status when calling genotypes.

Rare variant association analysis for C21orf2
Data merging
For burden testing in the C21orf2 locus, cases and controls were available from the following collection sites: the Netherlands (N = 2,089), Belgium (N = 421), Ireland (N = 386), the United Kingdom (N = 687), and the United States (N = 393). Samples were received as individual-level genome VCF (gVCF) files containing single nucleotide variants (SNVs) and insertions/deletions (indels). The gVCF files were merged together by cohort using the Illumina aggregation ("agg") tool available on GitHub (https://github.com/Illumina/agg). The agg tool ensures that all samples in a given cohort were genotyped at the union of all sites observed across all samples. The resulting file is a (per-chromosome) VCF file that contains only those sites where at least one non-reference allele is observed in the set of cases and controls.

Sample quality control
Sample and site QC were performed using PLINK 1.9 and the most recent version of VCFtools. After the samples within each cohort had been merged into per-chromosome VCF files, we removed all sites with a quality (QUAL) score < 30. Genotypes with a genotype quality < 10 were set to missing on an individual-level basis. We then checked all samples for missingness across the autosomal chromosomes; all samples had missingness < 10% and thus no samples were excluded at this point in QC. Principal components were calculated on a set of LD-pruned (r² > 0.2) high-quality sites (MAF > 0.1, call-rate > 99.9%, not A/T or C/G, biallelic, outside the MHC, LCT locus and inversions on chromosomes 8 and 17). All cases and controls were projected onto the reference HapMap3 populations using EIGENSTRAT. Individuals not of European ancestry (using the European populations in HapMap3 as a reference) were removed from the analysis. The cleaned principal component plots are displayed in Supplementary Figure 13. Using this same set of high-quality SNPs, we also removed samples with excessively high or low inbreeding coefficients (further than 3 standard deviations from the mean of the inbreeding coefficient distribution in the cohort). A single individual, preferably cases or those with the lowest overall missingness, from each related pair (pi-hat > 0.125, indicating a cousin relationship or closer) was removed. Finally, we checked the average depth of coverage across each of the samples; all samples looked appropriately covered (within 6 SD of the mean of the average depth distribution) and no sample was removed at this step.

Site quality control
We then performed site QC on chromosome 21 only (the chromosome containing the C21orf2 locus). All sites with missingness > 5% or with an average read depth < 5 or > 50 were removed from the analysis. Sites out of Hardy-Weinberg equilibrium in controls (p < 10⁻⁶) or with excessive differential missingness between cases and controls (p < 10⁻⁶) were also removed. Once sample and site QC was complete, the following samples were available for burden testing:
a. The Netherlands: 1,275 cases, 677 controls
b. Ireland: 251 cases, 134 controls
c. Belgium: 260 cases, 133 controls
d. United States: 275 cases, 65 controls
e. United Kingdom: 501 cases, 129 controls
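The site-level filters can be summarized as a single predicate, sketched below in Python. The field names are hypothetical, and the sketch assumes that a site is dropped when its missingness exceeds 5%; the actual filtering was done with PLINK and VCFtools, not with this code.

```python
from dataclasses import dataclass

@dataclass
class Site:
    missingness: float      # fraction of missing genotypes at the site
    mean_depth: float       # average read depth across samples
    hwe_p_controls: float   # Hardy-Weinberg p-value in controls
    diff_miss_p: float      # case/control differential missingness p-value

def passes_site_qc(site: Site) -> bool:
    """Apply the four site filters; thresholds are the ones quoted in the text."""
    if site.missingness > 0.05:
        return False
    if site.mean_depth < 5 or site.mean_depth > 50:
        return False
    if site.hwe_p_controls < 1e-6:
        return False
    if site.diff_miss_p < 1e-6:
        return False
    return True

# Hypothetical sites, for illustration only.
good = Site(missingness=0.01, mean_depth=30.0, hwe_p_controls=0.4, diff_miss_p=0.2)
too_deep = Site(missingness=0.01, mean_depth=80.0, hwe_p_controls=0.4, diff_miss_p=0.2)
print(passes_site_qc(good), passes_site_qc(too_deep))
```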


Functional annotation
Sites passing QC were functionally annotated using ANNOVAR. All variants with a MAF < 0.05 that were either non-synonymous or loss of function (stop gain or frameshift indel) were kept for burden testing.

Association testing
Principal components were recalculated per cohort on cases and controls that passed quality control. To check that the calculated PCs appropriately corrected for population stratification, we ran single-variant analysis (per cohort) on all of chromosome 21 using logistic regression and including the top ten principal components as covariates. We plotted quantile-quantile plots of the results and checked the genomic inflation factor for signs of population stratification. The data looked appropriately distributed (highest lambda = 1.06, in the Netherlands cohort) and thus we proceeded to burden testing in C21orf2 (Supplementary Figure 13).
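For readers unfamiliar with the genomic inflation factor, a minimal Python sketch of its usual definition follows: the median of the observed 1-df chi-square statistics divided by the median expected under the null. It assumes two-sided p-values from 1-df tests and is not the code used in this chapter; the function name is illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Lambda GC from two-sided p-values of 1-df tests (each p must lie in (0, 1])."""
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF

# Perfectly uniform p-values correspond to no inflation.
n = 1001
uniform_p = [(i + 0.5) / n for i in range(n)]
print(genomic_inflation(uniform_p))
```

Values close to 1, such as the 1.06 reported for the Netherlands cohort, indicate little residual stratification.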

Per cohort, we extracted all non-synonymous and loss of function variants in C21orf2 with MAF < 5% for burden testing. Burden tests were corrected with the top ten principal components as covariates. Burden testing was performed using ScoreSeq, which performs five burden tests: T1, T5, Frequency-weighted (also called the Madsen-Browning test), SKAT, and EREC. The first three tests assume the same direction of effect of all variants across a given locus. The last two tests allow for multiple directions of effect in the same locus. We set the software to perform 1 million permutations to evaluate significance of the burden test. Subsequently, these results were meta-analyzed using MASS, the companion software to ScoreSeq.
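The collapsing idea behind fixed-threshold burden tests can be sketched in Python as follows. The sketch assumes that T1 and T5 refer to MAF thresholds of 1% and 5%, and it only computes per-sample burden scores; it does not reproduce the ScoreSeq test statistics, covariate adjustment or permutation procedure. All data are hypothetical.

```python
def burden_scores(genotypes, mafs, maf_threshold):
    """Per-sample count of minor alleles over variants with MAF below the threshold.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of sample i at variant j.
    """
    keep = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(row[j] for j in keep) for row in genotypes]

# Hypothetical data: three variants, three samples.
mafs = [0.004, 0.03, 0.2]
genotypes = [
    [1, 0, 2],
    [0, 1, 0],
    [0, 0, 1],
]

t1_scores = burden_scores(genotypes, mafs, maf_threshold=0.01)
t5_scores = burden_scores(genotypes, mafs, maf_threshold=0.05)
print(t1_scores, t5_scores)
```

Because every qualifying allele adds to the score with the same sign, such a test assumes a common direction of effect, which is the property the text attributes to the first three tests.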

QTL analysis
Linkage disequilibrium data and cis-eQTLs for different tissues were acquired from the SNiPA webtool (http://snipa.helmholtz-muenchen.de/snipa/, accessed 28.03.2016)²⁴, using 1000G p1v3 for the European population and ENSEMBL v80 genome annotations. Brain cis-eQTLs were manually curated from several brain eQTL studies (Supplementary Table 7)²⁵-³⁶. Additionally, cis-eQTL effects observed in non-brain tissues were described for all SNPs within the genome-wide significant loci³⁶-⁴⁰. For the BRAINEAC database, cis-eQTLs were queried for SNP-gene combinations located in the suggestive GWAS loci (LMM, p < 10⁻⁴), as defined by DEPICT (v139, R² > 0.5).

Stranded RNA-seq tracks were acquired from the Epigenome Roadmap Project data portal (http://egg2.wustl.edu/roadmap/web_portal/) for one fetal brain⁴¹.

SUPPLEMENTARY FIGURES

A selection of the supplementary methods and figures is provided here. The full supplementary information is provided online with the published version of this chapter: http://www.nature.com/ng/journal/v48/n9/full/ng.3622.html.

[Figure: two quantile-quantile plots, titled "linear mixed model QQ-plot" and "meta-analysis QQ-plot". Axes: expected −log10(P value) against observed −log10(P value). Annotated values, in extraction order: λGC = 1.118 with λGC1000 = 1.007, and λGC = 1.015 with λGC1000 = 1.001.]

Supplementary Figure 1. Quantile-quantile plots. (a) Meta-analysis and (b) Linear mixed model. For presentation purposes p-values < 5 × 10⁻⁸ are plotted at 5 × 10⁻⁸.
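The λGC1000 values shown with these plots rescale the inflation factor to a study of 1,000 cases and 1,000 controls. A minimal Python sketch of the commonly used rescaling is given below. It assumes the discovery sample size of 12,577 cases and 23,475 controls applies to both plots; the function name is illustrative.

```python
def lambda_1000(lambda_gc, n_cases, n_controls):
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lambda_gc - 1) * scale

# Assumed sample size: 12,577 cases and 23,475 controls (discovery phase).
print(round(lambda_1000(1.118, 12577, 23475), 3))  # 1.007
print(round(lambda_1000(1.015, 12577, 23475), 3))  # 1.001
```

Under that assumption the rescaled values agree with the λGC1000 values annotated in the figure.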


rs75087725 (C21orf2)

Cohort                    Cases   Controls  MAF(cases)  MAF(controls)  OR    95% CI        P
Discovery
  Linear mixed model      12577   23475     0.019       0.014          1.45  [1.28; 1.65]  2.65e-09
Replication
  Australia               519     661       0.010       0.008          1.28  [0.53; 3.10]
  Belgium                 93      92        0.038       0.016          2.41  [0.60; 9.64]
  France                  606     530       0.021       0.008          2.49  [1.15; 5.39]
  Germany                 116     46        0.022       0.022          0.99  [0.19; 5.30]
  Ireland                 209     359       0.007       0.008          0.88  [0.25; 3.12]
  Italy                   348     227       0.016       0.007          2.44  [0.67; 8.83]
  Netherlands             547     778       0.016       0.011          1.49  [0.77; 2.86]
  Turkey                  141     74        0.000       0.014          NA
  Replication (combined)  2579    2767                                 1.65  [1.14; 2.38]  3.89e-03
Discovery + replication   15156   26242                                1.47  [1.31; 1.66]  3.08e-10

[Forest plot panel not reproduced; odds ratio axis from 0.2 to 5.]

rs616147 (MOBP)

Cohort                    Cases   Controls  MAF(cases)  MAF(controls)  OR    95% CI        P
Discovery
  Linear mixed model      12577   23475     0.303       0.276          1.10  [1.06; 1.14]  1.43e-08
Replication
  Australia               519     661       0.279       0.265          1.07  [0.89; 1.29]
  Belgium                 93      92        0.280       0.272          1.04  [0.66; 1.63]
  France                  606     530       0.310       0.283          1.13  [0.95; 1.36]
  Germany                 116     46        0.291       0.228          1.38  [0.79; 2.41]
  Ireland                 209     359       0.301       0.246          1.32  [1.00; 1.73]
  Italy                   348     227       0.362       0.303          1.32  [1.01; 1.71]
  Netherlands             547     778       0.299       0.289          1.05  [0.88; 1.26]
  Turkey                  141     74        0.375       0.372          1.01  [0.67; 1.52]
  Replication (combined)  2579    2767                                 1.13  [1.04; 1.23]  2.35e-03
Discovery + replication   15156   26242                                1.10  [1.07; 1.14]  4.19e-10

[Forest plot panel not reproduced; odds ratio axis from 0.5 to 2.]

rs10139154 (SCFD1)

Cohort                    Cases   Controls  MAF(cases)  MAF(controls)  OR    95% CI        P
Discovery
  Linear mixed model      12577   23475     0.337       0.312          1.09  [1.06; 1.13]  4.95e-08
Replication
  Australia               519     661       0.311       0.298          1.06  [0.89; 1.26]
  Belgium                 93      92        0.367       0.285          1.48  [0.94; 2.32]
  France                  606     530       0.307       0.333          0.89  [0.74; 1.06]
  Germany                 116     46        0.306       0.275          1.19  [0.65; 2.16]
  Ireland                 209     359       0.293       0.246          1.29  [0.98; 1.71]
  Italy                   348     227       0.408       0.384          1.10  [0.82; 1.48]
  Netherlands             547     778       0.331       0.320          1.06  [0.89; 1.25]
  Turkey                  141     74        0.383       0.351          1.15  [0.76; 1.75]
  Replication (combined)  2579    2767                                 1.06  [0.97; 1.15]  9.55e-02
Discovery + replication   15156   26242                                1.09  [1.06; 1.12]  3.45e-08

[Forest plot panel not reproduced; odds ratio axis from 0.5 to 2.]

rs7813314 (lncRNA)

Cohort                    Cases   Controls  MAF(cases)  MAF(controls)  OR    95% CI        P
Discovery
  Linear mixed model      12577   23475     0.093       0.102          0.87  [0.82; 0.91]  3.14e-08
Replication
  Australia               519     661       0.161       0.101          1.59  [1.27; 1.98]
  Belgium                 93      92        0.063       0.12           0.46  [0.21; 1.02]
  France                  606     530       0.098       0.089          1.12  [0.83; 1.53]
  Germany                 116     46        0.072       0.081          0.87  [0.31; 2.42]
  Ireland                 209     359       0.116       0.091          1.33  [0.89; 1.99]
  Italy                   348     227       0.109       0.096          1.17  [0.71; 1.92]
  Netherlands             547     778       0.110       0.111          0.99  [0.77; 1.27]
  Turkey                  141     74        0.101       0.164          0.58  [0.32; 1.03]
  Replication (combined)  2579    2767                                 1.17  [1.03; 1.33]  7.75e-03
Discovery + replication   15156   26242                                0.90  [0.86; 0.94]  1.05e-05

[Forest plot panel not reproduced; odds ratio axis from 0.5 to 2.]

Supplementary Figure 4. Replication results. Forest plot for the inverse variance fixed effect meta-analysis of the discovery phase and replication cohorts. OR = odds ratio, CI = confidence interval.
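The inverse-variance fixed-effect combination named in the caption can be sketched in Python as below. Standard errors are recovered from the reported 95% confidence intervals, so the result only approximates the published estimates, which were computed from unrounded data. The function names are illustrative.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def fixed_effect_meta(estimates, z=1.96):
    """Inverse-variance pooled OR and 95% CI from (OR, CI lower, CI upper) tuples."""
    weights = [1 / se_from_ci(lo, hi, z) ** 2 for _, lo, hi in estimates]
    log_ors = [math.log(odds_ratio) for odds_ratio, _, _ in estimates]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = 1 / math.sqrt(sum(weights))
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )

# rs616147 (MOBP): discovery and combined replication estimates from the figure.
pooled_or, ci_low, ci_high = fixed_effect_meta([(1.10, 1.06, 1.14), (1.13, 1.04, 1.23)])
print(round(pooled_or, 2), round(ci_low, 2), round(ci_high, 2))
```

For rs616147 this reproduces the reported discovery + replication estimate of 1.10 [1.07; 1.14] to two decimals.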

[Figure: rare non-synonymous and loss-of-function variants along C21orf2 transcript NM_001271441 ("−" strand; exons numbered 1 to 7), shown per cohort with cases above and controls below the transcript. Each variant is labelled with its protein or coding change and the number of observations (for example p.V58L and p.T150I). Panels: Netherlands (cases n = 1,275; controls n = 677), Belgium (cases n = 260; controls n = 133), Ireland (cases n = 251; controls n = 134), United States (cases n = 275; controls n = 65), United Kingdom (cases n = 501; controls n = 129).]

Supplementary Figure 6. C21orf2 rare variant burden. Summary of the rare (MAF < 0.05) non-synonymous and loss-of-function mutations in the canonical transcript of C21orf2. Conditioning on the SNP found to be associated in the GWAS (rs75087725, p.V58L, colored grey), there was an increased burden of non-synonymous and loss-of-function mutations among ALS cases (pT5 = 9.2 × 10⁻⁵, pT1 = 0.01). Odds ratios (calculated by counting alleles in cases and controls per stratum, unadjusted for PCs, and combined in a Cochran-Mantel-Haenszel test) are 1.63 and 1.48 for the T5 and T1 burden respectively. The two loss-of-function mutations observed in cases are colored red.
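The pooled odds ratio from a Cochran-Mantel-Haenszel analysis of per-stratum allele counts can be sketched in Python as follows. The allele counts below are hypothetical and serve only to show the calculation; the per-stratum counts behind the reported odds ratios of 1.63 and 1.48 are not given in this chapter.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio.

    Each stratum is (a, b, c, d): rare alleles in cases, other alleles in cases,
    rare alleles in controls, other alleles in controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical allele counts for two strata (not taken from the study).
example_strata = [(10, 90, 5, 95), (20, 80, 10, 90)]
print(mantel_haenszel_or(example_strata))
```

With a single stratum the estimate reduces to the ordinary cross-product odds ratio (a·d)/(b·c).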

[Figure: two bar charts, panels a and b. Y-axis: Nagelkerke R². X-axis: GWAS lower P value cutoff. Bar colour indicates the P value of the prediction.]

Number of SNPs included at each cutoff:

Cutoff      5×10⁻⁸  5×10⁻⁷  5×10⁻⁶  5×10⁻⁵  5×10⁻⁴  5×10⁻³  0.05   0.1   0.2   0.3   0.4   0.5   all
SNPs (a)    6       15      51      259     1.6K    11.8K   83.8K  151K  272K  383K  488K  587K  953K
SNPs (b)    1       7       35      226     1.5K    10.5K   75.2K  136K  244K  344K  437K  525K  854K

Supplementary Figure 8. Polygenic risk scores. (a) Polygenic risk score analyses where 9 cohorts were used as targets. Best predictions were made when including the 6 genome-wide significant SNPs from the C9orf72 and UNC13A loci only. (b) Polygenic risk score analyses excluding all variants on chromosome 9. Increased polygenic risk score predictions were made when including more variants by lowering the p-value threshold. Note that the overall prediction accuracy is lower than when SNPs on chromosome 9 were included.
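The thresholding step described in this caption can be illustrated with a minimal Python sketch: a polygenic score sums effect size times risk-allele dosage over all SNPs whose discovery p-value falls below the chosen cutoff. The effect sizes, p-values and dosages below are hypothetical, and the sketch omits LD pruning and the regression used to obtain Nagelkerke R².

```python
def polygenic_score(dosages, snps, p_cutoff):
    """Sum of log(OR) x risk-allele dosage over SNPs with discovery p-value <= p_cutoff.

    snps is a list of (log_odds_ratio, p_value); dosages holds one value per SNP.
    """
    return sum(beta * dose for (beta, p), dose in zip(snps, dosages) if p <= p_cutoff)

# Hypothetical discovery results and one individual's dosages.
snps = [(0.5, 1e-9), (0.1, 1e-4), (-0.2, 0.3)]
dosages = [2, 1, 1]

strict = polygenic_score(dosages, snps, p_cutoff=5e-8)   # genome-wide significant SNPs only
lenient = polygenic_score(dosages, snps, p_cutoff=0.5)   # many more SNPs included
print(strict, lenient)
```

Lowering the cutoff adds more SNPs to the score, which is the quantity varied along the x-axis of both panels.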

Supplementary Tables
Supplementary Tables 2-5, 11, 16 and 18-20 are provided online with the published version of this chapter: http://www.nature.com/ng/journal/v48/n9/full/ng.3622.html.

Supplementary Table 1. Description of cohorts

Cohort  Name    Country          Cases   Controls  Platform
1       NL1     The Netherlands  461     450       Illumina317K
2       BE1     Belgium          311     371       Illumina370K
3       NL2     The Netherlands  582     629       Illumina370K
4       SW1     Sweden           493     500       Illumina370K
5       NL3     The Netherlands  0       5,974     Illumina550K
6       FR1     France           251     724       Illumina317K
7       UK1     United Kingdom   245     221       Illumina317K
8       US1     United States    753     811       Illumina317K
9       IR1     Ireland          221     211       Illumina550K
10      IR2     Ireland          103     127       Illumina610K
11      UK2     United Kingdom   661     0         Illumina550K
12      US2     United States    0       527       Illumina550K
13      US3     United States    276     0         Illumina550K
14      IT1     Italy            141     0         Illumina610K
15      IT2     Italy            261     246       Illumina550K
16      UK3     United Kingdom   0       2,501     Illumina1M
17      UK4     United Kingdom   0       2,699     Illumina1M
18      US4     United States    0       867       Illumina370K
19      US5     United States    0       1,986     Illumina1M
20      FIN1    Finland          0       201       Illumina370K
21      FIN2    Finland          401     191       Illumina370K
22      FIN3    Finland          0       103       Illumina1M
23      IT3     Italy            1,792   1,107     Illumina660W
24      FR2     France           0       1,100     Illumina550K
25      GER1    Germany          0       677       Illumina550K
26      NL4     The Netherlands  1,226   2,262     IlluminaOmniExpress
27      GER2    Germany          580     286       IlluminaOmniExpress
28      IT4     Italy            311     100       IlluminaOmniExpress
29      PU1     Portugal         40      54        IlluminaOmniExpress
30      SP1     Spain            105     63        IlluminaOmniExpress
31      SWISS1  Switzerland      228     236       IlluminaOmniExpress
32      BE2     Belgium          225     250       IlluminaOmniExpress
33      FIN4    Finland          144     0         IlluminaOmniExpress
34      IR3     Ireland          268     478       IlluminaOmniExpress
35      SW2     Sweden           281     271       IlluminaOmniExpress
36      US6     United States    65      42        IlluminaOmniExpress
37      GER3    Germany          1,519   0         IlluminaCoreExome
38      FR3     France           363     0         IlluminaOmniExpress
39      US7     United States    573     0         IlluminaOmniExpress
40      UK5     United Kingdom   1,214   14        IlluminaOmniExpress
41      NL5     The Netherlands  697     619       Illumina2.5M
Total                            14,791  26,898
New                              7,839   4,675


Supplementary Table 6. rs3849943 details

[Figure: regional association plot for rs3849943 (position on chr9, 27.2-28 Mb; −log10(p-value) and recombination rate in cM/Mb; points coloured by r²) and forest plot of per-stratum odds ratios. Annotated results: meta-analysis p = 3.99 × 10⁻¹⁹; LMM p = 1.71 × 10⁻²⁴; I² = 65.9%, p = 7.6 × 10⁻⁷.]

Stratum information for rs3849943

NAME     IMPUTED  MAF(cases)  MAF(controls)  HWE     INFO    OR(stratum)  P(stratum)
sNL1     1        0.2636      0.2298         0.1281  1.0000  1.1864       0.119
sBE1     1        0.2475      0.2524         0.1362  1.0000  0.9761       0.849
sNL2     1        0.3207      0.2292         0.2738  0.9998  1.5676       8.06×10⁻⁴
sSW1     1        0.2813      0.1977         0.8460  0.9999  1.5511       3.64×10⁻³
sFR1     1        0.3458      0.2342         0.5863  0.9978  1.7078       9.60×10⁻⁵
sUK1     1        0.3036      0.2107         0.6294  0.9999  1.6815       0.005
sUS1     1        0.2416      0.2401         0.5499  0.9999  0.9995       0.996
sIR1     1        0.2371      0.2221         0.4261  0.9995  1.0905       0.520
sUK2     1        0.3038      0.2407         0.1130  0.9996  1.3843       6.03×10⁻⁶
sUS2     1        0.2669      0.2359         0.6243  0.9999  1.1859       0.173
sIT1     1        0.2745      0.2848         0.6382  0.9991  0.9656       0.790
sFIN1    1        0.2963      0.1667         0.7117  1.0000  2.1011       2.27×10⁻⁸
sNL3     1        0.2589      0.2233         0.2512  1.0000  1.2245       0.002
sGER1    1        0.2799      0.2229         0.5928  1.0000  1.3566       0.015
sIT2     1        0.2862      0.2313         0.7714  0.9996  1.3134       0.149
sIB1     1        0.2778      0.3172         0.1598  0.9981  0.8364       0.389
sSWISS1  1        0.2537      0.2489         0.3690  0.9997  0.9884       0.944
sBE2     1        0.2854      0.2314         0.2788  1.0000  1.3042       0.075
sSW2     1        0.2565      0.1702         0.4949  1.0000  1.6732       0.004
sFIN2    1        0.2889      0.1701         0.7313  1.0000  2.5689       2.97×10⁻⁴
sIR2     1        0.2728      0.2168         0.5754  0.9996  1.3633       0.016
sUS3     1        0.2612      0.2319         1.0000  0.9999  1.1331       0.123
sFR2     1        0.2599      0.2254         0.0458  1.0000  1.1632       0.142
sUK3     1        0.2650      0.2598         0.6400  0.9998  1.0279       0.647
sGER2    1        0.2605      0.2277         0.2164  0.9646  1.1095       0.265
sIT3     1        0.2946      0.2721         0.8776  0.9997  1.1273       0.053
sNL4     1        0.2500      0.2298         0.0099  1.0000  1.1144       0.252

Supplementary Table 7. rs12608932 details

[Figure: regional association plot for rs12608932 (position on chr19, 17.4-18.2 Mb; −log10(p-value) and recombination rate in cM/Mb; points coloured by r²) and forest plot of per-stratum odds ratios. Annotated results: meta-analysis p = 1.74 × 10⁻⁸; LMM p = 2.69 × 10⁻¹⁰; I² = 0.0%, p = 0.67.]

Stratum information for rs12608932

NAME     IMPUTED  MAF(cases)  MAF(controls)  HWE    INFO   OR(stratum)  P(stratum)
sNL1     0        0.403       0.376          0.406  1.000  1.119        0.259
sBE1     0        0.433       0.363          0.003  1.000  1.305        0.016
sNL2     0        0.407       0.347          0.343  1.000  1.256        0.060
sSW1     0        0.380       0.317          0.574  1.000  1.334        0.030
sFR1     0        0.377       0.320          1.000  1.000  1.289        0.054
sUK1     0        0.384       0.368          0.236  1.000  1.066        0.684
sUS1     0        0.370       0.330          0.537  1.000  1.188        0.018
sIR1     0        0.349       0.349          0.116  1.000  1.000        0.998
sUK2     0        0.371       0.352          0.704  1.000  1.095        0.169
sUS2     0        0.361       0.314          0.759  1.000  1.213        0.074
sIT1     0        0.325       0.324          0.143  1.000  0.969        0.804
sFIN1    0        0.495       0.429          0.675  1.000  1.200        0.089
sNL3     0        0.378       0.357          0.222  1.000  1.098        0.113
sGER1    0        0.337       0.333          0.211  1.000  1.016        0.890
sIT2     0        0.328       0.290          1.000  1.000  1.179        0.354
sIB1     0        0.389       0.338          0.823  1.000  1.200        0.366
sSWISS1  0        0.350       0.342          0.101  1.000  0.988        0.938
sBE2     0        0.380       0.384          0.057  1.000  0.986        0.912
sSW2     0        0.407       0.402          0.892  1.000  0.976        0.864
sFIN2    0        0.478       0.402          1.000  1.000  1.282        0.209
sIR2     0        0.377       0.342          0.526  1.000  1.162        0.188
sUS3     0        0.352       0.357          0.243  1.000  1.030        0.683
sFR2     0        0.365       0.327          0.474  1.000  1.168        0.096
sUK3     0        0.359       0.356          0.632  1.000  1.014        0.800
sGER2    1        0.367       0.358          0.494  0.537  1.174        0.142
sIT3     0        0.303       0.295          0.609  1.000  1.051        0.406
sNL4     0        0.381       0.326          0.075  1.000  1.289        0.005


Supplementary Table 8. rs35714695 details

[Figure: regional association plot for rs35714695 (position on chr17, 26.4-27.2 Mb; −log10(p-value) and recombination rate in cM/Mb; points coloured by r²) and forest plot of per-stratum odds ratios. Annotated results: meta-analysis p = 1.29 × 10⁻⁸; LMM p = 8.96 × 10⁻¹¹; I² = 0.0%, p = 0.76.]

Stratum information for rs35714695

NAME     IMPUTED  MAF(cases)  MAF(controls)  HWE    INFO   OR(stratum)  P(stratum)
sNL1     1        0.162       0.191          0.428  0.992  0.820        0.115
sBE1     1        0.160       0.210          0.612  0.993  0.706        0.020
sNL2     1        0.166       0.182          0.336  0.995  0.906        0.541
sSW1     1        0.152       0.173          0.522  0.995  0.842        0.319
sFR1     1        0.201       0.187          0.248  0.992  1.098        0.570
sUK1     1        0.146       0.200          0.609  0.987  0.677        0.065
sUS1     1        0.165       0.174          0.568  0.978  0.936        0.494
sIR1     1        0.131       0.161          0.307  0.995  0.792        0.142
sUK2     1        0.175       0.187          0.799  0.996  0.917        0.300
sUS2     1        0.158       0.165          0.423  0.989  0.947        0.718
sIT1     1        0.161       0.199          0.419  0.992  0.703        0.024
sFIN1    1        0.134       0.150          1.000  0.996  0.919        0.579
sNL3     1        0.168       0.184          0.349  0.996  0.897        0.148
sGER1    1        0.165       0.198          1.000  0.993  0.797        0.108
sIT2     1        0.171       0.143          0.058  0.982  1.248        0.357
sIB1     1        0.155       0.136          1.000  0.999  1.089        0.755
sSWISS1  1        0.197       0.181          0.820  0.998  1.175        0.379
sBE2     1        0.165       0.184          1.000  0.993  0.878        0.463
sSW2     1        0.155       0.198          0.676  0.995  0.791        0.195
sFIN2    1        0.137       0.113          0.342  1.000  1.481        0.215
sIR2     1        0.148       0.179          0.143  0.996  0.805        0.138
sUS3     1        0.162       0.183          0.297  0.995  0.917        0.363
sFR2     1        0.179       0.192          0.032  0.991  0.898        0.351
sUK3     1        0.171       0.188          1.000  0.995  0.890        0.090
sGER2    1        0.156       0.177          0.787  0.802  0.821        0.100
sIT3     1        0.162       0.193          0.555  0.987  0.810        0.004
sNL4     1        0.163       0.193          0.211  0.998  0.817        0.077

Supplementary Table 9. rs75087725 details

[Figure: regional association plot and forest plot of per-stratum odds ratios. The plot content extracted at this position is labelled rs616147 (position on chr3, 39.2-40 Mb) and is identical to the plot under Supplementary Table 12; no plot data specific to rs75087725 survive here.]

Stratum information for rs75087725

NAME     IMPUTED  MAF(cases)  MAF(controls)  HWE    INFO   OR(stratum)  P(stratum)
sNL1     1        0.014       0.009          1.000  0.882  1.667        0.302
sBE1     1        0.018       0.015          1.000  0.875  1.280        0.608
sNL2     1        0.025       0.014          1.000  0.883  2.012        0.122
sSW1     1        0.017       0.008          1.000  0.925  2.204        0.169
sFR1     1        0.028       0.013          1.000  0.834  2.504        0.059
sUK1     1        0.010       0.019          1.000  0.660  0.351        0.219
sUS1     1        0.027       0.014          1.000  0.813  2.281        0.003
sIR1     1        0.012       0.012          1.000  0.795  1.031        0.957
sUK2     1        0.025       0.014          1.000  0.875  1.879        0.009
sUS2     1        0.015       0.015          1.000  0.868  1.031        0.950
sIT1     1        0.012       0.013          1.000  0.755  0.967        0.956
sFIN1    1        0.032       0.027          0.228  0.959  1.065        0.847
sNL3     1        0.019       0.013          1.000  0.925  1.514        0.076
sGER1    1        0.014       0.010          1.000  0.827  1.482        0.475
sIT2     1        0.019       0.023          1.000  0.842  0.807        0.739
sIB1     1        0.016       0.027          1.000  0.850  0.483        0.328
sSWISS1  1        0.025       0.017          1.000  0.882  1.278        0.652
sBE2     1        0.035       0.022          1.000  0.936  1.684        0.221
sSW2     1        0.010       0.002          0.891  1.000  -            -
sFIN2    1        0.019       0.034          1.000  0.950  0.611        0.454
sIR2     1        0.017       0.007          1.000  0.796  2.892        0.068
sUS3     1        0.018       0.014          1.000  0.864  1.470        0.201
sFR2     1        0.030       0.016          0.210  0.866  2.133        0.020
sUK3     1        0.023       0.013          1.000  0.877  1.912        0.002
sGER2    1        0.016       0.013          1.000  0.758  0.893        0.770
sIT3     1        0.015       0.008          1.000  0.762  2.533        0.003
sNL4     1        0.021       0.013          1.000  0.984  1.695        0.120

* SNP failed imputation QC because of MAF < 0.01


Supplementary Table 10. Functional details for rs75087725

General info
  Chromosome   21
  Basepair     45753117
  Gene         C21orf2
  SNP          rs75087725
  Mutation     p.V58L

Functional prediction
  PolyPhen-2   0.015 (Benign)

Conservation scores
  PhyloP       -0.118
  PhastCons    0.026

ExAC allele frequency
  Finnish      0.026
  European     0.011
  Latino       0.003
  African      0.002
  South Asian  0.001
  East Asian   0.000

Amino-acid conservation
  Human        C R S V P E L
  Rhesus       C R S V P E L
  Mouse        C S R V P E L
  Dog          C Q S V P E L
  Zebrafish    C S S L H E L

Supplementary Table 12. rs616147 details

[Figure: regional association plot for rs616147 (position on chr3, 39.2-40 Mb; −log10(p-value) and recombination rate in cM/Mb; points coloured by r²) and forest plot of per-stratum odds ratios. Annotated results: meta-analysis p = 4.14 × 10⁻⁵; LMM p = 1.43 × 10⁻⁸; I² = 0.0%, p = 0.94.]

Stratum information for rs616147

NAME     IMPUTED  MAF(cases)  MAF(controls)  HWE    INFO   OR(stratum)  P(stratum)
sNL1     1        0.302       0.275          0.086  0.983  1.135        0.223
sBE1     1        0.320       0.308          0.597  0.975  1.061        0.643
sNL2     1        0.288       0.268          0.610  0.983  1.082        0.556
sSW1     1        0.286       0.313          1.000  0.988  0.894        0.420
sFR1     1        0.316       0.273          0.430  0.982  1.241        0.129
sUK1     1        0.300       0.252          0.668  0.975  1.297        0.152
sUS1     1        0.294       0.273          0.679  0.982  1.105        0.200
sIR1     1        0.291       0.272          0.782  1.000  1.096        0.459
sUK2     1        0.287       0.261          0.689  1.000  1.141        0.062
sUS2     1        0.316       0.289          0.160  1.000  1.131        0.278
sIT1     1        0.341       0.328          0.558  0.999  1.073        0.570
sFIN1    1        0.323       0.274          0.242  0.987  1.143        0.263
sNL3     1        0.289       0.281          0.817  1.000  1.047        0.464
sGER1    1        0.279       0.270          1.000  0.999  1.043        0.722
sIT2     1        0.342       0.376          0.178  0.998  0.852        0.376
sIB1     1        0.278       0.242          0.780  0.999  1.197        0.420
sSWISS1  1        0.293       0.310          1.000  1.000  0.914        0.579
sBE2     1        0.300       0.279          0.874  0.999  1.107        0.491
sSW2     1        0.306       0.296          0.754  1.000  1.045        0.779
sFIN2    1        0.337       0.278          0.456  1.000  1.560        0.047
sIR2     1        0.258       0.291          0.298  0.999  0.833        0.154
sUS3     1        0.296       0.274          0.465  0.999  1.091        0.266
sFR2     1        0.307       0.291          0.939  0.996  1.108        0.303
sUK3     1        0.282       0.266          0.474  0.999  1.083        0.175
sGER2    1        0.291       0.259          0.355  0.876  1.040        0.692
sIT3     1        0.346       0.320          0.944  0.998  1.111        0.077
sNL4     1        0.292       0.277          0.665  0.999  1.082        0.391


Supplementary Table 13. rs7813314 details

[Figure: regional association plot for rs7813314 (−log10 p-value and recombination rate in cM/Mb against position on chr8, 2.0–2.8 Mb; LD shown as r²) with a forest plot of per-stratum odds ratios. Genes in region: KBTBD11, CSMD1, MYOM2, MIR7160. Meta-analysis: p = 7.46 × 10⁻⁷; LMM: p = 3.14 × 10⁻⁸; I²: 0.0%, p = 0.86.]

Stratum information for rs7813314

NAME     IMPUTED  MAF CASES  MAF CONTROLS  HWE    INFO   OR STRATUM  P STRATUM
sNL1     1        0.080      0.092         0.065  0.995  0.871       0.420
sBE1     1        0.100      0.097         0.752  0.963  1.029       0.883
sNL2     1        0.096      0.100         0.425  0.941  0.976       0.905
sSW1     1        0.090      0.095         0.709  0.982  0.998       0.995
sFR1     1        0.100      0.103         0.830  0.990  0.969       0.880
sUK1     1        0.086      0.105         0.676  0.989  0.799       0.401
sUS1     1        0.101      0.098         0.276  0.911  1.042       0.735
sIR1     1        0.087      0.098         1.000  0.988  0.876       0.499
sUK2     1        0.075      0.101         0.915  0.994  0.734       0.007
sUS2     1        0.081      0.117         0.386  0.989  0.668       0.025
sIT1     1        0.109      0.138         0.588  0.994  0.728       0.078
sFIN1    1        0.098      0.116         0.802  0.985  0.782       0.162
sNL3     1        0.087      0.096         0.136  0.998  0.902       0.292
sGER1    1        0.083      0.083         1.000  0.996  0.999       0.995
sIT2     1        0.116      0.157         0.682  0.996  0.689       0.139
sIB1     1        0.107      0.122         0.630  0.997  0.847       0.574
sSWISS1  1        0.098      0.084         0.375  0.996  1.221       0.454
sBE2     1        0.095      0.128         0.777  0.998  0.701       0.109
sSW2     1        0.111      0.089         0.678  0.995  1.202       0.431
sFIN2    1        0.078      0.093         0.540  0.999  0.917       0.799
sIR2     1        0.081      0.094         0.156  0.998  0.850       0.417
sUS3     1        0.089      0.099         0.032  0.994  0.828       0.125
sFR2     1        0.089      0.104         0.866  0.992  0.878       0.408
sUK3     1        0.085      0.105         0.335  0.988  0.792       0.010
sGER2    1        0.082      0.095         1.000  0.981  0.742       0.036
sIT3     1        0.110      0.124         0.399  0.990  0.880       0.138
sNL4     1        0.096      0.100         0.012  1.000  0.961       0.770

Supplementary Table 14. rs10139154 details

[Figure: regional association plot for rs10139154 (−log10 p-value and recombination rate in cM/Mb against position on chr14, 30.8–31.6 Mb; LD shown as r²) with a forest plot of per-stratum odds ratios. Genes in region: G2E3, COCH, MIR624, HECTD1, SCFD1, LOC100506071, AP4S1, STRN3. Meta-analysis: p = 1.92 × 10⁻⁵; LMM: p = 4.95 × 10⁻⁸; I²: 0.0%, p = 0.97.]

Stratum information for rs10139154

NAME     IMPUTED  MAF CASES  MAF CONTROLS  HWE    INFO   OR STRATUM  P STRATUM
sNL1     1        0.331      0.318         0.573  1.000  1.062       0.565
sBE1     1        0.337      0.313         0.512  0.993  1.112       0.372
sNL2     1        0.314      0.310         0.593  0.994  1.010       0.938
sSW1     1        0.347      0.312         0.668  1.000  1.144       0.324
sFR1     1        0.342      0.316         0.588  0.993  1.129       0.374
sUK1     1        0.304      0.305         0.850  1.000  0.994       0.969
sUS1     1        0.310      0.318         0.032  0.993  0.965       0.645
sIR1     1        0.312      0.309         0.093  0.999  1.013       0.910
sUK2     1        0.328      0.296         0.517  1.000  1.150       0.039
sUS2     1        0.336      0.357         0.847  0.998  0.913       0.416
sIT1     1        0.361      0.348         0.776  0.999  1.113       0.392
sFIN1    1        0.343      0.308         0.469  0.996  1.163       0.185
sNL3     1        0.339      0.314         0.233  1.000  1.129       0.048
sGER1    1        0.323      0.320         0.086  1.000  1.016       0.886
sIT2     1        0.376      0.383         0.829  0.999  0.970       0.858
sIB1     1        0.294      0.328         0.359  0.999  0.851       0.459
sSWISS1  1        0.350      0.296         0.630  0.999  1.253       0.156
sBE2     1        0.312      0.273         0.627  0.999  1.197       0.210
sSW2     1        0.332      0.309         0.760  0.999  1.115       0.454
sFIN2    1        0.381      0.294         0.626  0.999  1.384       0.133
sIR2     1        0.301      0.281         0.813  1.000  1.103       0.420
sUS3     1        0.345      0.306         0.155  1.000  1.174       0.029
sFR2     1        0.333      0.320         0.277  0.980  1.047       0.645
sUK3     1        0.317      0.298         0.534  1.000  1.096       0.107
sGER2    1        0.341      0.316         0.275  0.964  1.110       0.229
sIT3     1        0.365      0.348         0.382  0.999  1.071       0.243
sNL4     1        0.337      0.334         0.846  1.000  1.004       0.962


Supplementary Table 15. previously associated loci

                          Discovery GWAS                         Current GWAS*
Locus       SNP           MAF CASES  MAF CONTROLS  p-value       OR    p-value  Power
FGGY⁵⁶      rs6700125     0.32       0.41          1.8 × 10⁻⁵    1.06  0.07     1.00
ITPR2³      rs2306677     0.11       0.07          3.3 × 10⁻⁶    1.03  0.54     1.00
SUN3¹⁰      rs2708909     0.45       0.50          7.0 × 10⁻⁷    0.97  0.28     1.00
C7orf57¹⁰   rs2708851     0.45       0.50          1.2 × 10⁻⁶    0.97  0.24     1.00
DPP6⁴       rs10260404    0.42       0.35          5.4 × 10⁻⁸    0.97  0.20     1.00
CAMK1G⁵⁷    rs6703183     0.41       0.34          2.9 × 10⁻⁸    0.96  0.13     1.00
SUSD2⁵⁷     rs8141797     0.15       0.10          2.4 × 10⁻⁹    1.02  0.70     1.00
18q11.2¹⁸   rs1788776     0.41       0.38          8.4 × 10⁻⁶    1.02  0.43     1.00
CYP27A1⁵⁸   rs4674345     -          -             1.8 × 10⁻⁴    0.99  0.57     -
CENPV⁵⁹     rs7477        ~0.5       ~0.5          2.9 × 10⁻⁷    1.01  0.70     -
8q24.13⁵⁹   rs12546767    ~0.1       ~0.1          2.7 × 10⁻⁶    0.93  0.09     -

* Newly genotyped individuals only, excluding possible sample overlap with all discovery GWAS.

Supplementary Table 17. details for top brain cis-eQTLs

GWAS top SNP  Brain cis-eQTL SNP  R²    GWAS p(LMM)   Brain cis-eQTL gene  Brain tissue                  eQTL p*       Source
rs616147      rs1768208           0.99  1.8 × 10⁻⁸    RPSA                 cerebellum                    7.7 × 10⁻⁴    GTEx³⁶
              rs2965067           0.65  6.3 × 10⁻⁸    RPSA                 cerebellar hemisphere         9.8 × 10⁻⁴    GTEx³⁶
              rs1472508           0.28  0.026         RPSA                 cortex                        0.023         GTEx³⁶
              rs1472508           0.28  0.026         RPSA                 caudate basal ganglia         0.019         GTEx³⁶
              rs12638676          0.31  8.3 × 10⁻⁷    SLC25A38             meta-analysis                 5.9 × 10⁻³    Kim et al.³⁵
              rs2039845           0.15  0.027         SLC25A38             mixed brain sample            1.0 × 10⁻³    Webster et al.²⁷
              rs1707953           0.10  0.74          MOBP                 cerebellum / temporal cortex  7.1 × 10⁻¹⁸   Zou et al.³²
rs3849943     rs10812605          0.56  5.8 × 10⁻¹²   C9orf72              cerebellum                    7.7 × 10⁻⁴    GTEx³⁶
              rs2492816           0.45  2.0 × 10⁻¹¹   C9orf72              cerebellar hemisphere         9.8 × 10⁻⁴    GTEx³⁶
              rs2492816           0.45  2.0 × 10⁻¹¹   C9orf72              nucleus accumbens             0.041         GTEx³⁶
              rs4879541           0.33  8.5 × 10⁻¹⁰   C9orf72              frontal cortex                0.013         GTEx³⁶
              rs2244606           0.24  2.5 × 10⁻¹⁰   C9orf72              meta-analysis                 5.3 × 10⁻⁷    Kim et al.³⁵
rs10139154    rs10139154          1.00  5.0 × 10⁻⁸    SCFD1                cerebellum                    7.7 × 10⁻⁴    GTEx³⁶
              rs10139154          1.00  5.0 × 10⁻⁸    SCFD1                cerebellar hemisphere         9.8 × 10⁻⁴    GTEx³⁶
rs35714695    rs35714695          1.00  9.0 × 10⁻¹¹   POLDIP2              cortex                        2.3 × 10⁻³    GTEx³⁶
              rs739438            0.54  4.1 × 10⁻⁶    POLDIP2              substantia nigra              0.038         BRAINEAC³³
              rs9913833           0.22  0.012         POLDIP2              nucleus accumbens             0.012         GTEx³⁶
              rs4795434           0.22  0.013         POLDIP2              putamen basal ganglia         0.048         GTEx³⁶
              rs7212510           0.21  0.016         TMEM199              anterior cingulate cortex     0.011         GTEx³⁶
              rs12947270          0.33  0.014         SARM1                cortex                        0.043         GTEx³⁶
rs12608932    rs12608932          1.00  2.7 × 10⁻¹⁰   KCNN1                frontal cortex                7.0 × 10⁻³    BRAINEAC³³

For each cis-eQTL in each brain region, details are provided only for the SNP in strongest LD with the top SNP in the GWAS. A full description of all brain and non-brain cis-eQTLs is provided in the supplementary Excel file.
* Corrected p-values as reported in original article; for BRAINEAC, Benjamini-Hochberg correction was performed for all eQTL SNPs overlapping with suggestive DEPICT GWAS loci (GWAS LMM p < 10 × 10⁻⁴; r² > 0.5); for GTEx, the q-value for the eGene is reported.


CONSORTIUM MEMBERS

FALS Sequencing Consortium members Bradley N. Smith, Nicola Ticozzi, Claudia Fallini, Athina Soragia Gkazi, Simon Topp, Jason Kost, Emma L. Scotter, Kevin P. Kenna, Pamela Keagle, Jack W. Miller, Cinzia Tiloca, Caroline Vance, Eric W. Danielson, Claire Troakes, Claudia Colombrita, Safa Al-Sarraj, Elizabeth A. Lewis, Andrew King, Daniela Calini, Viviana Pensato, Barbara Castellotti, Jacqueline de Belleroche, Frank Baas, Anneloor L.M.A. ten Asbroek, Peter C. Sapp, Diane McKenna-Yasek, Russell L. McLaughlin, Meraida Polak, Seneshaw Asress, Jesús Esteban-Pérez, José Luis Muñoz-Blanco, Sandra D’Alfonso, Letizia Mazzini, Giacomo P. Comi, Roberto Del Bo, Mauro Ceroni, Stella Gagliardi, Giorgia Querin, Cinzia Bertolin, Wouter van Rheenen, Frank P. Diekstra, Rosa Rademakers, Marka van Blitterswijk, Kevin B. Boylan, Giuseppe Lauria, Stefano Duga, Stefania Corti, Cristina Cereda, Lucia Corrado, Gianni Sorarù, Kelly L. Williams, Garth A. Nicholson, Ian P. Blair, Claire Leblond-Manry, Guy A. Rouleau, Orla Hardiman, Karen E. Morrison, Jan H. Veldink, Leonard H. van den Berg, Ammar Al-Chalabi, Hardev Pall, Pamela J. Shaw, Martin R. Turner, Kevin Talbot, Franco Taroni, Alberto García-Redondo, Zheyang Wu, Jonathan D. Glass, Cinzia Gellera, Antonia Ratti, Robert H. Brown, Jr., Vincenzo Silani, Christopher E. Shaw and John E. Landers

Italian Consortium for the Genetics of ALS (SLAGEN) Consortium members Daniela Calini, Isabella Fogh, Antonia Ratti, Vincenzo Silani, Nicola Ticozzi, Cinzia Tiloca, Barbara Castellotti, Cinzia Gellera, Viviana Pensato, Franco Taroni, Cristina Cereda, Mauro Ceroni, Stella Gagliardi, Giacomo Comi, Stefania Corti, Roberto Del Bo, Lucia Corrado, Sandra D’Alfonso, Letizia Mazzini, Elena Pegoraro, Giorgia Querin and Gianni Sorarù

Registro Lombardo Sclerosi Laterale Amyotrofica (SLALOM) group members Francesca Gerardi, Fabrizio Rinaldi, Maria Sofia Cotelli, Luca Chiveri, Maria Cristina Guaita and Patrizia Perrone

Piemonte and Valle d’Aosta Registry for Amyotrophic Lateral Sclerosis (PARALS) group members Stefania Cammarosano, Antonio Canosa, Dario Cocito, Leonardo Lopiano, Luca Durelli, Bruno Ferrero, Antonio Bertolotto, Alessandro Mauro, Luca Pradotto, Roberto Cantello, Enrica Bersano, Dario Giobbe, Maurizio Gionco, Daniela Leotta, Lucia Appendino, Roberto Cavallo, Enrico Oddenino, Claudio Geda, Fabio Poglio, Paola Santimaria, Umberto Massazza, Antonio Villani, Roberto Conti, Fabrizio Pisano, Mario Palermo, Franco Vergnano, Paolo Provera, Maria Teresa Penza, Marco Aguggia, Nicoletta Di Vito, Piero Meineri, Ilaria Pastore, Paolo Ghiglione, Danilo Seliak, Nicola Launaro, Giovanni Astegiano and Bottacchi Edo

Sclerosi Laterale Amyotrofica-Puglia (SLAP) registry members Isabella Laura Simone, Stefano Zoccolella, Michele Zarrelli and Franco Apollo

Neuroprotection and Natural History in Parkinson Plus Syndromes (NNIPPS) study group members William Camu, Jean Sebastien Hulot, Francois Viallet, Philippe Couratier, David Maltete, Christine Tranchant, Marie Vidailhet.


SUPPLEMENTARY REFERENCES

1. Brooks, B. R. El Escorial World Federation of Neurology criteria for the diagnosis of amyotrophic lateral sclerosis. Subcommittee on Motor Neuron Diseases/Amyotrophic Lateral Sclerosis of the World Federation of Neurology Research Group on Neuromuscular Diseases and the El Escorial ‘Clinical limits of amyotrophic lateral sclerosis’ workshop contributors. J. Neurol. Sci. 124 Suppl, 96–107 (1994).
2. Huisman, M. H. B. et al. Population based epidemiology of amyotrophic lateral sclerosis using capture-recapture methodology. J. Neurol. Neurosurg. Psychiatry 82, 1165–1170 (2011).
3. van Es, M. A. et al. ITPR2 as a susceptibility gene in sporadic amyotrophic lateral sclerosis: a genome-wide association study. Lancet Neurol. 6, 869–877 (2007).
4. van Es, M. A. et al. Genetic variation in DPP6 is associated with susceptibility to amyotrophic lateral sclerosis. Nat. Genet. 40, 29–31 (2008).
5. Hofman, A. et al. The Rotterdam Study: objectives and design update. Eur. J. Epidemiol. 22, 819–829 (2007).
6. Landers, J. E. et al. Reduced expression of the Kinesin-Associated Protein 3 (KIFAP3) gene increases survival in sporadic amyotrophic lateral sclerosis. Proc. Natl. Acad. Sci. 106, 9004–9009 (2009).
7. Cronin, S. et al. A genome-wide association study of sporadic ALS in a homogenous Irish population. Hum. Mol. Genet. 17, 768–774 (2008).
8. van Es, M. A. et al. Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat. Genet. 41, 1083–1087 (2009).
9. Shatunov, A. et al. Chromosome 9p21 in sporadic amyotrophic lateral sclerosis in the UK and seven other countries: a genome-wide association study. Lancet Neurol. 9, 986–994 (2010).
10. Chiò, A. et al. A two-stage genome-wide association study of sporadic amyotrophic lateral sclerosis. Hum. Mol. Genet. 18, 1524–1532 (2009).
11. Brooks, B. R., Miller, R. G., Swash, M., Munsat, T. L. & World Federation of Neurology Research Group on Motor Neuron Diseases. El Escorial revisited: revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Other Motor Neuron Disord. 1, 293–299 (2000).
12. Schymick, J. C. et al. Genome-wide genotyping in amyotrophic lateral sclerosis and neurologically normal controls: first stage analysis and public release of data. Lancet Neurol. 6, 322–328 (2007).
13. Traynor, B. J. et al. Kinesin-associated protein 3 (KIFAP3) has no effect on survival in a population-based cohort of ALS patients. Proc. Natl. Acad. Sci. 107, 12335–12338 (2010).
14. Pankratz, N. et al. Genomewide association study for susceptibility genes contributing to familial Parkinson disease. Hum. Genet. 124, 593–605 (2009).
15. Hamza, T. H. et al. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson’s disease. Nat. Genet. 42, 781–785 (2010).
16. Laaksovirta, H. et al. Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet Neurol. 9, 978–985 (2010).
17. Polvikoski, T. et al. Prevalence of Alzheimer’s disease in very elderly people: a prospective neuropathological study. Neurology 56, 1690–1696 (2001).
18. Fogh, I. et al. A genome-wide association meta-analysis identifies a novel locus at 17q11.2 associated with sporadic amyotrophic lateral sclerosis. Hum. Mol. Genet. 23, 2220–2231 (2014).
19. 3C Study Group. Vascular factors and risk of dementia: design of the Three-City Study and baseline characteristics of the study population. Neuroepidemiology 22, 316–325 (2003).

20. Krawczak, M. et al. PopGen: population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships. Community Genet. 9, 55–61 (2006).
21. Goris, A. et al. No evidence for shared genetic basis of common variants in multiple sclerosis and amyotrophic lateral sclerosis. Hum. Mol. Genet. 23, 1916–1922 (2014).
22. Andersen, P. M. et al. EFNS guidelines on the clinical management of amyotrophic lateral sclerosis (MALS)--revised report of an EFNS task force. Eur. J. Neurol. 19, 360–375 (2012).
23. McCluskey, L. et al. ALS-Plus syndrome: non-pyramidal features in a large ALS cohort. J. Neurol. Sci. 345, 118–124 (2014).
24. Arnold, M., Raffler, J., Pfeufer, A., Suhre, K. & Kastenmüller, G. SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics 31, 1334–1336 (2015).
25. Myers, A. J. et al. A survey of genetic human cortical gene expression. Nat. Genet. 39, 1494–1499 (2007).
26. Heinzen, E. L. et al. Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol. 6, e1 (2008).
27. Webster, J. A. et al. Genetic control of human brain transcript expression in Alzheimer disease. Am. J. Hum. Genet. 84, 445–458 (2009).
28. Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
29. Colantuoni, C. et al. Temporal dynamics and genetic control of transcription in the human prefrontal cortex. Nature 478, 519–523 (2011).
30. Liu, C. Brain expression quantitative trait locus mapping informs genetic studies of psychiatric diseases. Neurosc. Bull. 27, 123–133 (2011).
31. Kim, S., Cho, H., Lee, D. & Webster, M. J. Association between SNPs and gene expression in multiple regions of the human brain. Transl. Psychiatry 2, e113 (2012).
32. Zou, F. et al. Brain expression genome-wide association study (eGWAS) identifies human disease-associated variants. PLoS Genet. 8, e1002707 (2012).
33. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosc. 17, 1418–1428 (2014).
34. Deelen, P. et al. Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels. Genome Med. 7, 30 (2015).
35. Kim, Y. et al. A meta-analysis of gene expression quantitative trait loci in brain. Transl. Psychiatry 4, e459 (2014).
36. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
37. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
38. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
39. Fairfax, B. P. et al. Genetics of gene expression in primary immune cells identifies cell type-specific master regulators and roles of HLA alleles. Nat. Genet. 44, 502–510 (2012).
40. Zeller, T. et al. Genetics and beyond--the transcriptome of human monocytes and disease susceptibility. PLoS ONE 5, e10693 (2010).
41. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).


42. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
43. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890–20 (2015).
44. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
45. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
46. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).
47. Tang, Z.-Z. & Lin, D.-Y. MASS: meta-analysis of score statistics for sequencing studies. Bioinformatics 29, 1803–1805 (2013).
48. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
49. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
50. Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl. Acad. Sci. 111, E5272–81 (2014).
51. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
52. Lin, D.-Y. & Tang, Z.-Z. A general framework for detecting disease associations with rare variants in sequencing studies. Am. J. Hum. Genet. 89, 354–367 (2011).
53. Delaneau, O., Marchini, J. & 1000 Genomes Project Consortium. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat. Commun. 5, 3934 (2014).
54. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).
55. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
56. Dunckley, T. et al. Whole-genome analysis of sporadic amyotrophic lateral sclerosis. N. Engl. J. Med. 357, 775–788 (2007).
57. Deng, M. et al. Genome-wide association analyses in Han Chinese identify two new susceptibility loci for amyotrophic lateral sclerosis. Nat. Genet. 45, 697–700 (2013).
58. Diekstra, F. P. et al. Mapping of Gene Expression Reveals CYP27A1 as a Susceptibility Gene for Sporadic ALS. PLoS ONE 7, e35333 (2012).
59. Diekstra, F. P. et al. C9orf72 and UNC13A are shared risk loci for amyotrophic lateral sclerosis and frontotemporal dementia: A genome-wide meta-analysis. Ann. Neurol. 76, 120–133 (2014).


Chapter 5

Exome array analysis of rare and low frequency variants in amyotrophic lateral sclerosis.

Annelot M. Dekker, Frank P. Diekstra, Sara L. Pulit, Gijs H.P. Tazelaar, Rick A. van der Spek, Wouter van Rheenen, Kristel R. van Eijk, Andrea Calvo, Maura Brunetti, Philip Van Damme, Wim Robberecht, Orla Hardiman, Russell McLaughlin, Adriano Chiò, Michael Sendtner, Albert C. Ludolph, Jochen H. Weishaupt, Jesus S. Mora Pardina, Leonard H. van den Berg**, Jan H. Veldink**

Submitted

** These authors jointly supervised this work

117 Chapter 5 | Exome array analysis in ALS

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease that affects 1 in ~350 individuals. Genetic association studies have established ALS as a multifactorial disease with heritability estimated at ~61%, and recent studies show a prominent role for rare variation in its genetic architecture. To identify rare variants associated with disease onset, we performed exome array genotyping in 4,244 cases and 3,106 controls from European cohorts. In this largest exome-wide study of rare variants in ALS to date, we performed single-variant association testing, gene-based burden testing, and exome-wide individual set-unique burden (ISUB) testing to identify single or aggregated rare variation that increases disease risk. In single-variant testing, no variants reached exome-wide significance, likely due to limited statistical power. Gene-based burden testing of rare non-synonymous and loss-of-function variants showed NEK1 as the top associated gene. ISUB analysis did not show an increased exome-wide burden of deleterious variants in patients, possibly suggesting a more region-specific role for rare variation. Complete summary statistics are released publicly. This study did not implicate new risk loci, emphasizing the immediate need for future large-scale collaborations in ALS that will expand available sample sizes, increase genome coverage, and improve our ability to detect rare variants associated with ALS.

INTRODUCTION

Amyotrophic lateral sclerosis (ALS) is a rapidly progressing and fatal neurodegenerative disease with an estimated lifetime risk of approximately 1 in 350¹. The central hallmark of the disease is dysfunction and gradual loss of upper and lower motor neurons, resulting in progressive muscle weakness and eventual death due to respiratory failure. Median survival is three to five years after symptom onset²,³. A subset of patients shows clinical signs of extramotor system involvement, most notably cognitive impairment⁴. There is no cure to date, and the sole effective drug, riluzole, only extends survival by approximately three months⁵.

ALS is considered a complex disease in which both environmental and genetic risk factors contribute to disease susceptibility. Twin-based heritability is estimated at approximately 61%⁶. Over the past decades multiple genes and genomic regions have been implicated in ALS via linkage studies, genome-wide association studies (GWAS), and more recently, large-scale sequencing efforts⁷. A recent GWAS in ALS, in approximately 14,000 cases and 30,000 controls, identified a total of 7 loci associated with the disease. The top loci together explain 0.2% of the estimated 8.5% SNP-based heritability; heritability estimates across all SNPs indicate that the bulk of the SNP-based heritability is captured in low-frequency (defined as minor allele frequency (MAF) 1-10%) variants beneath genome-wide significance⁸. These findings are consistent with a disease architecture in which rare and low-frequency variants play an important role⁹.

Imputation of low-frequency (MAF 0.5 – 5.0%) variants in genotyping data has become more accurate with the increasing public availability of large reference panels, and with the combination of public reference panels with study-specific reference panels¹⁰,¹¹. High-quality imputation of rare variants (MAF < 0.5%), however, has remained challenging¹²,¹³. The exome array provides a low-cost alternative to large-scale exome or genome sequencing, albeit with lower resolution. In this largest exome-wide study in ALS to date, we used the exome array to investigate the role of low-frequency and rare variants by genotyping over 240,000 primarily functional coding variants in 7,350 ALS patients and controls drawn from different European cohorts. Using this approach, we aimed to further elucidate the role of these low-frequency and rare variants in the genetic landscape of ALS.


RESULTS

Analyses comprised 4,244 cases and 3,106 controls from 6 European countries. Of 242,901 genotyped sites, 233,331 sites passed quality control, of which 100,896 were non-monomorphic in cases and controls. A breakdown of the included samples per country is provided in Supplementary Table 1.

Single-variant testing

We performed single-variant testing using logistic regression, adjusting for sex and population principal components, and assuming an additive model. We found the genomic inflation factor (λGC) to be 0.823 (Figure 1A), reflecting the high prevalence of rare (i.e., MAF < 0.5%) variants in the data and therefore low power to detect effects at these rare alleles. No variants reached exome-wide significance (p < 5 × 10⁻⁷; Figure 1B).
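The genomic inflation factor mentioned above can be illustrated with a minimal sketch. It assumes the conventional definition (median observed 1-df chi-square statistic divided by the median expected under the null, about 0.455); the function name `genomic_inflation` and the input format are hypothetical and are not taken from the analysis pipeline used in this chapter.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Estimate lambda_GC from a list of two-sided association p-values.

    Assumption: each p-value comes from a 1-df test, so it can be mapped
    back to a chi-square statistic as chi2 = z^2.
    """
    norm = NormalDist()
    # Keep only p-values that can be inverted (0 < p <= 1).
    valid = [p for p in p_values if 0.0 < p <= 1.0]
    # Two-sided p-value -> |z| -> 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in valid]
    # Median of a 1-df chi-square distribution (approximately 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spread p-values mimic the null; p-values shifted towards 1 mimic deflation.
null_like = [i / 1000 for i in range(1, 1000)]
deflated = [p ** 0.5 for p in null_like]
print(genomic_inflation(null_like), genomic_inflation(deflated))
```

Under this definition a value below 1, such as the 0.823 reported here, indicates that the observed test statistics are smaller than expected under the null, which is in line with the limited power at rare alleles described in the text.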

Multiple top-associated signals (MAF 0.23–0.29) were located on chromosome 9 (lowest p-value 1.72 × 10⁻⁶; Supplementary Table 2). These signals tag the large intronic hexanucleotide repeat expansion in the gene C9orf72, the most common genetic cause of ALS identified to date¹⁴,¹⁵. The strongest associated rare variants (MAF < 0.5%), rs200161705 (odds ratio (OR) = 2.74, p = 5.76 × 10⁻⁵) and rs181906086 (OR = 0.35, p = 1.68 × 10⁻⁴), are both missense variants as annotated by Variant Effect Predictor¹⁶ and are located in the genes NEK1 and CAPN14, respectively (Supplementary Table 2).

Previous work has shown that, in testing rare variation down to minor allele count < 400, the Firth test has the best combination of type I error and power in balanced and moderately unbalanced case-control studies¹⁷. We therefore repeated our analysis using Firth logistic regression and found the results to be highly consistent with our initial regression results (data not shown).

Gene-based analysis

Next, to test for an association between aggregated rare variants clustering in genes and ALS, we performed gene-based analysis using the unified optimal sequence kernel association test (SKAT-O)¹⁸. As SKAT-O is a combination of burden and variance statistics, the test maximizes discovery power under different genetic architecture models (i.e., by varying the assumed proportion of causal variants within the gene that share the same direction of effect versus a mix of protective and deleterious variants, and then testing for association). We observed no evidence for residual population stratification based on the standardized genomic inflation factor λGC of 1.016 after correction for principal components, sex and cohort (Supplementary Fig. 1). No individual gene surpassed the predefined Bonferroni-corrected significance threshold (p = 3.45 × 10⁻⁶, after adjusting for 14,488 genes). For the top genes most associated with ALS (p < 1 × 10⁻³; Table 1), we calculated exact p-values using 500,000 case-control permutations. The strongest signal in the gene-based burden test using SKAT-O was NEK1 (p = 1.21 × 10⁻⁵; Fig. 2). For the majority of the top associated genes, repeating the analysis using the Firth test showed highly concordant results (Table 1).

Figure 1. Quantile-quantile plot and Manhattan plot of p-values of the single-variant association analysis. A. Quantile-quantile plot of single-variant association analysis using logistic regression, binned by minor allele frequency. B. Manhattan plot of p-values of exome-wide association testing comprising 7,350 individuals (4,244 cases and 3,106 controls) and 100,896 non-monomorphic variants. The x axis depicts chromosomal position and the y axis shows the significance of association derived by logistic regression. The dotted line corresponds to the exome-wide significance threshold of p = 5 × 10⁻⁷.
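The Bonferroni threshold and the permutation-based p-values described above can be sketched as follows. This is a minimal illustration, not the SKAT-O procedure itself: the gene-level statistic `burden_difference` is a toy stand-in chosen for brevity, the function names are hypothetical, and the add-one estimator `(hits + 1) / (n_perm + 1)` is a common convention that the text does not specify.

```python
import random


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def burden_difference(is_case, allele_counts):
    """Toy gene-level statistic (assumption, not SKAT-O): absolute difference
    in mean rare-allele count between cases and controls."""
    cases = [c for flag, c in zip(is_case, allele_counts) if flag]
    controls = [c for flag, c in zip(is_case, allele_counts) if not flag]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(statistic, is_case, allele_counts, n_perm=1000, seed=0):
    """Empirical p-value obtained by shuffling case-control labels."""
    rng = random.Random(seed)
    observed = statistic(is_case, allele_counts)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between status and genotype
        if statistic(labels, allele_counts) >= observed:
            hits += 1
    # Add-one estimator keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


# 0.05 / 14,488 genes reproduces the threshold quoted in the text (about 3.45e-6).
print(bonferroni_threshold(0.05, 14488))
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so 500,000 permutations resolve p-values down to roughly 2 × 10⁻⁶, which is on the order of the Bonferroni threshold used here.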

Individual set-unique burden analysis

Previous studies have indicated that in some complex diseases, disease-affected individuals carry an excess number of missense and nonsense variants¹⁹. Therefore, we next sought to investigate the exome-wide burden per individual of particular classes of variants by estimating the individual set-unique burden (ISUB) for each individual. We first annotated all variants available for analysis. Of the 24,844 variants carried only by the 4,244 ALS cases, 23,231 variants are nonsynonymous and 6,600 are deleterious, according to the CONDEL algorithm²⁰; these variants comprise the ‘set-unique’ variants in cases. Of the 11,856 set-unique variants in the 3,106 controls, 11,088 are nonsynonymous and 3,135 are deleterious. We used CONDEL to assign deleteriousness scores to each variant and then summed and normalized these scores. Finally, for all set-unique nonsynonymous and loss-of-function variants (called the ‘NS’ group of variants) as well as for the observed loss-of-function variants predicted to be deleterious (the ‘DEL’ group of variants), we calculated the exome-wide individual set-unique burden (ISUB) per individual.
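The notion of set-unique variants and a per-individual burden can be sketched in a few lines. This is only an illustration under stated assumptions: genotypes are represented as a hypothetical mapping from individual ID to the set of variant IDs carried, `scores` stands in for CONDEL deleteriousness scores, and the normalization step mentioned in the text is left out because its exact form is not described here.

```python
def set_unique_variants(case_genotypes, control_genotypes):
    """Return (variants seen only in cases, variants seen only in controls)."""
    case_variants = set().union(*case_genotypes.values())
    control_variants = set().union(*control_genotypes.values())
    return case_variants - control_variants, control_variants - case_variants


def raw_isub(genotypes, unique_variants, scores):
    """Sum the deleteriousness scores of the set-unique variants each individual carries.

    Assumption: this is the raw sum only; any normalization is omitted.
    """
    return {
        individual: sum(scores[v] for v in carried & unique_variants)
        for individual, carried in genotypes.items()
    }


# Hypothetical toy data: two cases, one control, four variants.
cases = {"case1": {"v1", "v2"}, "case2": {"v2", "v3"}}
controls = {"ctrl1": {"v2", "v4"}}
scores = {"v1": 0.9, "v2": 0.1, "v3": 0.5, "v4": 0.7}

case_unique, control_unique = set_unique_variants(cases, controls)
print(raw_isub(cases, case_unique, scores))
print(raw_isub(controls, control_unique, scores))
```

In this representation the number of set-unique variants found in a group depends on how many individuals the group contains, so unequal group sizes are something to keep in mind when comparing burdens between cases and controls.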

We found the ISUB scores based on the NS group of variants to be significantly higher in ALS patients compared to controls (p = 2.9x10⁻¹³⁷). We observed no such difference for ISUB calculated from the DEL group of variants (p = 0.22) (Table 2, Supplementary Fig. 3). To control for the case-control imbalance in our sample (case-control ratio ~1.36:1) we repeated the ISUB analysis in balanced cohorts only (i.e., including 5,069 individuals from the Dutch, Belgian, and Irish cohorts with case-control ratios nearing 1:1). In this balanced subset, we observed no significant difference in ISUB score based either on nonsynonymous variants (p = 0.39) or on deleterious variants (p = 0.53, Table 2, Supplementary Fig. 2, Supplementary Fig. 4).

We additionally confirmed that the differing number of observed nonsynonymous variants per individual (median N = 7 in cases and N = 4 in controls, p = 1.38x10⁻²⁷⁶) in the imbalanced (i.e., complete) dataset drove the significant difference in set-unique burden between cases and controls (Supplementary Table 3). Balancing the case-control ratio eliminated this difference (median N = 7 for nonsynonymous variants in both the case and control datasets, p = 0.19)

[Figure 2 plot: panels Cases (n = 4244) and Controls (n = 3106); exons 1 to 34; labeled variants: p.Y229C (OR: NA; MAF: 0.00024), p.R261C (OR: NA; MAF: 0.00024), p.R261H (OR: 2.74; MAF: 0.00880 and 0.00340), p.R721W (OR: 0.59; MAF: 0.00012 and 0.00016), p.R742C (OR: 0.83; MAF: 0.00064 and 0.00047), p.N745K (OR: 1.49; MAF: 0.00710 and 0.00480)]

Figure 2. Gene mutation plot of NEK1. Gene mutation plot of the distribution of SNVs across NEK1 in the gene-based test (SKAT-O). The vertical lines represent exons relative to their size in basepairs. The orange circles represent amino acid change, with their corresponding minor allele frequency (MAF) and odds ratio (OR) in the single-variant analysis.

Table 1. Top genes associated to ALS, as found through gene-based burden testing.

Gene     Chr   Start (bp)   End (bp)     Number of variants   Nominal p value SKAT-O   Exact p value SKAT-O   Nominal p value Firth test
NEK1     4     170314421    170533778    6                    2.73x10⁻⁵                1.21x10⁻⁵              3.06x10⁻⁵
ZNF287   17    16453626     16472520     2                    3.08x10⁻⁵                3.21x10⁻⁵              2.06x10⁻⁵
CAPN14   2     31395922     31456724     5                    2.16x10⁻⁴                1.73x10⁻⁴              1.18x10⁻³
IRF8     16    85932774     85956212     4                    2.50x10⁻⁴                2.25x10⁻⁴              1.34x10⁻³
F10      13    113777113    113803843    4                    5.94x10⁻⁴                4.77x10⁻⁴              5.20x10⁻⁴
PTPRB    12    70910630     71031220     21                   7.10x10⁻⁴                5.74x10⁻³              0.37
SYNM     15    99645286     99675800     8                    7.55x10⁻⁴                6.14x10⁻³              5.05x10⁻⁴
SGK3     8     67624653     67774257     2                    7.90x10⁻⁴                7.00x10⁻³              8.52x10⁻⁴

Results of gene-based burden testing using SKAT-O and Firth test, results limited to genes exceeding p < 1x10⁻³ using SKAT-O. Positions given for human build 37. Exact p values generated from 500,000 case-control permutations.

Table 2. Results individual set unique burden analysis.

ALL COHORTS
Variant type   Phenotype   No. of individuals   No. of SNVs   ISUB score (mean / median / sd)   P value
DEL            Case        4244                 6600          1.58 / 1.23 / 1.57                0.22
DEL            Control     3106                 3135          1.38 / 1.13 / 1.49
NS             Case        4244                 23,231        4.33 / 3.49 / 4.16                2.9x10⁻¹³⁷
NS             Control     3106                 11,088        2.30 / 1.89 / 2.10

BALANCED COHORTS
Variant type   Phenotype   No. of individuals   No. of SNVs   ISUB score (mean / median / sd)   P value
DEL            Case        2489                 4201          1.47 / 1.20 / 1.62                0.53
DEL            Control     2580                 4025          1.42 / 1.21 / 1.20
NS             Case        2489                 14,831        3.97 / 3.43 / 4.55                0.39
NS             Control     2580                 14,310        3.83 / 3.49 / 2.82

Results given for analysis comprising all individuals (all cohorts; N = 7350) and for a subset of samples comprising balanced case-control cohorts only (balanced cohorts; samples from The Netherlands, Belgium and Ireland, N = 5069). P values given for logistic regression with first six principal components and country of sample origin included as covariates. DEL = deleterious variants, NS = all nonsynonymous and loss-of-function variants.

(Supplementary Table 3, Supplementary Fig. 5). The scores of deleteriousness per variant, as measured by CONDEL, did not differ between cases and controls for the imbalanced dataset (p = 0.58) and the balanced dataset (p = 0.87) (Supplementary Table 4, Supplementary Fig. 5). Repeating the analysis after removing outlying scores (defined as > 5 standard deviations from set mean ISUB score) or after logarithmic transformation of the data did not change the results.

DISCUSSION

Using the exome array in 7,350 individuals (4,244 ALS cases and 3,106 matched controls), we sought to investigate the role of low-frequency and rare variants in the etiology of ALS. Although this is the largest study of rare variants in ALS to date, no associations reached the predefined significance levels in single-variant association testing, gene-based burden testing and exome-wide individual set-unique burden (ISUB) testing. The strongest associating signals have previously been linked to ALS.

In accordance with other exome array studies, we had minimal power to find significant associations at rare variants of modest effect. At a minor allele frequency (MAF) of 1%, variants with large effect sizes (odds ratio > 2.3) could be detected with 80% power; larger effect sizes are necessary to have sufficient power (>80%) for variants with lower frequency. Because the exome array content is composed almost entirely of rare variants that, by definition, typically have low LD with other variants, our power to detect other variants associated with ALS through LD-tagging was also low. Given our results, we can conclude that none of the rare variants captured on the exome array are large-effect variants associated with ALS.

For gene-based burden testing, our analysis yielded sufficient power to detect genes with a large proportion of causal variants (≥ 50%) contributing to disease risk (including genes with ≤ 25% protective variants). For genetic architectures with lower percentages of causal variants and higher percentages of protective variants, power was low. Different genes associated with ALS likely have different genetic architectures. For example, previous work showed that mutations in several ALS-related genes predominantly affect specific regions of the gene (e.g. mutations causing exon 7 skipping in FUS and mutations primarily located at the C-terminal cargo-binding region in KIF5A), whereas in other genes (e.g. SOD1) mutations are more dispersed²¹-²⁴. This architectural complexity makes selecting appropriate settings for power analyses difficult; while we can evaluate power for simpler architectures (e.g., genes containing


mostly causal risk variation), it is difficult to evaluate power for a gene with a much more complex architecture (e.g., with both risk and protective variants that must occur on specific haplotypes and/or in combination with specific genetic backgrounds)²⁵.

Additional limitations include the fact that of the ~240,000 variants present on the array, the majority were monomorphic in our dataset. Illustrative of this low resolution is the observation that for gene-based burden testing of KIF5A, a gene recently implicated in ALS pathogenesis, we could only include 3 variants that were not located in the ALS-associated domains²³,²⁴. As a result, we did not find an association between KIF5A and disease onset. Our analyses also only include European-ancestry samples; given the strong geographic localization of rare variants, such a design leaves the many rare variants found only in non-European samples untested. Further, since most of the rare variants that have been identified in rare variant association studies to date have modest-to-weak effect sizes, much larger sample sizes or denser coverage of the genome are necessary in order to robustly implicate novel associations that increase risk of disease.

Despite power limitations, we can still draw several key conclusions. First, we confirm several of the strongest associations previously found in ALS, demonstrating the validity of our approach. The common variants associating most strongly with ALS in this study map to the gene C9orf72, a well-established genetic risk factor in ALS. These signals are driven by a massive hexanucleotide repeat (GGGGCC) expansion between non-coding exons 1a and 1b, an expansion which is the most frequent genetic cause of ALS discovered to date²⁶. The strongest associated rare variant (rs200161705) located in NEK1 was recently discovered as a risk variant in an inbred population in The Netherlands and subsequently replicated in an international cohort including ALS patients with and without a positive family history²⁷. Since these initial findings, several studies have confirmed NEK1 as an ALS risk gene²⁷-²⁹. In our study, NEK1 was the top hit in the gene-based analysis, demonstrating how, in some instances, application of the exome array can be a more efficient approach than sequencing for finding large effect variants. Future studies that use arrays with both exome and common variant content, to allow for not only interrogation of rare coding variants but also interrogation of common variation through genotyping and imputation, will be well positioned to identify signals like that residing in NEK1.

Because the CAPN14 gene was among the top findings in both single-variant and gene-based analysis, it represents a tantalizing finding but one that requires additional follow-up in larger samples. CAPN14 is a member of the calpain family; calpains

are proteases that are activated by calcium, possibly through increased glutamatergic neuronal transmission. Calpains have been identified as regulators of axonal survival in injury-induced and developmental degeneration via necrotic and apoptotic pathways³⁰-³² and an ALS mouse model has shown calpain inhibition to be neuroprotective³³. Calpain14 is primarily expressed in esophageal mucosa and has also been associated with eosinophilic esophagitis³⁴,³⁵. Although an association between the calpain family and ALS is possible, future studies will be necessary to further establish and later unravel a potential role for the gene in disease etiology.

While our ISUB analysis did not reveal an exome-wide increase in individual set-unique burden in cases compared to controls, our results do demonstrate the importance of careful study design, in particular of case-control studies seeking to investigate rare variation. A case-control imbalance, allowing for the detection of more rare variants in one group than the other, can induce spurious associations if not handled properly. An identical analysis of genome-wide individual burden of nonsynonymous and loss-of-function variants performed in schizophrenia found a difference between cases and controls in a comparatively smaller sample (N = 2,003)¹⁹. Despite the described genetic overlap of ALS and schizophrenia, the genetic architecture of schizophrenia is likely quite different from ALS³⁶. GWAS in schizophrenia have revealed a prevalent role for common variants in the disease, and schizophrenia is also more heritable (h² ~ 80%)³⁷. Given the highly polygenic architecture of schizophrenia, an ISUB analysis in that phenotype may better capture the many disease-associated loci scattered across the genome. In contrast, the architecture of ALS may be less polygenic and/or more region-specific, and inclusion of exome-wide variation may contribute additional noise to the analysis, thus obscuring potential statistical signal. Additional studies that help to first pinpoint these potential localized signals are necessary to test this hypothesis.
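The effect of case-control imbalance on set-unique variant counts can be illustrated with a small calculation. The sketch below is not the analysis code of this study; it assumes a hypothetical rare variant with the same allele frequency in cases and controls (no true effect) and independent sampling of chromosomes, and computes the probability that the variant is observed in one group only. The function name and the chosen allele frequency are illustrative assumptions.

```python
def prob_set_unique(allele_freq, n_in, n_out):
    """Probability that a variant is seen in the `n_in` group but not in the `n_out` group.

    Assumes the same allele frequency in both groups and independent
    sampling of 2 * n chromosomes per group (illustrative assumptions).
    """
    p_absent_in = (1.0 - allele_freq) ** (2 * n_in)
    p_absent_out = (1.0 - allele_freq) ** (2 * n_out)
    return (1.0 - p_absent_in) * p_absent_out


# Hypothetical rare variant with no true effect on disease risk.
freq = 1e-4

# Complete dataset: 4,244 cases and 3,106 controls.
case_only_all = prob_set_unique(freq, 4244, 3106)
control_only_all = prob_set_unique(freq, 3106, 4244)

# Balanced subset: 2,489 cases and 2,580 controls.
case_only_bal = prob_set_unique(freq, 2489, 2580)
control_only_bal = prob_set_unique(freq, 2580, 2489)

ratio_all = case_only_all / control_only_all
ratio_bal = case_only_bal / control_only_bal
print(f"case-only / control-only, all cohorts: {ratio_all:.2f}")
print(f"case-only / control-only, balanced cohorts: {ratio_bal:.2f}")
```

Under these assumptions the larger group is expected to hold more set-unique variants even when the variant has no effect on risk, which is the mechanism that balancing the cohorts removes.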

With the falling costs of high-coverage sequencing and the increasing availability of large reference panels that allow for imputation of tens of millions of low-frequency variants, high-quality studies of rare variants in ALS are expected in the coming years³⁸. Given that our analysis of functional coding variants in thousands of European ALS patients and controls did not implicate new risk loci, our analyses reaffirm the need for future studies of rare variants in ALS that are internationally collaborative, allowing for the collection of larger and more ancestrally diverse sample sizes and denser coverage of the genome. In order to facilitate large-scale rare variant analyses, summary statistics will be made publicly available through the Project MinE databrowser website at http://databrowser.projectmine.com/.


METHODS

Study population
All 4,495 patients and 3,227 controls included in this study were recruited at specialized neuromuscular centers in Belgium, Germany, Ireland, Italy, Spain and the Netherlands. Patients were diagnosed with possible, probable or definite ALS according to the 1994 El-Escorial criteria³⁹. All controls were free of neuromuscular diseases and matched for age, sex and geographical location.

Genotyping and quality control
We conducted genotyping using Illumina HumanExome-12v1 BeadChips in accordance with the manufacturer’s protocol. A description of the exome chip design is available from http://genome.sph.umich.edu/wiki/Exome_Chip_Design. We applied the GenTrain 2.0 clustering algorithm for genotype calling as implemented in the Illumina GenomeStudio software package. Initial genotype calls were made based on the HumanExome cluster file provided by Illumina. More accurate cluster boundaries were determined based on the actual study data by exclusion of samples with a GenCall quality score in the lowest 10th percentile of genotyped variants (p10GC < 0.38) or call rate < 0.99. The final genotype calls of the whole dataset were obtained using these more precise cluster boundaries. We then performed additional sample and variant quality control (QC) using PLINK 1.9, excluding all samples with missing call rates higher than 5% or sex discordance⁴⁰,⁴¹. A subset of independent, high-quality SNPs (non-AT/CG, autosomal variants not located in high linkage disequilibrium (LD) regions, with minor allele frequency (MAF) > 0.05, genotyping rates > 99%, R² < 0.05 and Hardy-Weinberg equilibrium p values > 1x10⁻³) was used to determine heterozygosity rates, population stratification, intersample relatedness and sample duplication. Closely related and duplicated samples (pi-hat > 0.2) and samples failing heterozygosity checks (> 4 standard deviations) were removed. To assess population stratification, we calculated principal components on all remaining individuals and HapMap3 individuals using EIGENSTRAT⁴²,⁴³. We removed population outliers (defined as > 10 standard deviations from the HapMap CEU/TSI mean along the first two principal components). Non-autosomal variants and variants with haploid heterozygous calls, call rate < 98%, deviation from Hardy-Weinberg equilibrium in controls (p < 1x10⁻⁶) or biased missingness between cases and controls (p < 5x10⁻³) were removed from the dataset. The final dataset comprised a total of 7,350 individuals (4,244 cases and 3,106 controls) and 233,331 variants.
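The variant-level exclusion criteria above can be expressed as a single filter. The following is a minimal sketch and not the PLINK workflow used in the study; the `VariantStats` record, its field names and the example variants are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass

AUTOSOMES = {str(c) for c in range(1, 23)}


@dataclass
class VariantStats:
    # Hypothetical per-variant summary; field names are illustrative.
    chrom: str
    call_rate: float
    hwe_p_controls: float   # Hardy-Weinberg p value in controls
    missingness_p: float    # p value for case/control missingness bias
    haploid_het_calls: int = 0


def passes_variant_qc(v):
    """Apply the variant-level thresholds described in the text."""
    return (
        v.chrom in AUTOSOMES            # autosomal variants only
        and v.haploid_het_calls == 0    # no haploid heterozygous calls
        and v.call_rate >= 0.98         # call rate < 98% is removed
        and v.hwe_p_controls >= 1e-6    # HWE p < 1x10^-6 in controls is removed
        and v.missingness_p >= 5e-3     # biased missingness p < 5x10^-3 is removed
    )


# Made-up example variants, one per failure mode.
variants = {
    "var_ok": VariantStats("4", 0.999, 0.42, 0.61),
    "var_low_call_rate": VariantStats("2", 0.95, 0.50, 0.30),
    "var_hwe": VariantStats("9", 0.995, 1e-8, 0.80),
    "var_chrX": VariantStats("X", 0.999, 0.70, 0.90),
}
kept = [name for name, v in variants.items() if passes_variant_qc(v)]
print(kept)
```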

Statistical analyses
We performed single-variant association testing in PLINK 1.9 using logistic regression under an additive genetic model⁴⁰,⁴¹. To correct optimally for population stratification and batch effects due to sample handling, we used the first five principal components, sex and country of sample origin as covariates. We tested all non-monomorphic variants in both cases and controls (N = 100,896). We corrected for multiple testing using a Bonferroni-adjusted p value threshold (p = 0.05 / 100,896 = 4.96x10⁻⁷), which nears the exome-wide significance threshold of 5x10⁻⁷. Genomic inflation factors were calculated using R v3.2.2 (http://www.r-project.org). Since previous work showed that the Firth test has the best combination of type I error and power in balanced and moderately unbalanced studies of rare variants (i.e. variants with a minor allele count < 400), we repeated the single-variant analysis using Firth logistic regression in R v3.4.1, as implemented in the logistf package.
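The two quantities mentioned here, the Bonferroni threshold and the genomic inflation factor, reduce to short calculations. The sketch below is an illustration in Python rather than the R code used for the study; it assumes the common definition of the inflation factor as the median observed 1-degree-of-freedom chi-square statistic divided by its expected median, and the function names are hypothetical.

```python
from statistics import NormalDist, median


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate `alpha`."""
    return alpha / n_tests


def genomic_inflation(p_values):
    """Median observed 1-df chi-square statistic divided by its expected median.

    Each p value is converted back to a chi-square statistic through the
    normal quantile. Very small p values (below roughly 1e-16) would need a
    dedicated tail calculation and are not handled in this sketch.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # median of chi-square with 1 df
    return median(chi2) / expected_median


single_variant = bonferroni_threshold(0.05, 100_896)
gene_based = bonferroni_threshold(0.05, 14_488)
print(f"single-variant threshold: {single_variant:.3e}")
print(f"gene-based threshold: {gene_based:.3e}")

# Evenly spaced p values mimic a test statistic with no inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
print(f"lambda for uniform p values: {lam:.3f}")
```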

Gene-based analysis was conducted using the unified optimal sequence kernel association test (SKAT-O) as implemented in the R-package SKAT⁴⁴. Variants were functionally annotated using Ensembl Variant Effect Predictor (VEP)¹⁶. All non-synonymous and loss-of-function variants with a MAF < 0.01 and minor allele count > 1 were included. Genes containing only one annotated variant were excluded. The model was run with sex, the first five principal components and country included as covariates. Exact p-values were generated from 500,000 case/control permutations for hits with a nominal p-value < 0.001. Results were considered significant after Bonferroni correction for number of tests (p = 0.05 / 14,488 = 3.45x10⁻⁶). Burden testing was repeated using Firth logistic regression in R v3.4.1.
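The idea behind a permutation-based p value can be sketched as follows. This is not SKAT-O and not the procedure of the SKAT package: the test statistic is a deliberately simple stand-in (difference in carrier fraction), covariates are ignored, the data are made up, and the add-one estimator for the p value is an assumption of this example.

```python
import random


def burden_statistic(carrier, is_case):
    """Absolute difference in carrier fraction between cases and controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    case_carriers = sum(c for c, y in zip(carrier, is_case) if y)
    ctrl_carriers = sum(carrier) - case_carriers
    return abs(case_carriers / n_case - ctrl_carriers / n_ctrl)


def permutation_p_value(carrier, is_case, n_perm=2000, seed=1):
    """Share of label permutations with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = burden_statistic(carrier, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between genotype and phenotype
        if burden_statistic(carrier, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy gene: 200 cases followed by 200 controls (1 = carries a rare variant).
is_case = [1] * 200 + [0] * 200
enriched = [1] * 30 + [0] * 170 + [1] * 5 + [0] * 195
balanced = [1] * 10 + [0] * 190 + [1] * 10 + [0] * 190

p_enriched = permutation_p_value(enriched, is_case)
p_balanced = permutation_p_value(balanced, is_case)
print(p_enriched, p_balanced)
```

The smallest attainable p value is bounded by the number of permutations, which is why a large number of permutations is needed for signals in the range of 10⁻⁵.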

To investigate the exome-wide burden of rare single nucleotide variants (SNVs) in ALS cases compared to controls, an individual set-unique burden (ISUB) analysis was performed as described previously¹⁹. Only variants uniquely present in controls or uniquely present in cases were selected for this analysis. Allele frequencies were compared to the non-Finnish European cohort in ExAC and variants with a MAF higher than 0.5% were removed⁴⁵. Deleteriousness of the variants was assessed using the CONDEL algorithm, which assigns a weighted average of normalized scores from multiple prediction tools to each coding variant, as well as a ‘neutral’ or ‘deleterious’ label²⁰. Non-scored loss-of-function variants were assigned a deleterious label and the maximum score occurring in the corresponding dataset. The final individual burden score was computed by summing the scores for all observed non-synonymous and loss-of-function set-unique variants (NS) as well as for the set-unique variants predicted to be deleterious (DEL).


We assigned scores under a dominant model, assigning a score only once in case of a homozygous genotype. The analysis was first performed in the complete dataset comprising 4,244 cases and 3,106 controls. Since the case-control ratio is an important determinant in whether a variant appears to be unique, we repeated all analyses in a balanced case-control subset of Dutch, Belgian and Irish samples (2,489 cases and 2,580 controls). Differences in ISUB scores between cases and controls were assessed using logistic regression with the first six principal components and country included in the model as covariates. To reduce the effect of outlying scores on the results the analyses were repeated with outlying scores (> 5 standard deviations from set mean) removed as well as with logarithmic transformation of the data. R version 3.2.2 was used for all statistical analyses (http://www.r-project.org).
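The selection of set-unique variants and the dominant-model scoring can be sketched on toy data. This is an illustration only, not the code used for the analysis: the data structures, function names, sample identifiers and scores are hypothetical, and the ExAC frequency filter, the CONDEL scoring itself and any normalization are left out.

```python
def set_unique_variants(genotypes, is_case):
    """Split variants into those carried only by cases and only by controls.

    genotypes: dict variant_id -> dict sample_id -> alternate allele count (0, 1 or 2)
    is_case:   dict sample_id -> bool
    """
    case_only, control_only = set(), set()
    for variant, calls in genotypes.items():
        carriers = [s for s, g in calls.items() if g > 0]
        if not carriers:
            continue  # monomorphic in this dataset
        in_cases = any(is_case[s] for s in carriers)
        in_controls = any(not is_case[s] for s in carriers)
        if in_cases and not in_controls:
            case_only.add(variant)
        elif in_controls and not in_cases:
            control_only.add(variant)
    return case_only, control_only


def isub_scores(genotypes, is_case, score):
    """Sum per-variant scores over each individual's set-unique variants."""
    case_only, control_only = set_unique_variants(genotypes, is_case)
    burden = {s: 0.0 for s in is_case}
    for variant in case_only | control_only:
        for sample, g in genotypes[variant].items():
            if g > 0:  # dominant model: a homozygous carrier is scored once
                burden[sample] += score[variant]
    return burden


# Toy data: two cases (c1, c2) and two controls (k1, k2).
is_case = {"c1": True, "c2": True, "k1": False, "k2": False}
genotypes = {
    "v1": {"c1": 1, "c2": 0, "k1": 0, "k2": 0},  # case-unique
    "v2": {"c1": 2, "c2": 0, "k1": 0, "k2": 0},  # case-unique, homozygous
    "v3": {"c1": 0, "c2": 1, "k1": 1, "k2": 0},  # shared, excluded
    "v4": {"c1": 0, "c2": 0, "k1": 0, "k2": 1},  # control-unique
}
score = {"v1": 0.5, "v2": 0.25, "v3": 0.75, "v4": 0.5}

burden = isub_scores(genotypes, is_case, score)
print(burden)
```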

Power analysis
To estimate detectable effect sizes given the low frequencies of rare single nucleotide variants (Supplementary Fig. 6), we performed power analysis for single-variant association testing using the online Genetic Power Calculator (http://zzz.bwh.harvard.edu/gpc/)⁴⁶. Power analysis for gene-based burden analysis using SKAT-O as implemented in the R-package SKAT revealed sufficient power (> 86% at 200 simulations) to detect genes with a large proportion of causal variants (≥ 50%) among rare variants (MAF < 0.01) contributing to disease risk (including genes with ≤ 25% protective variants) (Supplementary Fig. 7).
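For orientation, single-variant power can be approximated with a two-sample comparison of allele frequencies. The sketch below is not the Genetic Power Calculator, which is parameterized by genotype relative risk and disease prevalence; it assumes a normal approximation for an allelic test, treats the MAF as the control allele frequency, and uses hypothetical function names, so its values will differ from those in Supplementary Fig. 6.

```python
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    nd = NormalDist()
    p0 = control_freq
    p1 = case_allele_freq(control_freq, odds_ratio)
    # Each individual contributes two chromosomes.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    ncp = abs(p1 - p0) / se
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)


# Sample sizes and significance level as reported in the text.
power_or_23 = allelic_test_power(0.01, 2.3, 4244, 3106, 5e-7)
power_or_15 = allelic_test_power(0.01, 1.5, 4244, 3106, 5e-7)
print(f"MAF 1%, OR 2.3: power = {power_or_23:.2f}")
print(f"MAF 1%, OR 1.5: power = {power_or_15:.2f}")
```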

Data availability
Summary statistics for this study are publicly available through the Project MinE Data Browser at http://databrowser.projectmine.com/.

Ethics statement
The study was approved by the Medical Ethical Committee from the University Medical Center Utrecht, The Netherlands. Also, the present study followed study protocols approved by Medical Ethical Committees for each of the participating institutions. Written informed consent was obtained from all participating individuals. All methods were performed in accordance with the relevant national and international guidelines and regulations.

REFERENCES

1. Johnston, C. A. et al. Amyotrophic lateral sclerosis in an urban setting. J Neurol 253, 1642–1643 (2006).
2. Huisman, M. H. B. et al. Population based epidemiology of amyotrophic lateral sclerosis using capture-recapture methodology. Journal of Neurology, Neurosurgery & Psychiatry 82, 1165–1170 (2011).
3. Rooney, J. et al. Survival Analysis of Irish Amyotrophic Lateral Sclerosis Patients Diagnosed from 1995–2010. PLoS ONE 8, e74733–10 (2013).
4. Abrahams, S., Newton, J., Niven, E., Foley, J. & Bak, T. H. Screening for cognition and behaviour changes in ALS. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration 15, 9–14 (2014).
5. Miller RG, M. J. M. D. Riluzole for amyotrophic lateral sclerosis (ALS)/motor neuron disease (MND). The Cochrane Library 1–36 (2012).
6. Al-Chalabi, A. et al. An estimate of amyotrophic lateral sclerosis heritability using twin data. Journal of Neurology, Neurosurgery & Psychiatry 81, 1324–1326 (2010).
7. Al-Chalabi, A., van den Berg MD, P. L. H. & Veldink, J. Gene discovery in amyotrophic lateral sclerosis: implications for clinical management. Nature Reviews Neurology 1–9 (2016). doi:10.1038/nrneurol.2016.182
8. van Rheenen, W. et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat Genet 1–8 (2016). doi:10.1038/ng.3622
9. Al-Chalabi, A., Calvo, A., Chio, A., Colville, S. & Ellis, C. M. Analysis of amyotrophic lateral sclerosis as a multistep process: a population-based modelling study. The Lancet 13, 1108–1113 (2014).
10. McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 1–7 (2016). doi:10.1038/ng.3643
11. Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nature Communications 6, 1–9 (2015).
12. Auer, P. L. Rare variant association studies: considerations, challenges and opportunities. Genome Med 7, 1–11 (2015).
13. Hoffmann, T. J. & Witte, J. S. Strategies for Imputing and Analyzing Rare Variants in Association Studies. Trends in Genetics 31, 556–563 (2015).
14. Renton, A. E. et al. A Hexanucleotide Repeat Expansion in C9orf72 Is the Cause of Chromosome 9p21-Linked ALS-FTD. Neuron 72, 257–268 (2011).
15. DeJesus-Hernandez, M. et al. Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9orf72 Causes Chromosome 9p-Linked FTD and ALS. Neuron 72, 245–256 (2011).
16. McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 1–14 (2016).
17. Ma, C., Blackwell, T., Boehnke, M., Scott, L. J. & the GoT2D investigators. Recommended Joint and Meta-Analysis Strategies for Case-Control Association Testing of Single Low-Count Variants. Genet. Epidemiol. 37, 539–550 (2013).
18. Lee, S., Wu, M. C. & Lin, X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–775 (2012).
19. Loohuis, L. M. O. et al. Genome-wide burden of deleterious coding variants increased in schizophrenia. Nature Communications 6, 1–6 (2015).
20. González-Pérez, A. & López-Bigas, N. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am. J. Hum. Genet. 88, 440–449 (2011).
21. Kaur, S. J., McKeown, S. R. & Rashid, S. Mutant SOD1 mediated pathogenesis of Amyotrophic Lateral Sclerosis. Gene 577, 109–118 (2016).
22. Zhou, Y., Liu, S., Liu, G., Öztürk, A. & Hicks, G. G. ALS-Associated FUS Mutations Result in Compromised FUS Alternative Splicing and Autoregulation. PLoS Genet 9, e1003895–17 (2013).
23. Nicolas, A. et al. Genome-wide Analyses Identify KIF5A as a Novel ALS Gene. Neuron 97, 1268–1282.e6 (2018).
24. Brenner, D. et al. Hot-spot KIF5A mutations cause familial ALS. Brain 141, 688–697 (2018).
25. Sham, P. C. & Purcell, S. M. Statistical power and significance testing in large-scale genetic studies. Nature Publishing Group 15, 335–346 (2014).
26. van Es, M. A. et al. Amyotrophic lateral sclerosis. The Lancet 0, (2017).
27. Kenna, K. P. et al. NEK1 variants confer susceptibility to amyotrophic lateral sclerosis. Nat Genet 1–8 (2016). doi:10.1038/ng.3626
28. Cirulli, E. T. et al. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways. Science 347, 1436–1441 (2015).
29. Brenner, D. et al. NEK1 mutations in familial amyotrophic lateral sclerosis. Brain 139, e28–e28 (2016).
30. Yang, J. et al. Regulation of Axon Degeneration after Injury and in Development by the Endogenous Calpain Inhibitor Calpastatin. Neuron 80, 1175–1189 (2013).
31. Vosler, P. S., Brennan, C. S. & Chen, J. Calpain-Mediated Signaling Mechanisms in Neuronal Injury and Neurodegeneration. Mol Neurobiol 38, 78–100 (2008).
32. Wright, A. L. & Vissel, B. CAST your vote: is calpain inhibition the answer to ALS? J. Neurochem. 137, 140–141 (2016).
33. Rao, M. V., Campbell, J., Palaniappan, A., Kumar, A. & Nixon, R. A. Calpastatin inhibits motor neuron death and increases survival of hSOD1 G93A mice. J. Neurochem. 137, 253–265 (2016).
34. Lonsdale, J. et al. The Genotype-Tissue Expression (GTEx) project. Nature Publishing Group 45, 580–585 (2013).
35. Sleiman, P. M. A. et al. GWAS identifies four novel eosinophilic esophagitis loci. Nature Communications 5, 5593 (2014).
36. McLaughlin, R. L. et al. Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nature Communications 8, 14774 (2017).
37. Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
38. van Rheenen, W. et al. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. European Journal of Human Genetics 7, 1–10 (2018).
39. Brooks, B. R. El Escorial World Federation of Neurology criteria for the diagnosis of amyotrophic lateral sclerosis. Subcommittee on Motor Neuron Diseases/Amyotrophic Lateral Sclerosis of the World Federation of Neurology Research Group on Neuromuscular Diseases and the El Escorial ‘Clinical limits of amyotrophic lateral sclerosis’ workshop contributors. Journal of the Neurological Sciences 124 Suppl, 96–107 (1994).
40. Purcell, S. et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. The American Journal of Human Genetics 81, 559–575 (2007).
41. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaSci 4, 7–16 (2015).
42. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38, 904–909 (2006).
43. Consortium, T. I. H. 3. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
44. Lee, S. et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224–237 (2012).
45. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
46. Purcell, S., Cherny, S. S. & Sham, P. C. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19, 149–150 (2003).


SUPPLEMENTARY FIGURES











[Plot axes: expected distribution (-log₁₀ of P value) and observed distribution (-log₁₀ of P value)]

Supplementary Figure 1. Quantile-quantile plot of gene-based analysis. QQ-plot of gene-based analysis using SKAT-O.

[Plot: panels All cohorts and Balanced cohorts; x axis, Phenotype (Case, Control); y axis, Variants; legend, Label (DEL, NS)]

Supplementary Figure 2. Set-unique SNVs. Histogram depicting the set-unique variants selected for ISUB analysis in cases and controls from all cohorts combined (cases N = 4244, controls N = 3106) and from balanced cohorts only (cases N = 2489, controls N = 2580). DEL = deleterious variants, NS = all non-synonymous and loss-of-function variants.


Supplementary Figure 3. Individual set-unique burden analysis in all cohorts. Boxplots of ISUB scores in analysis comprising all individuals (N = 7350) and individuals per included cohort for (A) deleterious variants only and (B) all non-synonymous and loss-of-function variants.


Supplementary Figure 4. Individual set unique burden analysis in balanced cohorts. Boxplots of ISUB scores in analysis comprising balanced case-control cohorts (samples from The Netherlands, Belgium and Ireland, N = 5069) and individuals per included cohort for (A) deleterious variants only and (B) all non-synonymous and loss-of-function variants.


Supplementary Figure 5. Breakdown individual set unique burden score. Analyses comprising all individuals (all cohorts; N = 7350) and subset of samples comprising balanced case-control cohorts only (balanced cohorts; samples from The Netherlands, Belgium and Ireland, N = 5069): (A) boxplot of CONDEL score per variant; (B) boxplot of set-unique variant count per individual.


[Plot: 80% power (4244 cases / 3106 controls); x axis, Genotype relative risk; y axis, MAF; reference lines at MAF = 0.05, MAF = 0.01 and MAF = 0.001]

Supplementary Figure 6. Power plot of single variant analysis. Results derived from Genetic Power Calculator (Purcell et al., 2003). Grey line corresponds to minimal minor allele frequency (MAF) and genotype relative risk needed at these sample sizes to obtain at least 80% power for single-locus tests at a significance level of 5x10⁻⁷. Dotted orange lines depict MAF cut-offs at 5%, 1% and 0.1%.

[Plot: x axis, Protective percent; y axis, Power; legend, Causal percent]

Supplementary Figure 7. Power plot of gene-based burden test. Power analysis for gene-based burden test (SKAT-O) under different genic architectures (different percentages of causal (10%, 20%, 30%, 40% and 50%) and protective (0%, 10%, 20%, 30%, 40%, 50% and 60%) variants contributing to the signal) at significance level of 3.44x10⁻⁶ (number of simulations: 200).


SUPPLEMENTARY TABLES

Supplementary Table 1. Study population.

Country           Site              Cases   Controls   Total
The Netherlands   Utrecht           1611    1693       3304
Belgium           Leuven            472     485        957
Germany           Wurzburg / Ulm    1269    343        1612
Ireland           Dublin            406     402        808
Spain             Madrid            188     87         275
Italy             Turin             298     96         394
Total                               4244    3106       7350

Overview of included samples from seven different sites across six European countries after quality control.

Supplementary Table 2. Single nucleotide variant association test results using logistic regression.

RS number     Chr   Position (bp)   OR (95% CI)          p value      MAFcase (%)   MAFcon (%)   MAF ExAC (%)   Annotation              Gene
rs2715148     7     82450035        0.88 (0.82 – 0.94)   1.72x10⁻⁴    0.472         0.512        /              Intron                  PCLO
rs181906086   2     31414830        0.35 (0.20 – 0.60)   1.68x10⁻⁴    0.003         0.006        0.004          Missense                CAPN14
rs12929114    16    65532244        1.14 (1.07 – 1.23)   1.50x10⁻⁴    0.429         0.393        /              Intron                  LINC00922
rs3743797     16    52478215        1.28 (1.13 – 1.46)   1.00x10⁻⁴    0.090         0.073        0.136          Synonymous              TOX3
rs774359      9     27561049        1.17 (1.08 – 1.26)   8.77x10⁻⁵    0.286         0.255        /              Intron                  C9orf72
rs1871332     5     101019725       0.82 (0.74 – 0.90)   6.52x10⁻⁵    0.129         0.154        /              Intergenic
rs200161705   4     170506525       2.74 (1.68 – 4.47)   5.76x10⁻⁵    0.009         0.003        0.004          Missense                NEK1
rs73168055    7     149481994       1.38 (1.19 – 1.60)   2.25x10⁻⁵    0.068         0.050        0.059          Non-coding transcript   SSPO
rs2814707     9     27536397        1.19 (1.10 – 1.29)   9.74x10⁻⁶    0.267         0.233        /              Intergenic
rs3849942     9     27543281        1.21 (1.12 – 1.31)   1.72x10⁻⁶    0.267         0.230        /              Downstream gene         C9orf72

Results limited to top-10 associations. Positions given for human build 37. ExAC minor allele frequencies derived from ExAC browser, non-Finnish European population. Annotation based on Ensembl Variant Effect Predictor. OR = odds ratio, CI = confidence interval, MAF = minor allele frequency.

Supplementary Table 3. Comparison of set-unique variant count per individual.

ALL COHORTS
Variant type  Phenotype  No. of individuals  No. of SNVs  Scored SNVs / ind (mean / median / sd)  p value
DEL           Case       4244                6600         2.56 / 2 / 2.55                         2.89x10-¹⁵⁰
DEL           Control    3106                3135         1.36 / 1 / 1.50
NS            Case       4244                23,231       9.08 / 7 / 8.96                         1.38x10-²⁷⁶
NS            Control    3106                11,088       4.80 / 4 / 4.48

BALANCED COHORTS
Variant type  Phenotype  No. of individuals  No. of SNVs  Scored SNVs / ind (mean / median / sd)  p value
DEL           Case       2489                4201         2.36 / 2 / 2.62                         0.86
DEL           Control    2580                4025         2.28 / 2 / 1.91
NS            Case       2489                14,831       8.33 / 7 / 9.88                         0.87
NS            Control    2580                14,310       7.99 / 7 / 6.09

Results given for analysis comprising all individuals (all cohorts; N = 7350) and for a subset of samples comprising balanced case-control cohorts only (balanced cohorts; samples from The Netherlands, Belgium and Ireland, N = 5069). P values given for Wilcoxon rank sum test. DEL = deleterious variants, NS = all non-synonymous and loss-of-function variants, sd = standard deviation, ind = individuals.

Supplementary Table 4. Comparison of CONDEL score per scored variant.

ALL COHORTS
Variant type  Phenotype  No. of individuals  No. of SNVs  CONDEL score per scored SNV (mean / median / sd)  p value
DEL           Case       4244                6600         0.62 / 0.58 / 0.10                               0.28
DEL           Control    3106                3135         0.62 / 0.58 / 0.11
NS            Case       4244                23,231       0.48 / 0.46 / 0.12                               0.58
NS            Control    3106                11,088       0.48 / 0.46 / 0.12

BALANCED COHORTS
Variant type  Phenotype  No. of individuals  No. of SNVs  CONDEL score per scored SNV (mean / median / sd)  p value
DEL           Case       2489                4201         0.62 / 0.58 / 0.11                               0.33
DEL           Control    2580                4025         0.62 / 0.58 / 0.10
NS            Case       2489                14,831       0.48 / 0.46 / 0.12                               0.19
NS            Control    2580                14,310       0.48 / 0.46 / 0.12

Results given for analysis comprising all individuals (N = 7350) and for a subset of samples comprising balanced case-control cohorts only (samples from The Netherlands, Belgium and Ireland, N = 5069). P values given for Wilcoxon rank sum test. DEL = deleterious variants, NS = all non-synonymous and loss-of-function variants, sd = standard deviation, SNV = single nucleotide variant.

Chapter 6

NEK1 variants confer susceptibility to amyotrophic lateral sclerosis.

Perry T.C. van Doormaal*, Kevin P. Kenna*, Annelot M. Dekker*, Nicola Ticozzi* et al.

Nature Genetics 48, 1037-1042 (2016)

* These authors contributed equally to this manuscript. The complete author list is provided at the end of this manuscript.


ABSTRACT

To identify genetic factors contributing to amyotrophic lateral sclerosis (ALS), we conducted whole exome analyses of 1,022 index familial ALS (FALS) cases and 7,315 controls. In a new screening strategy, we performed gene-burden analyses trained with established ALS genes and identified a significant association between loss-of-function (LOF) NEK1 variants and FALS risk. Independently, autozygosity mapping for an isolated community in the Netherlands revealed an NEK1 p.Arg261His variant as a candidate risk factor. Replication analyses of sporadic ALS (SALS) cases and independent control cohorts confirmed significant disease association for both p.Arg261His (10,589 samples analyzed) and NEK1 LOF variants (3,362 samples analyzed). In total, we observed NEK1 risk variants in nearly 3% of ALS cases. NEK1 has been linked to several cellular functions including cilia formation, DNA damage response, microtubule stability, neuronal morphology and axonal polarity. Our results provide new and important insights into ALS etiopathogenesis and genetic etiology.

In recent years, the combination of exome sequencing, segregation analysis and bioinformatic filtering has proven to be an effective strategy to rapidly identify new disease genes¹. Unfortunately, this method can be difficult to apply to disorders such as ALS, for which late age of onset and low-to-modest variant penetrance make it difficult to obtain large informative multigenerational pedigrees. Owing to high genetic heterogeneity, ALS is also difficult to analyze using filtering methods designed to exploit unrelated patient groups². Recently, we demonstrated the utility of exome-wide rare variant burden (RVB) analysis as an alternate approach, identifying a replicable association between FALS risk and TUBA4A in a cohort of 363 cases³. In brief, RVB analysis is used to compare the combined frequency of rare variants in each gene in a case–control cohort. Candidate associations are identified by significant differences after multiple-test correction. Since this initial study, we extended our data set to include complete exome sequencing for 1,376 index FALS cases and 13,883 controls. Of these, 1,022 cases and 7,315 controls met all required data, inter-relatedness and ancestral quality control criteria (Supplementary Figs. 1 and 2, Supplementary Methods).
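The burden idea described above can be sketched in a few lines: rare qualifying variants are collapsed per gene so that each sample counts once as a carrier, and carrier counts are then compared between cases and controls. The data structures and the names `burden_table` and `crude_odds_ratio` are hypothetical, the toy data are invented, and the crude odds ratio ignores the coverage and ancestry covariates used in the actual analysis.

```python
from collections import defaultdict


def burden_table(carriers_by_variant, variant_gene, is_case):
    """Collapse rare-variant calls into per-gene carrier counts.

    carriers_by_variant: dict variant_id -> set of sample ids carrying it
    variant_gene:        dict variant_id -> gene symbol
    is_case:             dict sample_id -> True (case) / False (control)
    Returns dict gene -> (carrier cases, carrier controls).
    """
    carriers_by_gene = defaultdict(set)
    for variant, samples in carriers_by_variant.items():
        # A sample carrying several variants in one gene is counted once.
        carriers_by_gene[variant_gene[variant]] |= samples
    table = {}
    for gene, samples in carriers_by_gene.items():
        n_case = sum(1 for s in samples if is_case[s])
        table[gene] = (n_case, len(samples) - n_case)
    return table


def crude_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Unadjusted odds ratio of carrier status in cases versus controls."""
    odds_cases = case_carriers / (n_cases - case_carriers)
    odds_controls = control_carriers / (n_controls - control_carriers)
    return odds_cases / odds_controls


# Toy data, invented for illustration only.
calls = {"v1": {"s1", "s2"}, "v2": {"s2", "s3"}, "v3": {"s4"}}
genes = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
status = {"s1": True, "s2": True, "s3": False, "s4": False}
counts = burden_table(calls, genes, status)
```

An unadjusted ratio computed this way need not match a published estimate that comes from a covariate-adjusted model, so it is best read as a sanity check rather than a replacement for the test itself.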

Successful detection of disease associations through RVB analysis can depend heavily on the appropriate setting of test parameters. As genetic loci often contain many alleles of no or low effect, prior filtering of variants based on minor allele frequency (MAF) and pathogenicity predictors can identify disease signatures otherwise masked by normal human variability. As appropriate MAF or pathogenicity predictor settings may not be obvious in advance, comprehensive assessment of all pursuable analysis strategies is desirable but can in turn introduce excessive multiple test burden. To overcome these limitations, we performed 308 distinct RVB analyses of 10 well-established ALS genes using 44 functional and 7 MAF filters (Fig. 1a). All tests included correction for gene coverage and ancestral covariates (Supplementary Methods). In the final cohort, 72 cases and 0 controls harbored known ALS pathogenic mutations within these ten genes (Supplementary Methods). An additional 26 cases harbored a repeat expansion in the C9orf72 gene. Tests differed in their capacity to detect individual known ALS genes (Supplementary Table 1), but we achieved the highest net sensitivity when we restricted analyses to variants with MAF < 0.001 and functional classifications of either nonsense, splice-altering⁴ or deemed deleterious by functional analysis through hidden Markov models (FATHMM)⁵. Under these settings, four genes exhibited disease association at exome-wide (Bonferroni-corrected, P<2.5x10-⁶) significance (SOD1, TARDBP, UBQLN2, FUS), three achieved near exome-wide significance (TUBA4A, TBK1, VCP), and three displayed modest to marginal disease association (PFN1, VAPB, OPTN) (Fig. 1b). Genes exhibiting the strongest disease associations included those reported as major ALS genes in population based studies, whereas those exhibiting weaker associations are believed to constitute rarer causes of disease.

[Figure 1: (a) sensitivity versus −log10(P), with AUC/max(AUC) scale; (b) −log10(Obs) versus −log10(Exp), λ = 0.95, with labeled genes SOD1, TARDBP, UBQLN2, NEK1, FUS, VCP, TBK1, TUBA4A, STX12, KIF5A, PFN1, VAPB and OPTN.]

Figure 1. Rare Variant Burden Analysis of FALS Exomes. (a) Rare variant burden analyses of 1,022 index FALS cases and 7,315 controls were performed for 10 known ALS genes. Analyses assessed 308 different combinations of minor allele frequency and functional prediction filters (Supplementary Table 1). The set of analysis parameters achieving the highest sensitivity for known ALS genes was identified as that achieving the highest area under the curve (AUC) in a plot of sensitivity (proportion of training genes achieving significance) across an increasing minimum p-value threshold. Dotted vertical line denotes Bonferroni corrected p-value for exome-wide significance. (b) Extension of the highest performing known gene trained analysis to the entire exome. Threshold for exome-wide significance is denoted by the dotted red line. λ = observed genomic inflation factor. 'Obs' describes the P-value distribution for the observed data; 'Exp' describes the P-value distribution under null expectation.

Cohort         Cases  Controls  MAF (cases)  MAF (controls)  OR    95% CI      P
FALS           1,022  7,315     0.0086       0.0036          2.66  1.48-4.57   1.5x10-³
SALS
Belgium        466    476       0.0097       0.0053          1.81  0.60-5.51
Spain/Italy    472    183       0.0074       0.0055          1.36  0.28-6.58
Germany        1,229  288       0.0090       0.0017          5.27  0.71-38.81
Ireland        565    526       0.0044       0.0019          2.32  0.45-12.05
Netherlands    1,839  1,982     0.0109       0.0035          2.99  1.63-5.47
UK             1,335  893       0.0049       0.0022          2.59  0.71-6.69
United States  266    69        0.0056       0.0072          0.84  0.08-8.29
Total          6,172  4,417     0.0080       0.0033          2.41  1.58-3.71   4.8x10-⁵
FALS + SALS    7,194  11,732    0.0081       0.0035          2.41  1.57-3.71   1.2x10-⁷

Figure 2. Replication Analysis of NEK1:p.Arg261His. NEK1:p.Arg261His genotypes were ascertained for 1,022 FALS, 6,172 SALS and 11,732 controls. The SALS cohort was divided into 7 geographically based case-control strata. Logistic regression was used to conduct tests of allelic association for all subcohorts and was followed by a fixed-effects meta-analysis. In the distribution of OR estimates across study cohorts, the vertical dotted line denotes the OR estimated under meta-analysis. CI = confidence interval.
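The size of the filter grid and the exome-wide threshold quoted above can be checked with a short sketch. The 44 functional and 7 MAF filters come from the text; the family-wise error rate of 0.05 and the figure of roughly 20,000 tested genes are assumptions chosen because they reproduce the stated threshold of 2.5x10-⁶, and the helper `sensitivity` is a hypothetical name for the quantity plotted in Fig. 1a.

```python
from itertools import product

N_FUNCTIONAL_FILTERS = 44   # from the text
N_MAF_FILTERS = 7           # from the text
ALPHA = 0.05                # assumed family-wise error rate
N_GENES_TESTED = 20_000     # assumption: approximate number of protein-coding genes

# Every (functional filter, MAF filter) pair defines one RVB analysis.
filter_grid = list(product(range(N_FUNCTIONAL_FILTERS), range(N_MAF_FILTERS)))

# Bonferroni correction: divide the error rate by the number of tests.
bonferroni_threshold = ALPHA / N_GENES_TESTED


def sensitivity(p_values, threshold):
    """Proportion of training genes whose P value falls below the threshold."""
    return sum(p < threshold for p in p_values) / len(p_values)
```

Evaluating `sensitivity` over a range of thresholds for each of the 308 parameter sets, and ranking the sets by the area under that curve, is one way to read the selection procedure; the exact implementation used in the study is not given here.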

Extension of the optimal known ALS gene parameters to all protein-coding genes identified one new gene displaying exome-wide significant disease association (Fig. 1b). The gene, NEK1 (odds ratio (OR) = 8.2, P = 1.7x10-⁶), encodes the serine/threonine kinase NIMA (never in mitosis gene-A) related kinase. Retesting NEK1 under alternate analysis parameters revealed strong disease associations across most analysis strategies, particularly where we included LOF (nonsense and predicted splice altering) variants (Supplementary Table 2 and Supplementary Fig. 3). We observed no evidence for systematic genomic inflation (λ = 0.95), confounding related to sample ascertainment (Supplementary Fig. 4) or case-control biases in NEK1 gene coverage (Supplementary Fig. 5). Removal of samples carrying rare variants of known ALS genes did not influence the association (OR = 8.9, P = 7.3x10-⁷).
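For readers unfamiliar with the genomic inflation factor λ, a minimal sketch of its usual definition follows: the median observed 1-df chi-square statistic divided by the median expected under the null. The function name `genomic_inflation` is hypothetical, and the evenly spaced P values are synthetic stand-ins for a null distribution, not data from this study.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Lambda = median observed 1-df chi-square statistic / expected median.

    P values must lie in (0, 1]; a value of exactly 0 cannot be converted.
    """
    norm = NormalDist()
    # A two-sided P value maps to chi-square = z^2 with z = Phi^-1(1 - p/2).
    chi2 = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic the null expectation (synthetic input).
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
```

Values near 1 indicate that the bulk of the test statistics follows the null distribution; values well above 1 would suggest systematic inflation, for example from population stratification.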

In an independent line of research, we performed whole-genome sequencing for four ALS patients from an isolated community in the Netherlands (population < 25,000). We observed high inbreeding coefficients for each of the four patients, confirming their high degree of relatedness and supporting a restricted genetic lineage (Supplementary Fig. 6). Autozygosity mapping, allowing for genetic heterogeneity, identified four candidate disease variants occurring in detectable runs of homozygosity (ROH) (Supplementary Fig. 7). These variants included a p.Arg261His variant of NEK1. Two of the four SALS cases were homozygous for p.Arg261His and two were heterozygous, raising the possibility that even a single copy of the allele may increase disease risk. Clinical evaluation of the four cases did not find any overt differences in disease phenotype. None of the other three candidate variants exhibited homozygosity in multiple patients or occurred at all in more than two patients. Analysis of the region identified a shared p.Arg261His haplotype spanning 3 Mb in all four samples (Supplementary Table 3).
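To illustrate what a run of homozygosity is, the sketch below scans one individual's genotypes along a chromosome and reports uninterrupted homozygous stretches. It is a deliberately simplified stand-in: the function name, the genotype coding and both thresholds are assumptions, and dedicated tools typically use sliding windows and tolerate occasional heterozygous or missing calls, which this version does not.

```python
def runs_of_homozygosity(genotypes, positions, min_snps=50, min_length_bp=1_000_000):
    """Return (start_bp, end_bp, n_snps) for each uninterrupted homozygous run.

    genotypes: alternate-allele counts per SNP (0, 1 or 2); 1 = heterozygous
    positions: base-pair positions, sorted, same length as genotypes
    """
    runs, start = [], None
    last = len(genotypes) - 1
    for i, g in enumerate(genotypes):
        if g != 1 and start is None:
            start = i                      # a homozygous stretch begins
        if start is not None and (g == 1 or i == last):
            end = i - 1 if g == 1 else i   # last homozygous index of the stretch
            n_snps = end - start + 1
            length = positions[end] - positions[start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[end], n_snps))
            start = None
    return runs


# Toy genotypes with small thresholds, invented for illustration.
toy_genotypes = [0, 2, 2, 1, 0, 0, 0, 0, 2, 1]
toy_positions = [100 * (i + 1) for i in range(10)]
toy_runs = runs_of_homozygosity(toy_genotypes, toy_positions, min_snps=3, min_length_bp=200)
```

Candidate variants would then be those lying inside such runs; intersecting the runs of several affected relatives is one way to look for a shared autozygous segment.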

To validate the risk effects of p.Arg261His, we tested for disease association among 6,172 SALS cases and 4,417 matched controls from eight countries (Supplementary Figs. 8 and 9, and Supplementary Methods). We genotyped this cohort using the Illumina exome chip or by whole-genome sequencing, which allowed us to check for any overlap or detectable relatedness with the FALS case–control cohort; none was found. Meta-analysis of all independent population strata identified a clear minor allele excess in cases with a combined significance of P = 4.8x10-⁵ and OR = 2.4 (Fig. 2). We also observed disease association in the FALS case–control data (OR = 2.7, P = 1.5x10-³) and a meta-analysis of FALS, SALS and all controls combined (OR = 2.4, P = 1.2x10-⁷).
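A fixed-effects (inverse-variance) meta-analysis of the SALS strata can be approximated from the per-cohort odds ratios and 95% confidence intervals listed in Figure 2. This is a sketch under stated assumptions: the standard error of each log odds ratio is recovered from the rounded interval, the function name is hypothetical, and the study's own meta-analysis worked from the per-cohort regression output rather than from these rounded values.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# (cohort, OR, lower 95% CI, upper 95% CI) as listed in Figure 2
STRATA = [
    ("Belgium", 1.81, 0.60, 5.51),
    ("Spain/Italy", 1.36, 0.28, 6.58),
    ("Germany", 5.27, 0.71, 38.81),
    ("Ireland", 2.32, 0.45, 12.05),
    ("Netherlands", 2.99, 1.63, 5.47),
    ("UK", 2.59, 0.71, 6.69),
    ("United States", 0.84, 0.08, 8.29),
]

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def fixed_effects_meta(strata):
    """Inverse-variance pooling of log odds ratios."""
    total_weight = 0.0
    weighted_sum = 0.0
    for _name, odds_ratio, lower, upper in strata:
        # SE of the log OR, recovered from the width of the reported CI.
        se = (log(upper) - log(lower)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * log(odds_ratio)
    pooled_log_or = weighted_sum / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    p_value = 2 * (1 - NormalDist().cdf(abs(pooled_log_or / pooled_se)))
    ci = (exp(pooled_log_or - Z_95 * pooled_se), exp(pooled_log_or + Z_95 * pooled_se))
    return exp(pooled_log_or), ci, p_value


pooled_or, pooled_ci, pooled_p = fixed_effects_meta(STRATA)
```

The pooled estimate obtained this way lands close to the reported combined OR of 2.4; small differences are expected because rounded values are used as input. The Netherlands stratum carries most of the weight because its interval is the narrowest.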


DNA availability facilitated segregation analysis of only one NEK1 LOF variant, a p.Arg550* variant, which was also detected in the affected mother of the identified proband. To validate the effect of LOF variants observed in FALS and assess any potential contribution to sporadic disease, we analyzed full sequencing data of the NEK1 coding region for 2,303 SALS cases and 1,059 controls (Supplementary Fig. 3 and Supplementary Methods). RVB analysis confirmed a significant excess of LOF variants in cases (23/2,303 SALS cases versus 0/1,059 controls, OR = 22.2, P = 1.5x10-⁴; Supplementary Table 2). Meta-analysis of discovery and replication LOF analyses yielded a combined significance of P = 3.4x10-⁸ and OR = 8.8.
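Because no control carried an LOF variant, a plain odds ratio is undefined for this table. The sketch below shows two simple ways to handle the zero cell: a continuity-corrected odds ratio and an exact hypergeometric probability. Both are illustrative substitutes, not the RVB test used in the study, and the function names are hypothetical.

```python
from math import comb


def haldane_odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to every cell (Haldane-Anscombe correction).

    a, b: carriers and non-carriers among cases
    c, d: carriers and non-carriers among controls
    """
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def prob_all_carriers_are_cases(n_carriers, n_cases, n_controls):
    """One-sided exact (hypergeometric) probability that every carrier is a case."""
    return comb(n_cases, n_carriers) / comb(n_cases + n_controls, n_carriers)


# Counts from the text: 23 of 2,303 SALS cases and 0 of 1,059 controls.
or_lof = haldane_odds_ratio(23, 2303 - 23, 0, 1059 - 0)
p_lof = prob_all_carriers_are_cases(23, 2303, 1059)
```

Both quantities come out close to the reported OR = 22.2 and P = 1.5x10-⁴, which suggests the published estimate is not sensitive to the exact treatment of the empty cell, although the published values remain the covariate-adjusted ones.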

In total, we detected 120 predicted nonsynonymous NEK1 variants in FALS samples, SALS samples and controls. These were distributed throughout the gene, including in the sequence encoding the protein kinase domain (PKD) and six coiled-coil domains thought to be involved in mediating protein–protein interactions (Supplementary Fig. 3). After conditioning for LOF variants and p.Arg261His, we observed tentative excesses of case variants in analyses of rarer variant categories, but larger sample sizes will be required to confirm the pathogenicity beyond p.Arg261His and LOF variants (Supplementary Table 4). Analysis of other members of the NEK gene family (NEK2–NEK11) identified no associations in the FALS data set meeting multiple-test criteria (Supplementary Table 5).

Although no other gene achieved discovery significance, ten candidate loci exhibited P<1.0x10-³ in the FALS discovery analysis (Table 1). These included the gene encoding the SNARE (soluble NSF attachment protein receptor) complex protein syntaxin 12 (STX12, OR = 33.1, P = 9.7x10-⁵). Analysis of the SALS replication cohort identified missense variants in 5/2,303 cases versus 0/1,059 controls. However, the cohort was not sufficiently powered to assess events of this frequency, and larger sample sizes will be required to establish effects on ALS risk (Supplementary Table 6). Another identified candidate gene was the known hereditary spastic paraplegia gene KIF5A⁶ (OR = 7.1, P = 4.8x10-⁴); however, no observed elevations in patient variant frequencies within the SALS replication cohort reached statistical significance (Supplementary Table 7).

NEK1 has been previously described as a candidate gene for ALS⁷,⁸. Here, our findings show that NEK1 in fact constitutes a major ALS-associated gene with risk variants present in ~3% of European and European-American ALS cases. We identified LOF variants in 1.2% of FALS samples (OR = 8.2) and 1.0% of SALS samples (OR = 22.2) versus 0.17% of controls, whereas we identified the p.Arg261His variant in 1.7% of FALS samples (OR = 2.7) and 1.6% of SALS samples (OR = 2.4) versus 0.69% of controls. We identified variants of unknown clinical importance (missense, MAF < 0.001) in a further 1.8% of FALS samples and 1.3% of SALS samples versus 1.2% of controls. In comparison, risk variants in previously established ALS genes occur at approximately the following percentages: C9orf72, <10%; SOD1, <2%; TARDBP, <1%; FUS, <1%; and others, <<1% or uncertain⁹-¹². However, caution must be taken when comparing the frequency of variants or mutations that differ in penetrance (i.e., highly penetrant mutations to lower-penetrance risk variants). Furthermore, assessment of the true odds ratio for variants in a gene may be difficult because of the presence of neutral variants that dilute out the observed effect. The actual odds ratio may therefore be even higher for specific subsets of patient variants. The LOF variants in NEK1 display a higher odds ratio relative to p.Arg261His. The p.Arg261His variant occurs adjacent to the protein kinase domain and is classified as deleterious by most bioinformatic prediction algorithms (SIFT, PolyPhen, LRT, MutationTaster, Mutation Assessor, PROVEAN, CADD, GERP, SiPhy). One model to account for the difference in p.Arg261His and LOF variant toxicity could be a correlation between phenotypic expression and the predicted extent of NEK1 LOF. This model would also be consistent with previous findings that homozygosity for NEK1 LOF variants causes a severe developmental phenotype, short rib polydactyly syndrome type II (SRPS)¹³. In the current study, no individuals carried multiple LOF alleles. However, in SRPS, homozygous carriers of NEK1 LOF variants have been reported to exhibit a 64% reduction of NEK1 mRNA levels while unaffected heterozygous parents exhibit a 30-40% reduction¹³.
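The carrier percentages quoted in this paragraph can be traced back to counts and allele frequencies given elsewhere in the chapter. The sketch below is one plausible reconstruction, not the authors' stated calculation: pooling the discovery and replication controls for the LOF frequency, and converting minor allele frequencies to carrier frequencies under Hardy-Weinberg proportions, are both assumptions.

```python
def carrier_frequency(maf):
    """Expected carrier frequency under Hardy-Weinberg: 1 - (1 - q)^2."""
    return 1 - (1 - maf) ** 2


# LOF carriers: 12/1,022 FALS (Table 1), 23/2,303 SALS (replication analysis),
# and 14/7,315 + 0/1,059 controls pooled (assumption).
lof_fals = 12 / 1022
lof_sals = 23 / 2303
lof_controls = (14 + 0) / (7315 + 1059)

# p.Arg261His: minor allele frequencies from Figure 2.
r261h_fals = carrier_frequency(0.0086)
r261h_sals = carrier_frequency(0.0080)
r261h_controls = carrier_frequency(0.0035)

# Combined share of cases carrying either class of risk variant
# (ignores the small chance of carrying both).
risk_fals = lof_fals + r261h_fals
risk_sals = lof_sals + r261h_sals
```

Under these assumptions the LOF and p.Arg261His shares add up to a little under 3% in both FALS and SALS, in line with the ~3% figure stated in the text.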

Table 1. FALS Discovery Analysis Identifies Candidate Genes.

Gene     ALS  ALS Freq  Control  Control Freq  OR    OR 95% CI   P
NEK1     12   0.0117    14       0.0019        8.2   3.7-18.0    1.7x10-⁶
ATRN     8    0.0078    7        0.0010        10.3  3.6-29.6    3.7x10-⁵
STX12    4    0.0039    1        0.0001        33.1  5.8-339.0   9.7x10-⁵
CREB3L2  4    0.0039    0        0.0000        64.9  6.6-8695.3  1.1x10-⁴
DCC      4    0.0039    2        0.0003        18.6  4.1-108.1   3.1x10-⁴
WDR49    5    0.0049    2        0.0003        15.8  3.5-92.1    4.4x10-⁴
KIF5A    7    0.0068    8        0.0011        7.1   2.5-19.7    4.8x10-⁴
C1QTNF7  12   0.0117    26       0.0036        3.6   1.8-7.1     6.7x10-⁴
PEAK1    5    0.0049    3        0.0004        11.6  2.9-51.5    7.5x10-⁴
BIRC6    10   0.0098    18       0.0025        4.3   1.9-9.3     8.4x10-⁴
ZSCAN5B  4    0.0039    2        0.0003        16.3  3.3-98.0    8.8x10-⁴

RVB analysis results for all genes exhibiting case association at P<1x10-³ in FALS discovery cohort.


NEK1 represents one of 11 members of the highly conserved NIMA kinase family, which has conserved functions in cell-cycle progression and mitosis. In post-mitotic cells, NEK1 is a primary regulator of the formation of the non-motile primary cilium¹⁴,¹⁵. Disruption in the structure or function of primary cilia has been linked to neurological defects such as brain dysgenesis, hydrocephalus and intellectual disability¹⁶,¹⁷, and abnormalities in cilia number, structure and microtubule state occur in fibroblasts derived from SRPS patients homozygous for NEK1 truncation variants¹³. In vitro disruption of the activity of other neuronally expressed NEK family members has similarly been shown to disrupt neuronal morphology, neurite outgrowth, microtubule stability and microtubule dynamics¹⁸,¹⁹. Microtubule integrity and kinesin and dynein intraflagellar transport are essential to maintain cilia structure and function. This is of particular relevance as disruption of the microtubule cytoskeleton has been associated with the development of ALS³, and mutations of the dynein subunit dynactin are associated with motor neuron degeneration²⁰. Additionally, motor neurons derived from mice expressing human SOD1 G93A show a selective loss of cilia both in vitro and in vivo²¹. Besides its role in ciliogenesis, NEK1 is also known to regulate mitochondrial membrane permeability²² and DNA repair²³. Both of these processes have been extensively investigated in relation to ALS, and have been postulated to explain the toxicity of ALS-associated mutations in SOD1 and FUS²⁴,²⁵. Mutations in DNA-repair genes cause several early-onset neurological phenotypes, and multiple lines of evidence suggest defective DNA repair may contribute to both late-onset neurodegeneration and brain aging in general²⁶. 
For example, oxidative damage and DNA strand breaks have been observed to be elevated in ALS, Alzheimer’s disease and Parkinson’s disease cases²⁷, and a recent large-scale genome-wide association study (GWAS) implicated DNA repair genes as age-of-onset modifiers in Huntington’s disease²⁸. The pathological importance of DNA damage in ALS, and whether modifier effects observed in Huntington’s disease may generalize to repeat-expansion disorders such as C9orf72-associated ALS, constitute important questions to be addressed. Finally, through its coiled-coil domain, NEK1 has been shown to interact with multiple other proteins of potential importance, including the ALS-associated proteins VAPB and ALS2⁷, and the axonal outgrowth regulator FEZ1²⁹.

152 REFERENCES

1. Gilissen, C., Hoischen, A., Brunner, H.G. & Veltman, J.A. Unlocking Mendelian disease using exome sequencing. Genome Biol 12, 228 (2011).
2. Ng, S.B. et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet 42, 790-3 (2010).
3. Smith, B.N. et al. Exome-wide rare variant analysis identifies TUBA4A mutations associated with familial ALS. Neuron 84, 324-31 (2014).
4. Jian, X., Boerwinkle, E. & Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 42, 13534-44 (2014).
5. Shihab, H.A. et al. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 34, 57-65 (2013).
6. Reid, E. et al. A kinesin heavy chain (KIF5A) mutation in hereditary spastic paraplegia (SPG10). Am J Hum Genet 71, 1189-94 (2002).
7. Cirulli, E.T. et al. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways. Science 347, 1436-41 (2015).
8. Brenner, D. et al. NEK1 mutations in familial amyotrophic lateral sclerosis. Brain (2016).
9. Renton, A.E., Chio, A. & Traynor, B.J. State of play in amyotrophic lateral sclerosis genetics. Nat Neurosci 17, 17-23 (2014).
10. Kenna, K.P. et al. Delineating the genetic heterogeneity of ALS using targeted high-throughput sequencing. J Med Genet 50, 776-83 (2013).
11. Lattante, S. et al. Contribution of major amyotrophic lateral sclerosis genes to the etiology of sporadic disease. Neurology 79, 66-72 (2012).
12. Chio, A. et al. Extensive genetics of ALS: A population-based study in Italy. Neurology 79, 1983-1989 (2012).
13. Thiel, C. et al. NEK1 mutations cause short-rib polydactyly syndrome type Majewski. Am J Hum Genet 88, 106-14 (2011).
14. Shalom, O., Shalva, N., Altschuler, Y. & Motro, B. The mammalian NEK1 kinase is involved in primary cilium formation. FEBS Lett 582, 1465-70 (2008).
15. White, M.C. & Quarmby, L.M. The NIMA-family kinase, NEK1 affects the stability of centrosomes and ciliogenesis. BMC Cell Biol 9, 29 (2008).
16. Lee, J.H. & Gleeson, J.G. The role of primary cilia in neuronal function. Neurobiol Dis 38, 167-72 (2010).
17. Lee, L. Riding the wave of ependymal cilia: genetic susceptibility to hydrocephalus in primary ciliary dyskinesia. J Neurosci Res 91, 1117-32 (2013).
18. Cohen, S., Aizer, A., Shav-Tal, Y., Yanai, A. & Motro, B. Nek7 kinase accelerates microtubule dynamic instability. Biochim Biophys Acta 1833, 1104-13 (2013).
19. Chang, J., Baloh, R.H. & Milbrandt, J. The NIMA-family kinase Nek3 regulates microtubule acetylation in neurons. J Cell Sci 122, 2274-82 (2009).
20. Puls, I. et al. Mutant dynactin in motor neuron disease. Nat Genet 33, 455-6 (2003).
21. Ma, X., Peterson, R. & Turnbull, J. Adenylyl cyclase type 3, a marker of primary cilia, is reduced in primary cell culture and in lumbar spinal cord in situ in G93A SOD1 mice. BMC Neurosci 12, 71 (2011).

22. Chen, Y., Craigen, W.J. & Riley, D.J. NEK1 regulates cell death and mitochondrial membrane permeability through phosphorylation of VDAC1. Cell Cycle 8, 257-67 (2009).
23. Pelegrini, A.L. et al. NEK1 silencing slows down DNA repair and blocks DNA damage-induced cell cycle arrest. Mutagenesis 25, 447-54 (2010).
24. Sama, R.R., Ward, C.L. & Bosco, D.A. Functions of FUS/TLS from DNA repair to stress response: implications for ALS. ASN Neuro 6 (2014).
25. Tafuri, F., Ronchi, D., Magri, F., Comi, G.P. & Corti, S. SOD1 misplacing and mitochondrial dysfunction in amyotrophic lateral sclerosis pathogenesis. Front Cell Neurosci 9, 336 (2015).
26. Madabhushi, R., Pan, L. & Tsai, L.H. DNA damage and its links to neurodegeneration. Neuron 83, 266-82 (2014).
27. Coppede, F. & Migliore, L. DNA damage in neurodegenerative diseases. Mutat Res 776, 84-97 (2015).
28. Genetic Modifiers of Huntington’s Disease Consortium. Identification of Genetic Factors that Modify Clinical Onset of Huntington’s Disease. Cell 162, 516-26 (2015).
29. Surpili, M.J., Delben, T.M. & Kobarg, J. Identification of proteins that interact with the central coiled-coil region of the human protein kinase NEK1. Biochemistry 42, 15369-76 (2003).

FULL AUTHOR LIST

Perry TC van Doormaal*, Kevin P Kenna*, Annelot M. Dekker*, Nicola Ticozzi*, Brendan J Kenna, Frank P Diekstra, Wouter van Rheenen, Kristel R van Eijk, Ashley R Jones, Pamela Keagle, Aleksey Shatunov, William Sproviero, Bradley N Smith, Michael A van Es, Simon D Topp, Aoife Kenna, Jack W Miller, Claudia Fallini, Cinzia Tiloca, Russell L McLaughlin, Caroline Vance, Claire Troakes, Claudia Colombrita, Gabriele Mora, Andrea Calvo, Federico Verde, Safa Al-Sarraj, Andrew King, Daniela Calini, Jacqueline de Belleroche, Frank Baas, Anneke J van der Kooi, Marianne de Visser, Anneloor LMA ten Asbroek, Peter C Sapp, Diane McKenna-Yasek, Meraida Polak, Seneshaw Asress, José Luis Muñoz-Blanco, Tim M Strom, Thomas Meitinger, Karen E Morrison, SLAGEN Consortium, Giuseppe Lauria, Kelly L Williams, P Nigel Leigh, Garth A Nicholson, Ian P Blair, Claire S Leblond, Patrick A Dion, Guy A Rouleau, Hardev Pall, Pamela J Shaw, Martin R Turner, Kevin Talbot, Franco Taroni, Kevin B Boylan, Marka Van Blitterswijk, Rosa Rademakers, Jesús Esteban-Pérez, Alberto García-Redondo, Phillip Van Damme, Wim Robberecht, Adriano Chio, Cinzia Gellera, Carsten Drepper, Michael Sendtner, Antonia Ratti, Jonathan D Glass, Jesús S Mora, Nazli A Basak, Orla Hardiman, Albert C Ludolph, Peter M Andersen, Jochen H Weishaupt, Robert H Brown, Jr, Ammar Al-Chalabi, Vincenzo Silani**, Christopher E Shaw**, Leonard H van den Berg**, Jan H Veldink**, John E Landers**

* These authors contributed equally
** These authors jointly directed this work


SUPPLEMENTARY INFORMATION

A selection of the supplementary methods and figures is provided here. The full supplementary information is provided online with the published version of this chapter: http://www.nature.com/ng/journal/v48/n9/abs/ng.3626.html

SUPPLEMENTARY METHODS

FALS discovery cohort

The FALS discovery cohort included 1,376 FALS patients and 13,883 non-ALS controls analyzed by exome sequencing. Patients were recruited at specialist clinics in Ireland (n = 18), Italy (n = 143), Spain (n = 49), the UK (n = 219), the USA (n = 511), the Netherlands (n = 50), Canada (n = 34), Belgium (n = 12), Germany (n = 202), Turkey (n = 47) and Australia (n = 91). Variants occurring at very low frequency in the general population (ExAC MAF < 0.0001) that had been both previously reported as ALS associated and annotated as either ‘pathogenic’ or ‘likely pathogenic’ by ClinVar within the 10 genes were considered to be pathogenic mutations. The breakdown of the 72 mutations observed in the final cohort was as follows: SOD1 (28), TARDBP (12), FUS (9), PFN1 (6), TBK1 (1), TUBA4A (4), UBQLN2 (4), VAPB (2), VCP (6). An additional 26 cases harbored a repeat expansion in the C9orf72 gene. Controls included 29 internal samples and individuals participating in dbGaP³⁰ projects. Family history was considered positive for ALS if the proband had at least one affected relative within three generations. We received approval for this study from the institutional review boards of the participating centers, and written informed consent was obtained from all patients (consent for research).

SALS replication cohort

The SALS replication cohort included 2,387 SALS cases and 1,093 controls analyzed by whole genome sequencing, and 5,834 SALS cases and 4,117 controls analyzed by exome chip. All individuals were recruited at specialist clinics in Ireland, Italy, Spain, the UK, the USA, the Netherlands and Belgium. Details of sample contributions per country are shown in Fig. 2. C9orf72 status was evaluated in 2,387 SALS cases, of whom 166 (7%) displayed a repeat expansion. We received approval for this study from the institutional review boards of the participating centers, and written informed consent was obtained from all patients (consent for research).

Exome sequencing

Exome sequencing of patients was performed as previously described.³⁰ Raw sequence data for controls was obtained from dbGaP. Sequence reads were aligned to human reference GRCh37 using BWA (Burrows-Wheeler Aligner) and processed according to recommended best practices³¹. Variant detection and genotyping were performed using the GATK HaplotypeCaller. Variant quality control was performed using the GATK variant quality score recalibration method, with a VQSLOD cut-off of 2.27 (truth set sensitivity of 99%). A minimum variant quality by depth (QD) score of 2 was also imposed and all genotypes associated with genotype quality (GQ) < 20 were reset to missing. Variants were also excluded in the event of case or control call rates < 70% (post genotype QC). Exome sequencing data was not used to infer the presence or absence of indels due to the limited sensitivity and comparatively high false positive rates associated with available calling algorithms³².
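The variant and genotype filters described above can be sketched in a few lines. This is an illustrative sketch only, not the pipeline used in the study: the function names and the list-based data layout are hypothetical, and whether each threshold is inclusive is an assumption where the text does not say.

```python
# Thresholds quoted in the text; inclusive/exclusive handling is assumed.
MIN_VQSLOD = 2.27      # VQSR cut-off (truth set sensitivity of 99%)
MIN_QD = 2.0           # minimum variant quality by depth
MIN_GQ = 20            # genotypes with GQ below this are reset to missing
MIN_CALL_RATE = 0.70   # required separately in cases and in controls


def mask_low_gq(genotypes, gqs):
    """Reset genotypes with GQ < MIN_GQ to missing (None)."""
    return [g if (g is not None and q >= MIN_GQ) else None
            for g, q in zip(genotypes, gqs)]


def call_rate(genotypes):
    """Fraction of non-missing genotypes; 0.0 for an empty group."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def variant_passes(vqslod, qd, case_gts, case_gqs, ctrl_gts, ctrl_gqs):
    """Apply site-level filters, then the post-genotype-QC call-rate filter."""
    if vqslod < MIN_VQSLOD or qd < MIN_QD:
        return False
    cases = mask_low_gq(case_gts, case_gqs)
    ctrls = mask_low_gq(ctrl_gts, ctrl_gqs)
    return (call_rate(cases) >= MIN_CALL_RATE
            and call_rate(ctrls) >= MIN_CALL_RATE)
```

Note that the call rate is computed after low-GQ genotypes are masked, matching the "post genotype QC" wording above.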

Genome sequencing

Whole genome sequencing of 2,387 SALS cases and 1,093 controls was performed with Illumina’s FastTrack services (San Diego, USA) using PCR-free library preparation and paired-end (100 bp or 150 bp) sequencing on the HiSeq 2500 or HiSeq X platform (Illumina) to yield 35X coverage at minimum. BWA was used to align sequencing reads to genome build hg19, and the Isaac variant caller was used to call single nucleotide variants (SNVs), insertions and deletions (indels)³³. Both the aligned and unaligned reads were delivered in binary sequence alignment/map format (BAM) together with variant call format (VCF) files containing the SNVs and indels. gVCF files were generated per individual and variants that failed the Isaac-based quality filter were excluded.

Exome chip

A total of 5,815 ALS patients and 4,614 healthy controls from the Netherlands, Belgium, Germany, Ireland, Italy, Spain and the UK were included. Genotyping was conducted using Illumina HumanExome-12v1 BeadChips in accordance with the manufacturer’s recommendations. The GenTrain 2.0 clustering algorithm was used for genotype calling, as implemented in the Illumina GenomeStudio software package. Initial genotype calls were made based on the HumanExome cluster file provided by Illumina. More accurate cluster boundaries were determined based on the actual study data, after the exclusion of samples with a GenCall quality score in the lower 10th percentile of the distribution across all variants genotyped (p10GC) < 0.38 or call rate < 0.99. Subsequently, the excluded samples were added back into the data set, and new genotype calls were made using the previously obtained cluster boundaries.


Sample filtering

Samples from the FALS discovery and SALS replication cohorts were excluded from analysis in the event of failing to meet genotype call rate, heterozygosity, gender concordance, duplication, relatedness or population stratification filters, as summarized in Supplementary Fig. 1 and Supplementary Fig. 7. All samples from the FALS cohort were required to exhibit filtered exome-wide call rates > 70%. For both the FALS and SALS cohorts, PLINK v1.07³⁴ was used to define an LD-pruned (R² < 0.5, window size = 50, step = 5) set of autosomal markers with MAF > 0.01 and p > 0.001 for deviation from Hardy-Weinberg equilibrium. These marker sets were then used to calculate inbreeding coefficients for use in heterozygosity filtering, identify study duplicates, conduct relatedness filtering, perform tests of pairwise population concordance for stratification filtering, conduct principal components (PC) analysis for a second round of stratification filtering and conduct PC analysis to generate covariates for stratification correction in RVB analysis and single variant analysis of filtered cohorts. Samples from the SALS replication cohort were required to exhibit no relatedness/duplication with samples from the FALS discovery cohort. PLINK was used to calculate inbreeding coefficients, test for discordance in reported and SNV-predicted gender and conduct tests of pairwise population concordance. Identification of sample duplicates and sample relatedness was performed using KING³⁵. PC analyses were conducted using GCTA³⁶ (Genome-wide Complex Trait Analysis). Details of results from population stratification analysis are provided in Supplementary Fig. 2 and Supplementary Fig. 8.
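The MAF and Hardy-Weinberg criteria used to select markers can be illustrated with a small sketch. It is not PLINK's implementation: the function names are hypothetical, LD pruning is left out, and the Hardy-Weinberg p-value uses a textbook 1-degree-of-freedom chi-square approximation, whereas PLINK's own test may differ (for instance by using an exact test).

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def keep_marker(n_aa, n_ab, n_bb, min_maf=0.01, min_hwe_p=0.001):
    """Keep markers with MAF > 0.01 and HWE p > 0.001, as in the text."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) > min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p)
```

For rare markers the chi-square approximation is poor, which is one reason an exact test is usually preferred in practice; here such markers are removed by the MAF filter anyway.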

Statistical analyses

RVB analyses were performed by logistic regression of case-control status on the number of minor alleles observed per sample per gene³⁶,³⁷. Results from underpowered tests (<3 observations in the combined case-control cohort) were excluded and did not contribute to assessments of genomic inflation. Variants were included for RVB analyses on the basis of MAF within the combined case-control cohort, MAF within the 1000 Genomes Project³⁸, and pathogenicity predictions generated using snpEFF³⁹ (Single Nucleotide Polymorphism Effect), PolyPhen2⁴⁰ (Polymorphism Phenotyping version 2), SIFT⁴¹ (Sorting Intolerant From Tolerant), LRT⁴² (Likelihood Ratio Test), MutationTaster⁴³, MutationAssessor⁴⁴, FATHMM⁵ (Functional Analysis through Hidden Markov Models), CADD⁴⁵ (Combined Annotation Dependent Depletion), PROVEAN⁴⁶ (Protein Variation Effect Analyzer), GERP⁴⁷ (Genomic Evolutionary Rate Profiling), phyloP⁴⁸ (Phylogenetic P-value), SiPhy⁴⁹ (SiPhylogenic), dbNSFP⁵⁰ (database Nonsynonymous SNP Functional Prediction) and dbscSNV⁴ (database of splice site consequences of Single Nucleotide Variants), as described in Supplementary Table 1. All RVB analyses were conditioned on a missing-variant, MAF-weighted measure of sample gene call rate and the first 4 PCs derived from common variant profiles. Homozygosity mapping was performed using HomozygosityMapper⁵¹ allowing for genetic heterogeneity. ROH were selected as all loci achieving a homozygosity score ≥ 8,483 (0.6 × max). Single variant analyses were allele count based, conducted using PLINK, and also included correction for the first 4 PCs derived from common variant profiles. Meta-analyses were conducted using METAL⁵² under a fixed effect model with weighting by the inverse of the effect size standard error. All statistical tests were two-sided.
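To make the meta-analysis step concrete, the following is a minimal sketch of a fixed-effect combination of per-cohort effect estimates. It assumes the standard inverse-variance form (weights of 1/SE²), which is how standard-error-based weighting is commonly implemented; it is not METAL's code, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Combine per-cohort log odds ratios under a fixed-effect model.

    Returns the pooled effect, its standard error, the z-score and the
    two-sided p-value (normal approximation).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, z, p_two_sided
```

Cohorts with smaller standard errors dominate the pooled estimate: combining effects of 1.0 (SE 0.1) and 0.0 (SE 0.2) gives a pooled effect of about 0.8 rather than the unweighted mean of 0.5.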


SUPPLEMENTARY FIGURES

Supplementary Figure 3. Distribution of NEK1 variants. (a,b) Observed case–control distribution of NEK1 variants in FALS (a) and SALS (b) cohorts. LOF variants are highlighted in black; missense variants are labeled in grey. HGVS descriptions are followed by case/control carrier counts in parentheses. Predicted splice-altering variants are indicated with an asterisk.

Nature Genetics: doi:10.1038/ng.3626

Supplementary Figure 6. Inbreeding coefficients from Dutch whole-genome sequencing cohort. Four ALS patients sampled from an isolated community in the Netherlands can be seen to exhibit elevated coefficients of inbreeding (shown in red) relative to a larger panel of Dutch genome sequences (n = 1,861). Box plots show cohort median, interquartile range, 2.5% quantile and 97.5% quantile.


Supplementary Figure 7. Autozygosity mapping identifies NEK1 p.Arg261His as a candidate ALS variant. Whole-genome sequencing followed by autozygosity mapping with allowed genetic heterogeneity identified ten runs of homozygosity present in one or more of four SALS patients from an isolated Dutch community (top). These regions contained four variants where at least one of the four patients was homozygous and where MAF was less than 0.01 in the 1000 Genomes Project, the NHLBI Exome Sequencing Project and ExAC (bottom). NEK1 p.Arg261His is the only variant identifiable in all patients and the only variant for which multiple homozygous genotypes were observed.


SUPPLEMENTARY REFERENCES

30. Tryka, K.A. et al. NCBI’s Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res 42, D975-9 (2014).
31. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491-8 (2011).
32. Fang, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Med 6, 89 (2014).
33. Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041-3 (2013).
34. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559-75 (2007).
35. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867-73 (2010).
36. Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88, 76-82 (2011).
37. Heinze, G. & Schemper, M. A solution to the problem of separation in logistic regression. Stat Med 21, 2409-19 (2002).
38. Abecasis, G.R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65 (2012).
39. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92 (2012).
40. Adzhubei, I., Jordan, D.M. & Sunyaev, S.R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet Chapter 7, Unit7 20 (2013).
41. Kumar, P., Henikoff, S. & Ng, P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4, 1073-81 (2009).
42. Chun, S. & Fay, J.C. Identification of deleterious mutations within three human genomes. Genome Res 19, 1553-61 (2009).
43. Schwarz, J.M., Rodelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7, 575-6 (2010).
44. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39, e118 (2011).
45. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310-5 (2014).
46. Choi, Y., Sims, G.E., Murphy, S., Miller, J.R. & Chan, A.P. Predicting the functional effect of amino acid substitutions and indels. PLoS One 7, e46688 (2012).
47. Davydov, E.V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol 6, e1001025 (2010).
48. Cooper, G.M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 15, 901-13 (2005).
49. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54-62 (2009).
50. Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Non-synonymous and Splice Site SNVs. Hum Mutat (2015).
51. Seelow, D., Schuelke, M., Hildebrandt, F. & Nurnberg, P. HomozygosityMapper--an interactive approach to homozygosity mapping. Nucleic Acids Res 37, W593-9 (2009).
52. Willer, C.J., Li, Y. & Abecasis, G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190-1 (2010).


Chapter 7

Discussion


This thesis combines different genetic technologies and methods to increase understanding of the genetic landscape of ALS, ultimately aiming to 1) help dissect the underlying disease pathophysiology, 2) guide more accurate risk prediction models and 3) tailor personalized treatments based on genetic profile. In this chapter, I first discuss the various technologies and methods used in this thesis to unravel ALS genetics, and how the results from various chapters relate. I proceed with new insights regarding the genetic architecture of ALS, and how these aid the goals of treatment, prediction and elucidation of disease processes. Finally, I provide directions for future research.

Candidate gene studies

In 2010, a genome-wide CNV association study identified the 15q11.2 locus containing NIPA1 (non-imprinted in Prader-Willi/Angelman syndrome 1) as possibly associated with ALS¹. NIPA1 was previously linked to hereditary spastic paraplegia (HSP) type 6, a neurodegenerative disease characterized by slowly progressive upper motor neuron loss affecting predominantly the lower limbs. This phenotypic overlap with ALS made NIPA1 an interesting candidate for follow-up studies, and the gene was subsequently screened in patients and healthy controls. This study reported an association between ALS susceptibility and a polyalanine repeat expansion at the 5’-end of NIPA1². NIPA1 repeat expansions were also associated with lower age of onset and shorter survival. Since this original publication, only one other publication has tested NIPA1 repeat expansions for association with ALS, albeit in a much smaller group of controls and C9orf72 repeat expansion carriers³. No association was found, and no other groups have since attempted to replicate the association, or at least none have published their results. Chapter 2 of this thesis includes the first replication study of the association of NIPA1 repeat expansions and the risk of ALS in a large international cohort⁴. We first replicated the association of long polyalanine repeats in NIPA1 with ALS in an independent Dutch cohort. Overcoming methodological challenges, we combined results from multiple wet- and dry-lab techniques, allowing inclusion of 6,231 individuals from 6 populations. Meta-analysis of our analyses with previously published data, comprising 6,245 ALS patients and 5,051 controls, confirmed that NIPA1 repeat expansions confer an increased risk for developing ALS. The lack of replication of the original study by other groups is striking. Reliable analysis of the NIPA1 repeat length using wet-lab methods (repeat PCR and Sanger sequencing) proved difficult, possibly obstructing replication attempts at early stages. In addition, the effect size was higher in the discovery dataset than in the replication dataset (due to winner’s curse), so a large sample size is needed for sufficient power to replicate. Our study failed to replicate the relationship between long NIPA1 alleles and age of onset or survival in the replication samples for which we had phenotypic information. Since phenotypic information was only available for approximately 50% of cases in our replication dataset, we cannot definitively conclude that NIPA1 repeat expansions are not related to age of onset or survival, which stresses the importance of documenting clinical information.
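The winner's curse mentioned above can be made concrete with a small calculation. The sketch below assumes a normally distributed effect estimate and conditions on the estimate being significant in the positive direction (a one-sided simplification); the function name and the input values are hypothetical and are not taken from the NIPA1 studies.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_significant_estimate(true_beta, se, alpha=0.05):
    """Expected effect estimate given that it reached significance.

    Uses the mean of a truncated normal: the estimate is N(true_beta, se^2)
    and is only reported when estimate / se exceeds the critical z-value.
    """
    critical_z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    a = critical_z - true_beta / se
    hazard = STD_NORMAL.pdf(a) / (1 - STD_NORMAL.cdf(a))
    return true_beta + se * hazard


# A noisy discovery study inflates the reported effect; a precise one barely does.
inflated = expected_significant_estimate(true_beta=0.2, se=0.1)
nearly_unbiased = expected_significant_estimate(true_beta=0.2, se=0.02)
```

Because the replication study has to detect the smaller true effect rather than the inflated discovery estimate, power calculations based on the discovery effect size tend to be optimistic.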

Prior to the work described in Chapter 3⁵, several ALS pedigrees were identified that carried mutations in multiple ALS genes⁶-⁸. Inheritance patterns with incomplete penetrance and large phenotypic variability had been well established, but not explained. In Chapter 3⁵, to help dissect the genetic architecture of sporadic ALS, we undertook a large sequencing study that analyzed the co-occurrence of 10 ALS genes in sporadic ALS cases. The genes were divided into: 1) high-risk genes (rare mutations with large effect): SOD1, C9orf72, TARDBP and FUS; 2) susceptibility genes with intermediate effect: ANG, CHMP2B, ATXN2, NIPA1, SMN1, and 3) polymorphisms with small effect: a risk SNP in UNC13A. Although we found a threefold increased risk of co-occurring variants in sporadic ALS patients, this signal was not significant. We did find an excess of concomitant mutations in C9orf72 repeat expansion carriers. Additional analyses showed significant co-occurrence of C9orf72 and NIPA1 repeat expansions, pointing towards NIPA1 as a modifier of the C9orf72 phenotype.

In the work described in Chapter 2⁴, we aimed to replicate the co-occurrence of C9orf72 and NIPA1 repeat expansions in ALS cases described in Chapter 3⁵. Although we found a higher than expected number of ALS cases carrying both mutations, we failed to robustly reproduce the signal. One possible explanation is the relatively small sample size in the original study (755 ALS patients), resulting in broad confidence intervals. Alternatively, the co-occurrence might be relevant in some but not all included populations. The results described in these candidate gene association studies reemphasize the necessity for replication, and the importance of tracking clinical characteristics of all individuals in large genetic databases.

Genome-wide association studies

Between 2009 and 2014, several GWAS were performed in ALS, together identifying three loci associated with the disease⁹-¹¹. Guided by the success of large GWAS in other complex traits¹²,¹³, we performed one of the largest GWAS in ALS to date, aiming to discover new genetic risk loci and further elucidate the genetic architecture. The study, described in Chapter 4, comprised a total of 12,577 cases and 23,475 controls from 41 different cohorts in the discovery phase, and a total of 2,579 cases and 2,767 controls from nine independent replication cohorts¹⁴. After imputation with a custom ALS-specific reference panel, we fine-mapped a new risk locus on chromosome 21 and, subsequently, identified C21orf2 as a gene associated with ALS risk. In addition, we identified variants in MOBP and SCFD1 as new risk loci. There was no evidence for residual association within each locus after conditioning on the top SNP, indicating that all risk loci were independent signals. A burden analysis in a set of whole genome sequenced samples showed an excess of non-synonymous and loss-of-function mutations in C21orf2 among ALS cases, which persisted after conditioning on the genome-wide significant variant. Interestingly, variants in MOBP have previously been associated with progressive supranuclear palsy (PSP)¹⁵ and were identified as a modifier for survival in FTD¹⁶. Our study also replicated the signals at the three loci linked to ALS by the previous GWAS. Additional analyses using polygenic scores showed that a modest, albeit significant, proportion of the phenotypic variance could be explained by all SNPs. The proportion of the heritability explained by all variants was estimated at around 8.5%, notably lower than the ~61% heritability based on twin studies¹⁷. The genome-wide significant loci only explained 0.2% of this heritability, implying that the bulk of the heritability (8.3%) was captured by SNPs with associations below genome-wide significance. Estimated heritability partitioned by MAF showed that most variance was explained by low-frequency SNPs (i.e., MAF 1-10%).

To further elucidate the role of these low-frequency and rare variants in the genetic landscape of ALS, we performed the largest exome-wide study in ALS to date, using the exome array to investigate the role of low-frequency and rare variants in 7,350 ALS patients and controls drawn from different European cohorts. The results of this work are described in Chapter 5. Although imputation of low-frequency (MAF 0.5 – 5.0%) variants in genotyping data has become more accurate with the increasing public availability of large reference panels, and with the combination of public reference panels with study-specific reference panels, high quality imputation of rare variants remains challenging. Therefore, the GWAS described in Chapter 4 could only include variants with MAF above 1%¹⁴. The exome array is a low-resolution, low-cost alternative to large-scale exome sequencing, with the majority of variants at MAF < 1%. As a result, only 43% of the variants in our dataset (i.e., 100,896 variants) were polymorphic and candidates for downstream analyses. Exome chip data is not suitable for high-quality imputation, due to the sparsity of common variants, thereby limiting analyses to directly genotyped variants. No associations reached the predefined significance levels in single-variant association testing, gene-based burden testing and exome-wide individual set-unique burden (ISUB) testing. Several of the strongest associations in single-variant association testing and gene-based burden testing are known ALS signals, demonstrating the validity of our methods. We concluded that none of the rare variants captured on the exome array were large-effect variants associated with ALS. ISUB testing did not reveal an exome-wide increase in individual set-unique burden in cases compared to controls, as opposed to a similar analysis previously performed in schizophrenia¹⁸.

Next-generation sequencing studies

As the cost of NGS drops, the opportunities for large-scale sequencing studies in ALS rise. Chapter 6 describes a next-generation sequencing study that combined two different lines of research from the USA and the Netherlands¹⁹. Both groups aimed to identify novel ALS genes, albeit with different next-generation sequencing approaches. Rare variant burden analysis in whole exome sequencing data from familial ALS patients and controls, and homozygosity analysis in whole genome sequencing data of patients from a Dutch inbred community, independently pointed towards NEK1 (NIMA (never in mitosis gene-A)-related kinase) as a novel gene associated with ALS. Subsequently, the risk effect of the NEK1 variant p.Arg261His was validated in 6,172 sALS cases and 4,417 matched controls from eight countries, genotyped on the Illumina exome chip or through whole-genome sequencing. Rare variant burden analysis yielded a significant excess of LOF variants in sporadic ALS cases compared to controls, further confirming the role of NEK1 in ALS pathogenesis. Although NEK1 had previously been identified as a candidate gene²⁰,²¹, our study was the first to firmly establish NEK1 as an ALS gene. Furthermore, our study identified several new candidate genes, including KIF5A, a gene that was previously linked to HSP and Charcot-Marie-Tooth type 2. In 2018, KIF5A was confirmed as a novel ALS gene through a large new genome-wide association study, and rare variant burden analysis on exome sequencing data of 21,000 samples²². Interestingly, NEK1 was the top associating gene in the gene-based burden analysis described in Chapter 5. This signal was mainly driven by the presence of the p.Arg261His variant on the Illumina exome chip. In our rare variant burden analysis in the exome chip data, we did not find an increased rare variant burden in other known ALS genes. This is probably due to a combination of factors, including the small number of variants included in this study and low power for finding rare variants with modest-to-weak effect sizes²³. To illustrate, for gene-based burden analysis in KIF5A, we could only include 3 variants that were not located in the ALS-associated domains, and therefore failed to pick up a disease-associated signal.

Genetic characterization of ALS

The genetic architecture of a disease is its underlying genetic basis, defined as the combination of the number, type, frequency and magnitude of effect of the genetic variants contributing to that trait, and the relationships between them²⁴. The work described in this thesis supports existing views or sheds new light on several core concepts of the genetic architecture of ALS.


Heritability

The heritability of a trait is defined as the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals²⁵. Heritability estimates provide boundaries for the ability of genetics to predict disease, but also have inherent limitations. There are multiple statistical methods to assess heritability in human complex traits. Three main sources of data derive from monozygotic and dizygotic twins, adopted siblings and families. The methods in these studies are based on comparisons between shared genetic background, unique environment and shared environmental factors. With the rise of GWAS, heritability estimates are increasingly based on genotyped and imputed variants in unrelated individuals²⁵,²⁶.

The first large-scale twin-based ALS heritability study was published in 2010, with an estimated pedigree-based heritability in sporadic ALS of 61% (95% confidence interval (CI), 38-78). Two subsequent studies used the genome-wide complex trait analysis (GCTA) GREML (genome-based restricted maximum likelihood) method to estimate ALS heritability from a genetic relationship matrix based on genome-wide SNP data in large case-control cohorts. These studies estimated the SNP-based heritability (h²SNP) at around 12% (95% CI, 11–13), and 21% (95% CI, 17-25), respectively¹¹,²⁷. As discussed previously, the SNP-based heritability described in Chapter 4 of this thesis was estimated at around 8.5% using multiple methods, including GCTA-GREML¹⁴. The differences between these SNP-based heritability estimates have been attributed partly to the different ALS prevalence estimates in the various studies²⁸,²⁹. Other explanations include different sets of variants included in the studies, residual structure in the data (e.g., through population stratification and correlation between environmental factors and genotype), different case-control ratios and differences in data quality control³⁰-³². In ALS, as is well known for many diseases and traits, family-based heritability estimates are significantly higher than the variance explained by known genetic variants. This mismatch is termed the ‘missing heritability’³³, and has been hotly debated in genetic literature.
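One way to see why the assumed prevalence matters is the conversion of a case-control heritability estimate from the observed scale to the liability scale. The sketch below uses the commonly applied transformation h²(liability) = h²(observed) × K²(1−K)² / (z² × P(1−P)), where K is the population prevalence, P the proportion of cases in the sample and z the height of the standard normal density at the liability threshold. The function name and all input values are hypothetical and are not taken from the cited studies.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale SNP heritability to the liability scale."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability threshold
    z = std_normal.pdf(threshold)                   # density at the threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))


# Same observed-scale estimate, two different assumed prevalences (hypothetical).
h2_if_1_in_400 = liability_scale_h2(0.2, prevalence=1 / 400, case_fraction=0.35)
h2_if_1_in_1000 = liability_scale_h2(0.2, prevalence=1 / 1000, case_fraction=0.35)
```

With identical genotype data, assuming a higher prevalence yields a higher liability-scale estimate in this setting, so studies that assume different prevalences are not directly comparable without rescaling.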

Our GWAS (Chapter 4) showed that the six genome-wide significant loci explained only 0.2% of the 8.5% SNP-based heritability, leaving the bulk of the heritability captured by SNPs with associations below genome-wide significance¹⁴. Partitioning the heritability by MAF showed that most variance is explained by low-frequency SNPs (i.e., MAF 1–10%). This means that a further increase in sample size in ALS GWAS will likely yield novel risk loci, especially in the lower allele frequency bins³². In the exome chip analyses described in Chapter 5 we failed to identify additional rare variants associated with ALS, due to the absence of large-effect variants (i.e., OR > 2.3 at MAF < 1%) on the exome chip. The relatively low sample size in the exome chip study prohibited examination of low-frequency and rare variants of lower effect size, emphasizing the need for large-scale high-resolution studies of rare variation in ALS.

The gap between the twin-based ALS heritability estimate (the total heritability) of approximately 61% and the SNP-based heritability estimates of 8.5–21% is termed the missing heritability. The 8.3% gap between the SNP-based heritability of 8.5% and the 0.2% explained heritability attributed to significant GWAS variants, as described in Chapter 4¹⁴, is termed the hiding heritability³⁴. This distinction is important, since the hiding heritability should decrease with an increase of study sample size. The missing heritability of complex traits has been hotly debated in genetic literature, and might involve multiple causes. Genotyped SNPs that underlie the SNP-based heritability do not perfectly tag other variants possibly involved in ALS susceptibility. This is particularly true for rare single nucleotide variants and other variant types, e.g. repeat expansions and structural variants³²,³⁵. Several studies, including the studies described in Chapter 2⁴ and Chapter 6¹⁹ of this thesis, have established rare genetic variants and repeat expansions as risk factors for ALS, emphasizing the role of these types of variation as potential causes for the missing heritability in ALS. Other causes of missing heritability include gene-gene interactions (epistasis) and gene-environment interactions, possibly acting in part through epigenetic mechanisms. Future studies including large sample sizes and/or addressing different types of (epi)genetic variation should provide novel insights into the final spectrum of the missing heritability of ALS.
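The two gaps defined above are simple differences between point estimates, as the following sketch shows. It uses only the figures quoted in this chapter (61%, 8.5% and 0.2%); the function name is hypothetical, and confidence intervals are ignored.

```python
def heritability_gaps(h2_twin: float, h2_snp: float, h2_gwas_hits: float) -> dict:
    """Return the missing and hiding heritability, in percentage points."""
    return {
        "missing": h2_twin - h2_snp,       # not captured by genotyped or imputed SNPs
        "hiding": h2_snp - h2_gwas_hits,   # captured by SNPs, but not yet genome-wide significant
    }


# Point estimates quoted in the text.
gaps = heritability_gaps(h2_twin=61.0, h2_snp=8.5, h2_gwas_hits=0.2)
print({name: round(value, 1) for name, value in gaps.items()})
```

Only the second quantity is expected to shrink as GWAS sample sizes grow, because those variants are already tagged by the genotyped SNPs.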

Oligogenic inheritance

Although the term ‘genetic architecture’ encompasses the entire genetic basis of a trait, it is also frequently used to describe its polygenicity. Historically, diseases have been divided into monogenic, oligogenic or polygenic architectures, meaning that one, a few or many genetic variants contribute to phenotypic variability, respectively³⁶. For years it has been clear that ALS segregates in families that carry high-risk mutations (e.g., in SOD1, C9orf72, FUS and TARDBP) with Mendelian inheritance patterns, congruent with a monogenic architecture. However, this monogenic model does not explain all aspects of the disease, including the sporadic appearance in the majority of patients. Even in families with a clear Mendelian inheritance pattern, the clinical manifestations of ALS are highly variable (e.g., in terms of site and age of onset, concomitant symptoms, disease progression and survival), suggestive of the existence of disease-modifying factors³⁷.


Observations that large GWAS in ALS (Chapter 4¹⁴) identify far fewer loci than GWAS of other complex traits and diseases, for instance height or schizophrenia, support the notion that ALS is also not a purely polygenic trait with many common variants contributing to disease onset. Consistently, ISUB analysis performed in Chapter 5 did not show an increased unique rare variant burden in ALS cases compared to controls. ISUB analysis in a comparable sample size in schizophrenia did show such an excess, supporting the more polygenic nature of schizophrenia compared to ALS. Recently, the proposal of an ‘omnigenic’ model has attracted much attention in genetic debates³⁸. Omnigenic is a more precise term than polygenic, and the key feature of the model is the classification of genes as ‘peripheral’ (involved in cellular networks relevant for disease; thereby contributing to risk for many diseases and therefore to pleiotropy) or ‘core’ (which are more disease specific with biologically interpretable roles)³⁹. It states that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways³⁸. Although the omnigenic model is conceptually appealing, one key limitation is the inability to pull the different gene categories apart, leaving it indistinguishable from the polygenic model. Also, contemplating the omnigenicity of a presumed non-polygenic trait such as ALS might provide an interesting thought experiment, but might be of limited practical relevance.

Following case reports of families carrying mutations in multiple ALS genes⁶,⁷,⁴⁰, attention shifted towards an oligogenic inheritance model of ALS. Multiple studies have since investigated and confirmed a larger than expected percentage of co-occurring pathogenic mutations in ALS patients, including our study described in Chapter 2³,⁵,⁸,⁴¹-⁴³. Oligogenic inheritance models state that a quantitative trait is genetically controlled by a few genes with large effects and many genes with small effects⁴⁴. This model provides an explanation for the marked phenotypic heterogeneity and incomplete penetrance or non-penetrance observed in ALS, and fits a multistep disease mechanism. In a landmark paper in 2014, ALS was modelled as a six-step process, suggesting that six genetic or environmental exposures are needed, and that the last one (which could be environmental or genetic because gene expression varies through life) is the disease trigger⁴⁵. Multiple-hit pathophysiology also fits many other features of ALS, including the late onset, genetic pleiotropy, and the process by which the disease cascades through the (predominantly motor) system rapidly after onset⁴⁶. Carrying a large-effect mutation might account for multiple steps, thereby explaining the monogenic (or Mendelian) appearance of the disease in families harboring such a variant. Recently, the authors of the original multistep paper revisited the six-step model to test this hypothesis, and showed that carriers of high-risk mutations need fewer steps to trigger disease (e.g., 3 additional steps for C9orf72 repeat expansion carriers, and 2 additional steps for SOD1 mutation carriers)⁴⁷. Since families often share many environmental factors, this model also explains the monogenic appearance in (some, but not necessarily all) family members.
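The multistep reasoning above is commonly formalized with the Armitage-Doll relationship, in which incidence in an n-step process rises with age to the power n − 1, so that the slope of log incidence against log age equals n − 1. That relationship is background knowledge rather than something stated in this chapter. The sketch below recovers the number of steps by ordinary least squares on the log-log scale; the function name is hypothetical and the incidences are synthetic values generated for illustration, not ALS registry data.

```python
import math


def estimate_steps(ages, incidences):
    """Estimate the number of steps as (log-log regression slope) + 1."""
    log_ages = [math.log(age) for age in ages]
    log_incidences = [math.log(incidence) for incidence in incidences]
    mean_x = sum(log_ages) / len(log_ages)
    mean_y = sum(log_incidences) / len(log_incidences)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_ages, log_incidences))
    variance = sum((x - mean_x) ** 2 for x in log_ages)
    slope = covariance / variance
    return slope + 1.0


# Synthetic incidences that follow age**5 exactly, i.e. a six-step process.
ages = [40, 45, 50, 55, 60, 65, 70]
synthetic_incidence = [1e-10 * age ** 5 for age in ages]
print(estimate_steps(ages, synthetic_incidence))
```

Under this framework, a subgroup in which a mutation already accounts for several steps would show a shallower log-log slope than the full patient population.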

Pleiotropy

Several ALS-associated genes exhibit a phenomenon called pleiotropy, where one gene can influence multiple phenotypes. Identification of pleiotropic variants and genes is important for multiple reasons. Pleiotropy provides insight into shared disease pathways, and might ultimately guide therapies used in one condition to be tested in others. Conversely, when contemplating ‘fixing’ mutations using genome editing approaches such as the CRISPR-Cas system, identification of pleiotropic associations helps prevent unforeseen side-effects⁴⁸. The most striking example of pleiotropy in ALS concerns repeat expansions in the first intron of C9orf72. C9orf72 repeat expansions are associated with any manifestation along the ALS-FTD phenotype spectrum, and are currently the most important genetic cause of familial ALS and FTD, accounting for approximately 34.2% and 25.9% of cases worldwide, respectively⁴⁹. It has been hypothesized that concurrent mutations in other ALS genes can act as C9orf72 modifiers and drive the (motor neuron) phenotype, including intermediate repeat expansions in ATXN2 and long polyalanine repeat expansions in NIPA1 (Chapter 3)³,⁵. Interestingly, variation in these genes has also been associated with traits other than ALS: long polyglutamine expansions in ATXN2 are associated with spinocerebellar ataxia (SCA) type 2⁵⁰ and mutations in NIPA1 with HSP type 6⁵¹. Examples of other genes associated with ALS and other traits are SMN1 (also associated with spinal muscular atrophy)⁵² and ANG (also associated with Parkinson’s disease)⁵³. Although historically pleiotropic variants were identified through candidate-gene approaches in phenotypically related traits, relatively novel analytic methods allow the estimation of genetic correlation between phenotypes through polygenic risk scoring based on (summary) SNP data⁵⁴,⁵⁵. Using this method, a modest albeit significant overlap between ALS and schizophrenia was established⁵⁶, further underlining that pleiotropy is ubiquitous and important in the genetic architecture of ALS. Furthermore, these observations might point towards common (neurodegenerative) pathways⁵⁷. Therefore, new studies combining datasets of presumed genetically connected traits or traits with phenotypic overlap might provide novel pleiotropic insights in the future.


Types of variation

Over 99.9% of the variation in the human genome consists of single nucleotide variants and indels⁵⁸. It is therefore not surprising that the majority of the variation associated with ALS concerns these variant types, SNVs in particular. Nonsynonymous single-nucleotide variants confer an increased risk of ALS in the three high-risk ALS genes SOD1 (all exons), FUS (predominantly exons 14 and 15) and TARDBP (exon 6)⁵⁹-⁶¹. With the increasing popularity of GWAS, SNPs have attracted increasing attention in ALS. An important caveat for the loci associated with disease through GWAS (Chapter 4¹⁴) is that the associated variant is not necessarily disease-causing in itself; correlation is not causation. This is best illustrated through the discovery of the C9orf72 repeat expansion near the genome-wide significant SNPs rs2814707 and rs3849942⁹,⁶²,⁶³. In addition to the intronic repeat expansion in C9orf72, exonic repeat expansions have also been associated with ALS (i.e., polyQ expansions in ATXN1⁵⁴,⁶⁵ and ATXN2⁶⁶ and a polyA expansion in NIPA1²). Repeat expansions might contribute to disease pathophysiology through various mechanisms, such as loss of functional protein, RNA toxicity and aggregate formation through protein misfolding⁶⁷-⁶⁹. Until recently, reliable calling of pathogenic repeat expansions in next-generation sequencing data proved challenging, forcing scientists to rely on wet-lab methods. Although Southern blotting remains the gold standard for determination of (especially larger) pathogenic repeat sizes, the software tool ExpansionHunter proved successful in correctly identifying various repeat motifs in PCR-free WGS data⁷⁰. In Chapter 2, we therefore used ExpansionHunter to further extend the NIPA1 replication cohort, allowing a sufficient sample size for replication. Whether other variant types besides SNVs, VNTRs and indels (i.e., larger structural variants) contribute to disease onset, and thereby account for a part of the missing heritability of ALS, is not yet known and is the focus of ongoing studies.
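To make the notion of a repeat size concrete, the sketch below counts the longest uninterrupted run of a motif in a single sequence. It is a toy illustration only and is not how ExpansionHunter works, since that tool operates on aligned sequencing reads and is designed for repeats that can exceed the read length. The function name and the flanking bases are made up; the GGGGCC motif of the C9orf72 expansion is used as the example.

```python
def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    sequence = sequence.upper()
    motif = motif.upper()
    best = 0
    for start in range(len(sequence)):
        copies = 0
        position = start
        # Walk forward one motif length at a time while the motif keeps matching.
        while sequence.startswith(motif, position):
            copies += 1
            position += len(motif)
        best = max(best, copies)
    return best


# Hypothetical read: three GGGGCC units between made-up flanking bases.
example_read = "AACT" + "GGGGCC" * 3 + "TTAG"
print(longest_repeat_run(example_read, "GGGGCC"))
```

A single short read can only ever contain a limited number of units, which is one reason why large pathogenic expansions were historically sized with wet-lab methods.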

Implications for patients and clinicians

The most important question for ALS patients and clinicians alike is how new discoveries in ALS genetics translate from bench to bedside.

Pathophysiology

Why, and maybe more importantly how, do neurons die in ALS? Dissecting the pathways that are involved in ALS is essential for a fundamental understanding of disease pathophysiology. As is true for many neurodegenerative diseases, ALS is characterized by the formation of cytoplasmic inclusions in degenerating motor neurons and surrounding oligodendrocytes⁷¹. In the majority of ALS patients, TDP-43 is the main constituent of these protein aggregates⁷². Despite development of multiple disease models (e.g., in rodents, zebrafish, flies, and more recently, induced pluripotent stem cells (iPSCs)), a complete and fundamental understanding of the biological processes leading to the formation of these inclusions is still lacking³⁷,⁷³. Translation of statistically significant genetic hits to biologically relevant disease mechanisms, and thereby possible novel therapeutic opportunities, is a complex process. Prioritization of variants tested in cell or animal models is now often guided by statistical analyses and functional predictors. Multiple statistical methods have been developed to help determine (or fine-map) the genetic regions that are most likely to be functionally involved in disease development⁷⁴. With the increasing number of implicated ALS genes and loci, multiple toxic mechanisms that mediate the degeneration and death of (motor) neurons have been suggested. Probably all proposed mechanisms are interlinked, culminating in a final common disease pathway. Pathophysiological processes associated with ALS include glial dysfunction, hyperexcitability, vulnerability to oxidative stress, aberrant RNA metabolism, mitochondrial dysfunction and altered axonal transport⁶⁷,⁷⁵,⁷⁶. Interestingly, the novel ALS genes C21orf2 (Chapter 4) and NEK1 (Chapter 6) are interactors, and are involved in DNA damage repair and in formation of the primary cilium¹⁴,¹⁹,⁷⁷-⁷⁹. Further functional characterization of the protein products of these genes is warranted to establish function and disease-specific dysfunction. Developments in the field of single-cell genomics increasingly allow the combination of genomics and biology at single-cell resolution, possibly allowing more direct interpretation of genetic discovery in the future⁸⁰.

Prediction

Genetic prediction models that provide reliable estimates of ALS risk, prognosis and therapy response are invaluable for patients, their caretakers and caregivers. The advances in our understanding of the relationship between gene variation, pathology and clinical phenotype have implications for genetic testing in the clinical (non-research) setting. The C9orf72 expansion is found sufficiently frequently that diagnostic testing is now considered in all patients diagnosed with ALS, even without a positive family history. This allows subsequent referral of the family members of carriers, and more broadly, families with a positive family history of ALS and/or FTD, to a genetic counsellor⁷⁵. The complexity of the interpretation of clinical genetic testing in ALS is illustrated by genetic observations such as variable variant penetrance, oligogenicity, allelic heterogeneity and pleiotropy. Expert consensus dictates that all patients need to be informed about the pros (e.g., preimplantation screening, possible future therapeutic possibilities) and cons (e.g., disease alleles not actionable yet and uncertainties regarding genetic architecture) before making the decision to undergo testing⁸¹. Preimplantation genetic testing (PGT) is now a widely applied procedure in genetic practices, and has also been performed in SOD1- and SETX-related ALS. While PGT for highly penetrant variants related to young onset of disease is widely accepted, PGT for variants that are not fully penetrant and are related to a late age of onset (which is typical in ALS) remains controversial and a focus of ethical debate. Nevertheless, many patients regard knowledge as power, and regard the opportunity to establish a pregnancy free of the mutation from the outset as favorable⁸². Interestingly, more than half of ALS specialists participating in a recent survey study would seek genetic testing if they had personally received a diagnosis of ALS⁸³.

In addition to counselling, genetic status also aids clinical prediction. Recently, scientists developed a model for prediction of survival (without tracheostomy or non-invasive ventilation for more than 23 h per day) in individual patients, with presence of a C9orf72 repeat expansion as one of eight predictor variables⁸⁴. A survey of patients’ preferences with regard to knowing their personalized prognosis showed that a majority (65.5%) were interested in knowing their estimated survival⁸⁴, emphasizing the value of such clinical predictors for individual patients, in addition to scientific relevance. Other studies have identified genetic variants related to other clinical disease characteristics, such as age of onset. In a recent paper, Mehta et al. confirmed that people with fALS have a younger age of onset than sALS patients by approximately five years, caused by Mendelian gene variants lowering the actual age of onset, rather than ascertainment bias resulting in quicker recognition of symptoms⁸⁵. Lastly, identification of genetic risk factors allows longitudinal studies of (asymptomatic) mutation carriers, for instance through structural MRI scans, aiding in the development of solid non-invasive disease biomarkers⁸⁶.

With the increasing availability of large-scale genotyping and sequencing studies, the development of genome-wide genetic risk predictors has attracted increasing attention. In a recently published landmark paper, genome-wide polygenic scores for five common diseases identified subgroups of patients with a greatly increased disease risk (comparable to monogenic mutations) based on their genetic background⁸⁷. Identification of these high-risk individuals allows lifestyle adjustments (if possible) and subgroup analyses in studies and trials. Despite these promising developments, precise genetic prediction models based on polygenic risk scores are presently still limited by complex disease architectures that encompass variants of tiny effect, rare variants that cannot be fully enumerated and complex epistatic interactions, as well as many non-genetic factors⁸⁸.
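In its simplest form, a polygenic score of the kind discussed above is a weighted sum of risk-allele counts. The sketch below assumes per-variant weights (log odds ratios) taken from GWAS summary statistics. The function name, variant identifiers, weights and genotypes are all hypothetical, and steps used in real pipelines, such as accounting for linkage disequilibrium, are omitted.

```python
def polygenic_score(effect_sizes: dict, dosages: dict) -> float:
    """Sum effect size times risk-allele dosage over the variants present in both inputs."""
    return sum(
        beta * dosages[variant]
        for variant, beta in effect_sizes.items()
        if variant in dosages  # variants that were not genotyped are skipped
    )


# Hypothetical weights (log odds ratios) and one individual's allele counts (0, 1 or 2).
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
genotype = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
print(polygenic_score(weights, genotype))
```

Because each weight is small, the score only becomes informative when summed over very many variants, which is also why variants of tiny effect and rare variants limit its precision.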

Treatment

The clinical implications of genetics will be greatest when the results of genetic testing are actionable⁸⁹. Unfortunately, effective drugs with large effects on disease progression are scarce in neurodegenerative diseases⁵⁷. Although it is plausible that ALS patients with different genetic backgrounds respond differently to therapies, genetic stratification of trial participants has not been standard. A recent study conducting post-hoc analyses on clinical trials evaluating the efficacy of lithium carbonate in ALS showed that a genetic subgroup of patients (UNC13A C/C genotype) may benefit from this drug⁹⁰. The original studies failed to pick up on this signal, probably due to the small size of this genetic subgroup. The concept of precision therapy based on genetic profiles is already widely implemented in oncology (i.e., treatment based on tumor profiles), and the Van Eijk et al. study shows that patient genotyping might prove beneficial for the identification of responder subgroups in neurodegenerative diseases as well. This paper paves the way for the design of genetically informed clinical trials that include standard genotyping in order to optimize randomization and analysis⁹⁰.

In the past years, CRISPR-Cas9 technology has increasingly allowed precise editing of genomic DNA, also attracting increasing attention in ALS research. Gaj et al. demonstrated that modification of transgenic mice that carried multiple copies of human G93A SOD1 through CRISPR-Cas9 genome editing resulted in enhanced motor function, diminished muscle atrophy, and more importantly, delayed disease onset resulting in a prolonged overall survival⁹¹. Extrapolation of these results to treatment of ALS patients is not straightforward, and is the focus of ongoing studies⁹²,⁹³.

Probably the most thrilling development in the field of neuropharmacology is the recent approval of nusinersen (Spinraza), an antisense oligonucleotide (ASO) drug, for the treatment of spinal muscular atrophy⁹⁴. SMA is the most frequent genetic cause of death in children, and caused by mutations in SMN1. Nusinersen is administered to the central nervous system through lumbar punctures, where it binds to SMN2 pre-mRNA to direct alternative splicing, ultimately increasing the amount of SMN protein. This in turn slows disease progression, and allows a subset of children to reach milestones that were otherwise impossible⁹⁵. In recent years, multiple ASO studies in SOD1 and C9orf72 models showed encouraging results in preclinical studies, and clinical trials are currently being undertaken⁹⁶. Results of these promising therapies are anxiously awaited by ALS specialists and patients. Hypothetically, ASO therapy might guide an ultimate treatment goal in ALS: treatment of asymptomatic gene carriers to prevent the onset of disease⁹⁷.


Future directions

This thesis has explored the genetic underpinnings of ALS, ultimately aiming to contribute to finding a cure for ALS. This last section reflects on the next steps of this extremely important journey.

Reinforcing open access science

Idealism and principle form the foundation of open access science. Although much progress has been made over the last decades, not all aspects of science are truly open access yet. The current academic system pushes investigators to claim that their work is highly novel and significant, making rigorous replication attempts less rewarding. Since false-positive signals hinder biological understanding of diseases and hamper adequate allocation of research funds, it is important to the scientific community to distinguish non-pathogenic variants from disease-causing variants⁹⁸. Therefore, performance of high-quality replication studies and publication of negative results remain paramount⁹⁹. Additionally, researchers should ideally make their individual-level data (if informed consent and regulations, including the EU General Data Protection Regulation, permit), summary data, and source code publicly available to promote reproducibility, stimulate (international) collaborations and allow multi-cohort analyses. Inclusiveness of minorities and equal payment for all sexes should be standard practice. Scientific studies, and this is also true for ALS studies, are often funded through large fundraising campaigns. For this publicly funded research to be truly open access, publications need to be accessible to anyone, anytime, at any location. Many papers are still published in journals with restricted accessibility¹⁰⁰. The increasing popularity of preprint servers such as bioRxiv is illustrative of the growing call to bypass (or even boycott) journals without a so-called green (i.e., allowing the author to archive the article in a repository) open access model. Consequently, journals are experiencing increasing trouble finding suitable reviewers, since the perk of previewing exciting new research has often subsided. In a recent publication, a group of national funders, joined by the European Commission and the European Research Council, announced Plan S, to take effect from January 1, 2020. Plan S states that research funders will mandate that access to research publications that are generated through research grants that they allocate must be fully and immediately open and cannot be monetized in any way¹⁰¹. Lastly, in a rapidly developing field such as genetics, it is often difficult for patients, clinicians and junior scientists to grasp basic (let alone more intricate) concepts and participate in the scientific debate. Expanding open access science to the non-scientific community takes transparency, vulnerability and willingness to explain complex processes in layman’s terms.

Project MinE

Although genetic discovery in ALS has skyrocketed in the last decade, much about ALS genetics is still unknown. International collaborations are warranted to enable genetic studies of sufficient size to pursue the missing pieces of the puzzle. Project MinE, founded on World ALS Day (i.e., June 21) 2013, is a large-scale international project aiming to map full DNA profiles of at least 15,000 ALS patients and 7,500 controls. The project leverages international collaboration and recent developments in sequencing technologies, allowing exploration of the full spectrum of genetic variation in samples collected worldwide¹⁰². This project of unprecedented scale and collaboration invites ALS research groups across the world to participate. One important implication of this set-up is the inclusion of samples from non-European ancestry populations. It has become clear that ALS epidemiology differs based on ancestral origin⁷³. Since the overwhelming majority of genetic studies in ALS have been executed using samples from ancestral European populations, analyses of non-European ancestry genomes might yield novel disease variants and pathways. A recent meta-analysis published by Zhou et al. confirmed that the genetic architecture of ALS in Asian populations is distinct from that in European populations¹⁰³, further underlining the importance of the inclusive nature of Project MinE. Another important strength of Project MinE is the opportunity to look at rare variation at high resolution, and to look beyond the linear genome. Project MinE is the first large-scale study to allow reliable interrogation of structural variants contributing to ALS disease onset. Also, Project MinE includes the first international large-scale study on ALS epigenetics. Epigenetic mechanisms allow organisms to respond to the environment through changes in gene expression¹⁰⁴. Several studies showed that smoking leaves a characteristic DNA methylation signature, even years after smoking cessation¹⁰⁵-¹⁰⁷. This might be particularly interesting in ALS, since smoking is the only undebated environmental risk factor for ALS¹⁰⁸. Also, DNA methylation patterns have a high correlation with age in healthy individuals, and many (neurodegenerative) diseases show signs of accelerated aging¹⁰⁹,¹¹⁰. Ergo, epigenetic effects might be one of the mechanisms through which environmental factors predispose for disease, and therefore possibly provide a bridge between genomic and environmental risk factors. Albeit with several substantial methodological challenges, methods such as Mendelian randomization (MR) can subsequently be applied to strengthen causal inference¹¹¹,¹¹². DNA methylation profiles of thousands of cases and controls are collected through Project MinE, allowing the execution of the largest epigenome-wide association study (EWAS) in ALS to date. Lastly, Project MinE is a strong advocate for open access science, enforcing broad, transparent and responsible data sharing. Summary statistics of published work are freely downloadable through the Project MinE website (https://www.projectmine.com), and individual-level data are available on request. To enable optimal use of the enormous amount of data generated through Project MinE, several working groups have been formed, each focussing on a specific topic: 1) Phenotyping, 2) Analysis and association testing, 3) Epigenetics, 4) Data infrastructure and 5) Genomic Structural Variation. The first papers have already been published through the Project MinE Sequencing Consortium²²,⁵⁶,¹⁰²,¹¹³, with many more to follow. Project MinE, make it yours.

Picture 7.1 Batch of Project MinE DNA samples – 1920 international genomic DNA samples ready for whole genome sequencing and (methylation) genotyping through Project MinE.

MAIN CONCLUSIONS OF THIS THESIS

• NIPA1 repeat expansions increase risk of ALS.
• Rare nonsynonymous variants in NEK1 increase risk of ALS.
• Rare nonsynonymous variants in C21orf2 increase risk of ALS.
• Variants in MOBP and SCFD1 are associated with ALS.
• A disproportionately large part of the hiding heritability of ALS is captured in low-frequency variants yet to be discovered.
• No rare variants of large effect present on the Illumina Exome Chip increase risk of ALS.
• Strong evidence suggests that ALS is an oligogenic rather than a polygenic disease.


REFERENCES

1. Blauw, H. M. et al. A large genome scan for rare CNVs in amyotrophic lateral sclerosis. Human Molecular Genetics 19, 4091–4099 (2010).
2. Blauw, H. M. et al. NIPA1 polyalanine repeat expansions are associated with amyotrophic lateral sclerosis. Human Molecular Genetics 21, 2497–2502 (2012).
3. van Blitterswijk, M. et al. Ataxin-2 as potential disease modifier in C9orf72 expansion carriers. Neurobiology of Aging 35, 2421.e13–2421.e17 (2014).
4. Tazelaar, G. H. P. et al. Association of NIPA1 repeat expansions with amyotrophic lateral sclerosis in a large international cohort. Neurobiology of Aging 1–7 (2018). doi:10.1016/j.neurobiolaging.2018.09.012
5. Dekker, A. M. et al. Large-scale screening in sporadic amyotrophic lateral sclerosis identifies genetic modifiers in C9orf72 repeat carriers. Neurobiology of Aging 1–7 (2016). doi:10.1016/j.neurobiolaging.2015.12.012
6. van Blitterswijk, M. et al. VAPB and C9orf72 mutations in 1 familial amyotrophic lateral sclerosis patient. Neurobiology of Aging 33, 2950.e1–2950.e4 (2012).
7. Hand, C. K. et al. Compound heterozygous D90A and D96N SOD1 mutations in a recessive amyotrophic lateral sclerosis family. Ann Neurol. 49, 267–271 (2001).
8. van Blitterswijk, M. et al. Evidence for an oligogenic basis of amyotrophic lateral sclerosis. Human Molecular Genetics 21, 3776–3784 (2012).
9. van Es, M. A. et al. Genome-wide association study identifies 19p13.3 (UNC13A) and 9p21.2 as susceptibility loci for sporadic amyotrophic lateral sclerosis. Nat Genet 41, 1083–1087 (2009).
10. Shatunov, A. et al. Chromosome 9p21 in sporadic amyotrophic lateral sclerosis in the UK and seven other countries: a genome-wide association study. The Lancet Neurology 9, 986–994 (2010).
11. Fogh, I. et al. A genome-wide association meta-analysis identifies a novel locus at 17q11.2 associated with sporadic amyotrophic lateral sclerosis. Human Molecular Genetics 23, 2220–2231 (2014).
12. Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. 511, 421–427 (2014).
13. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet 46, 1173–1186 (2014).
14. van Rheenen, W. et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat Genet 1–8 (2016). doi:10.1038/ng.3622
15. Höglinger, G. U. et al. Identification of common variants influencing risk of the tauopathy progressive supranuclear palsy. Nat Genet 43, 699–705 (2011).
16. Irwin, D. J. et al. Myelin oligodendrocyte basic protein and prognosis in behavioral-variant frontotemporal dementia. Neurology 83, 502–509 (2014).
17. Al-Chalabi, A. et al. An estimate of amyotrophic lateral sclerosis heritability using twin data. Journal of Neurology, Neurosurgery & Psychiatry 81, 1324–1326 (2010).
18. Loohuis, L. M. O. et al. Genome-wide burden of deleterious coding variants increased in schizophrenia. Nature Communications 6, 1–6 (2015).
19. Kenna, K. P. et al. NEK1 variants confer susceptibility to amyotrophic lateral sclerosis. Nat Genet 1–8 (2016). doi:10.1038/ng.3626
20. Cirulli, E. T. et al. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways. Science 347, 1436–1441 (2015).

21. Brenner, D. et al. NEK1 mutations in familial amyotrophic lateral sclerosis. Brain 139, e28–e28 (2016).
22. Nicolas, A. et al. Genome-wide Analyses Identify KIF5A as a Novel ALS Gene. Neuron 97, 1268–1282.e6 (2018).
23. Auer, P. L. Rare variant association studies: considerations, challenges and opportunities. Genome Med 7, 1–11 (2015).
24. Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat Rev Genet 135, 1–10 (2018).
25. Visscher, P. M., Hill, W. G. & Wray, N. R. Heritability in the genomics era — concepts and misconceptions. Nat Rev Genet 9, 255–266 (2008).
26. Mayhew, A. J. & Meyre, D. Assessing the Heritability of Complex Traits in Humans: Methodological Challenges and Opportunities. Curr. Genomics 18, 332–340 (2017).
27. Keller, M. F. et al. Genome-Wide Analysis of the Heritability of Amyotrophic Lateral Sclerosis. JAMA Neurol 71, 1123–11 (2014).
28. Al-Chalabi, A. & Visscher, P. M. Motor neuron disease: Common genetic variants and the heritability of ALS. Nature Reviews Neurology 10, 549–550 (2014).
29. McLaughlin, R. L., Vajda, A. & Hardiman, O. Heritability of Amyotrophic Lateral Sclerosis. JAMA Neurol 72, 857–2 (2015).
30. Browning, S. R. & Browning, B. L. Population Structure Can Inflate SNP-Based Heritability Estimates. The American Journal of Human Genetics 89, 191–193 (2011).
31. Goddard, M. E., Lee, S. H., Yang, J., Wray, N. R. & Visscher, P. M. Response to Browning and Browning. The American Journal of Human Genetics 89, 193–195 (2011).
32. Yang, J., Zeng, J., Goddard, M. E., Wray, N. R. & Visscher, P. M. Concepts, estimation and interpretation of SNP-based heritability. Nat Genet 49, 1304–1310 (2017).
33. Manolio, T. A. et al. Finding the missing heritability of complex diseases. 461, 747–753 (2009).
34. Witte, J. S., Visscher, P. M. & Wray, N. R. The contribution of genetic variants to disease depends on the ruler. Nat Rev Genet 15, 765–776 (2014).
35. Eichler, E. E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nature Publishing Group 11, 446–450 (2010).
36. Timpson, N. J., Greenwood, C. M. T., Soranzo, N., Lawson, D. J. & Richards, J. B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 19, 110–124 (2017).
37. Van Damme, P., Robberecht, W. & Van Den Bosch, L. Modelling amyotrophic lateral sclerosis: progress and possibilities. Dis. Model. Mech. 10, 537–549 (2017).
38. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186 (2017).
39. Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common Disease Is More Complex Than Implied by the Core Gene Omnigenic Model. Cell 173, 1573–1580 (2018).
40. Luigetti, M. et al. SOD1 G93D sporadic amyotrophic lateral sclerosis (SALS) patient with rapid progression and concomitant novel ANG variant. Neurobiology of Aging 32, 1924.e15–1924.e18 (2011).
41. Cooper-Knock, J. et al. Targeted Genetic Screen in Amyotrophic Lateral Sclerosis Reveals Novel Genetic Variants with Synergistic Effect on Clinical Phenotype. Front. Mol. Neurosci. 10, 146–11 (2017).
42. Cady, J. et al. Amyotrophic lateral sclerosis onset is influenced by the burden of rare variants in known amyotrophic lateral sclerosis genes. Ann Neurol. 77, 100–113 (2014).


43. Morgan, S. et al. A comprehensive analysis of rare genetic variation in amyotrophic lateral sclerosis in the UK. Brain 1–8 (2017). doi:10.1093/brain/awx082
44. Hu, Z., Wang, Z. & Xu, S. An infinitesimal model for quantitative trait genomic value prediction. PLoS ONE 7, e41336 (2012).
45. Al-Chalabi, A., Calvo, A., Chio, A., Colville, S. & Ellis, C. M. Analysis of amyotrophic lateral sclerosis as a multistep process: a population-based modelling study. The Lancet 13, 1108–1113 (2014).
46. van Es, M. A. et al. Amyotrophic lateral sclerosis. The Lancet 0, (2017).
47. Chiò, A. et al. The multistep hypothesis of ALS revisited: The role of genetic mutations. Neurology 91, e635–e642 (2018).
48. Gratten, J. & Visscher, P. M. Genetic pleiotropy in complex traits and diseases: implications for genomic medicine. Genome Med 1–3 (2016). doi:10.1186/s13073-016-0332-x
49. van Blitterswijk, M., DeJesus-Hernandez, M. & Rademakers, R. How do C9orf72 repeat expansions cause amyotrophic lateral sclerosis and frontotemporal dementia. Current Opinion in Neurology 25, 689–700 (2012).
50. Imbert, G. et al. Cloning of the gene for spinocerebellar ataxia 2 reveals a locus with high sensitivity to expanded CAG/glutamine repeats. Nat Genet 14, 285–291 (1996).
51. Rainier, S., Chai, J.-H., Tokarz, D., Nicholls, R. D. & Fink, J. K. NIPA1 Gene Mutations Cause Autosomal Dominant Hereditary Spastic Paraplegia (SPG6). The American Journal of Human Genetics 73, 967–971 (2003).
52. Wirth, B. An update of the mutation spectrum of the survival motor neuron gene (SMN1) in autosomal recessive spinal muscular atrophy (SMA). Hum. Mutat. 15, 228–237 (2000).
53. van Es, M. A. et al. Angiogenin variants in Parkinson disease and amyotrophic lateral sclerosis. Ann Neurol. 70, 964–973 (2011).
54. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nature Publishing Group 47, 1236–1241 (2015).
55. Hackinger, S. & Zeggini, E. Statistical methods to detect pleiotropy in human complex traits. Open Biol. 7, 170125–13 (2017).
56. McLaughlin, R. L. et al. Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nature Communications 8, 14774 (2017).
57. Gan, L., Cookson, M. R., Petrucelli, L. & Spada, A. R. Converging pathways in neurodegeneration, from genetics to mechanisms. Nat. Neurosci. 1–10 (2018). doi:10.1038/s41593-018-0237-7
58. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
59. Veldink, J. H. ALS genetic epidemiology ‘How simplex is the genetic epidemiology of ALS?’. Journal of Neurology, Neurosurgery & Psychiatry jnnp–2016–315469–1 (2017). doi:10.1136/jnnp-2016-315469
60. Lattante, S., Rouleau, G. A. & Kabashi, E. TARDBP and FUS mutations associated with amyotrophic lateral sclerosis: summary and update. Hum. Mutat. 34, 812–826 (2013).
61. Shaw, C. E. et al. Mutations in all five exons of SOD-1 may cause ALS. Ann Neurol. 43, 390–394 (1998).
62. DeJesus-Hernandez, M. et al. Expanded GGGGCC Hexanucleotide Repeat in Noncoding Region of C9orf72 Causes Chromosome 9p-Linked FTD and ALS. Neuron 72, 245–256 (2011).
63. Renton, A. E. et al. A Hexanucleotide Repeat Expansion in C9orf72 Is the Cause of Chromosome 9p21-Linked ALS-FTD. Neuron 72, 257–268 (2011).
64. Conforti, F. L. et al. Ataxin-1 and ataxin-2 intermediate-length PolyQ expansions in amyotrophic lateral sclerosis. Neurology 79, 2315–2320 (2012).

65. Lattante, S. et al. ATXN1 intermediate-length polyglutamine expansions are associated with amyotrophic lateral sclerosis. Neurobiology of Aging 64, 157.e1–157.e5 (2018).
66. Sproviero, W. et al. ATXN2 trinucleotide repeat length correlates with risk of ALS. Neurobiology of Aging 51, 178.e1–178.e9 (2017).
67. Balendra, R. & Isaacs, A. M. C9orf72-mediated ALS and FTD: multiple pathways to disease. Nature Reviews Neurology 1–15 (2018). doi:10.1038/s41582-018-0047-2
68. Messaed, C. & Rouleau, G. A. Molecular mechanisms underlying polyalanine diseases. Neurobiology of Disease 34, 397–405 (2009).
69. Kozlowski, P., de Mezer, M. & Krzyzosiak, W. J. Trinucleotide repeats in human genome and exome. Nucleic Acids Res 38, 4027–4039 (2010).
70. Dolzhenko, E. et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Research 27, 1895–1903 (2017).
71. Blokhuis, A. M., Groen, E. J. N., Koppers, M., van den Berg, L. H. & Pasterkamp, R. J. Protein aggregation in amyotrophic lateral sclerosis. Acta Neuropathol 125, 777–794 (2013).
72. Neumann, M. et al. Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Science 314, 130–133 (2006).
73. Hardiman, O. et al. Amyotrophic lateral sclerosis. Nat. Rev. Dis. Primers 3, 17071–19 (2017).
74. Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet 360, 1–14 (2018).
75. Al-Chalabi, A. et al. The genetics and neuropathology of amyotrophic lateral sclerosis. Acta Neuropathol 124, 339–352 (2012).
76. Taylor, J. P., Brown, R. H. & Cleveland, D. W. Decoding ALS: from genes to mechanism. Nature 539, 197–206 (2016).
77. Fang, X. et al. The NEK1 interactor, C21orf2, is required for efficient DNA damage repair. Acta Biochim Biophys Sin 47, 834–841 (2015).
78. Patil, M., Pabla, N., Ding, H.-F. & Dong, Z. NEK1 interacts with Ku80 to assist chromatin loading of replication factors and S-phase progression. Cell Cycle 12, 2608–2616 (2014).
79. Shalom, O., Shalva, N., Altschuler, Y. & Motro, B. The mammalian NEK1 kinase is involved in primary cilium formation. FEBS Letters 582, 1465–1470 (2008).
80. Linnarsson, S. & Teichmann, S. A. Single-cell genomics: coming of age. Genome Biol. 1–3 (2016). doi:10.1186/s13059-016-0960-x
81. Turner, M. R. et al. Genetic screening in sporadic ALS and FTD. Journal of Neurology, Neurosurgery & Psychiatry 88, 1042–1044 (2017).
82. Kuliev, A. & Rechitsky, S. Preimplantation genetic testing: current challenges and future prospects. Expert Review of Molecular Diagnostics 17, 1071–1088 (2017).
83. Vajda, A. et al. Genetic testing in ALS: A survey of current practices. Neurology 88, 991–999 (2017).
84. Westeneng, H.-J. et al. Prognosis for patients with amyotrophic lateral sclerosis: development and validation of a personalised prediction model. The Lancet Neurology 17, 423–433 (2018).
85. Mehta, P. R. et al. Younger age of onset in familial amyotrophic lateral sclerosis is a result of pathogenic gene variants, rather than ascertainment bias. Journal of Neurology, Neurosurgery & Psychiatry jnnp–2018–319089–4 (2018). doi:10.1136/jnnp-2018-319089
86. Walhout, R. et al. Brain morphologic changes in asymptomatic C9orf72 repeat expansion carriers. Neurology 85, 1780–1788 (2015).


87. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet 470, 1–9 (2018).
88. Lander, E. S. Initial impact of the sequencing of the human genome. Nature 470, 187–197 (2011).
89. Lander, E. S. Cutting the Gordian Helix: Regulating Genomic Testing in the Era of Precision Medicine. N Engl J Med 372, 1185–1186 (2015).
90. van Eijk, R. P. A. et al. Meta-analysis of pharmacogenetic interactions in amyotrophic lateral sclerosis clinical trials. Neurology 89, 1915–1922 (2017).
91. Gaj, T. et al. In vivo genome editing improves motor function and extends survival in a mouse model of ALS. Sci Adv 3, eaar3952 (2017).
92. Kruminis-Kaszkiel, E., Juranek, J., Maksymowicz, W. & Wojtkiewicz, J. CRISPR/Cas9 Technology as an Emerging Tool for Targeting Amyotrophic Lateral Sclerosis (ALS). IJMS 19, 906–13 (2018).
93. Al-Chalabi, A. & Brown, R. H. Finding a Treatment for ALS - Will Gene Editing Cut It? N Engl J Med 378, 1454–1456 (2018).
94. Finkel, R. S. et al. Treatment of infantile-onset spinal muscular atrophy with nusinersen: a phase 2, open-label, dose-escalation study. Lancet 388, 3017–3026 (2016).
95. Corey, D. R. Nusinersen, an antisense oligonucleotide drug for spinal muscular atrophy. Nat. Neurosci. 20, 497–499 (2017).
96. Ly, C. V. & Miller, T. M. Emerging antisense oligonucleotide and viral therapies for amyotrophic lateral sclerosis. Current Opinion in Neurology 1–7 (2018). doi:10.1097/WCO.0000000000000594
97. Talbot, K. Amyotrophic lateral sclerosis: the complex path to precision medicine. J Neurol 265, 2454–2462 (2018).
98. MacArthur, D. G. et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).
99. Hirschhorn, J. N. & Altshuler, D. Once and Again: Issues Surrounding Replication in Genetic Association Studies. The Journal of Clinical Endocrinology & Metabolism 87, 4438–4441 (2002).
100. Ioannidis, J. P. A. How to Make More Published Research True. PLoS Med 11, e1001747–6 (2014).
101. Schiltz, M. Science without publication paywalls: cOAlition S for the realisation of full and immediate Open Access. PLoS Biol 16, e3000031–4 (2018).
102. Project MinE ALS Sequencing Consortium. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. Eur. J. Hum. Genet. 7, 1–10 (2018).
103. Zou, Z.-Y. et al. Genetic epidemiology of amyotrophic lateral sclerosis: a systematic review and meta-analysis. Journal of Neurology, Neurosurgery & Psychiatry 88, 540–549 (2017).
104. Jaenisch, R. & Bird, A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet 33, 245–254 (2003).
105. Breitling, L. P., Yang, R., Korn, B., Burwinkel, B. & Brenner, H. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am. J. Hum. Genet. 88, 450–457 (2011).
106. Gao, X., Jia, M., Zhang, Y., Breitling, L. P. & Brenner, H. DNA methylation changes of whole blood cells in response to active smoking exposure in adults: a systematic review of DNA methylation studies. Clinical Epigenetics 7, 1–10 (2015).
107. Bauer, M. et al. A varying T cell subtype explains apparent tobacco smoking induced single CpG hypomethylation in whole blood. Clinical Epigenetics 7, 1–11 (2015).
108. Armon, C. Smoking may be considered an established risk factor for sporadic ALS. Neurology 73, 1693–1698 (2009).

109. Hannum, G. et al. Genome-wide Methylation Profiles Reveal Quantitative Views of Human Aging Rates. Molecular Cell 49, 359–367 (2013).
110. Horvath, S. et al. An epigenetic clock analysis of race/ethnicity, sex, and coronary heart disease. Genome Biol. 17, 1–23 (2016).
111. Relton, C. L. & Davey Smith, G. Mendelian randomization: applications and limitations in epigenetic studies. Epigenomics 7, 1239–1243 (2015).
112. Dekkers, K. F. et al. Blood lipids influence DNA methylation in circulating cells. Genome Biol. 17, 1–12 (2016).
113. Project MinE ALS Sequencing Consortium. CHCHD10 variants in amyotrophic lateral sclerosis: Where is the evidence? Ann Neurol. 84, 110–116 (2018).

Appendices

Abbreviations
Dutch summary (Nederlandse samenvatting)
Acknowledgements
Publications
About the author

Appendices | Abbreviations

ABBREVIATIONS

Abbreviations used in the Introduction and Discussion of this thesis.

ALS      Amyotrophic lateral sclerosis
ASO      Antisense oligonucleotide
CI       Confidence interval
CNV      Copy number variant
DNA      Deoxyribonucleic acid
EWAS     Epigenome-wide association study
fALS     Familial amyotrophic lateral sclerosis
FTD      Frontotemporal dementia
GCTA     Genome-wide complex trait analysis
GREML    Genome-based restricted maximum likelihood
GWAS     Genome-wide association study
HSP      Hereditary spastic paraplegia
kb       Kilobase
Indels   Insertions and deletions
iPSCs    Induced pluripotent stem cells
LD       Linkage disequilibrium
MAF      Minor allele frequency
NGS      Next-generation sequencing
PGT      Preimplantation genetic testing
PMA      Progressive muscular atrophy
PLS      Primary lateral sclerosis
PSP      Progressive supranuclear palsy
sALS     Sporadic amyotrophic lateral sclerosis
SCA      Spinocerebellar ataxia
SNP      Single nucleotide polymorphism
SNV      Single nucleotide variant
SV       Structural variant
VNTRs    Variable number tandem repeats
WGS      Whole genome sequencing
WES      Whole exome sequencing

Appendices | Dutch summary

DUTCH SUMMARY (NEDERLANDSE SAMENVATTING)

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease in which (mainly) motor nerve cells (neurons) in the brain, brainstem and spinal cord are affected. Progressive muscle weakness develops, with loss of muscle mass and spasticity, leading to problems with moving, speaking, swallowing and breathing. These symptoms start in one part of the body and spread gradually, and the disease course varies widely between patients. Some patients also develop problems with thinking and behavior during the disease. This heterogeneity means that the diagnosis can be difficult to establish, and patients are often referred to specialized hospitals for the definitive diagnosis. Most patients with ALS die within three to five years after onset of the first symptoms, often as a result of failure of the respiratory muscles (respiratory insufficiency). Each year about 500 people in the Netherlands are diagnosed. Because of the poor prognosis, on average 1,500 people in the Netherlands are living with ALS at any given time. The exact cause of ALS has not yet been elucidated. There is one drug that can slow the disease course somewhat, and there is no treatment that can prevent or cure the disease.

ALS is a complex disease, meaning that both genetic and environmental factors play a role in its development. Based on twin studies, this genetic contribution is estimated at about 60%. In about 5-10% of patients the disease occurs in close relatives; traditionally these patients are said to have familial ALS (fALS). In patients with sporadic ALS (sALS) there is no familial disease. Nowadays the distinction between fALS and sALS is regarded as artificial, since many factors can influence whether or not a disease appears to be familial. Genetic abnormalities found in fALS are therefore also found in patients with sALS, although at a lower frequency.

In 1990, about 50 years after the first description of familial ALS, the first ALS gene, SOD1, was identified. In parallel with the rapid developments in human genetics in general, the field of ALS genetics has since advanced at great speed. Between 1990 and 2013, the year in which the research described in this thesis began, gene abnormalities (also called mutations or polymorphisms) that confer an increased risk of ALS were discovered in more than 20 genes. Together these genetic risk factors explain about 70% of all fALS cases (mainly through abnormalities in high-risk genes such as SOD1, TARDBP, FUS and C9orf72), but only 11% of sALS cases of European ancestry. Despite all these discoveries, a fundamental understanding of the genetic risk factors for ALS, and of how they contribute to the development of the disease, is still lacking. Using several methods, this thesis therefore aims to provide further insight into the genetic background of ALS, with the ultimate goals of 1) helping to unravel the underlying disease process, 2) supporting more accurate prediction models and 3) helping to develop new (personalized) treatments.

Candidate gene studies

Candidate gene studies are genetic studies that test whether variants in particular genes of interest occur more frequently in patients than in people without the condition (controls). The choice of genes is hypothesis-driven, meaning that the genes are selected on the basis of prior knowledge or a particular assumption. The drawback of this type of research is that the results can be difficult to replicate, which can lead to false-positive publications.

An earlier candidate gene study from 2010 found a relation between a so-called repeat expansion (in which a fixed sequence of DNA building blocks is repeated a number of times) in the gene NIPA1 and an increased risk of ALS. Since its publication, this finding had never been replicated. Chapter 2 describes the first replication study of the association between this NIPA1 repeat expansion and the risk of ALS. In a new cohort of 2,887 Dutch individuals we found the repeat expansion significantly more often in ALS patients than in healthy controls (5.5% and 3.7%, respectively), which confirms the relation between the NIPA1 repeat expansion and ALS in the Dutch population. We then combined the data from the earlier 2010 study with our new Dutch data and NIPA1 repeat expansion data from a large group of international patients. This shows that, in a large international cohort, NIPA1 repeat expansions are indeed associated with an increased risk of ALS. In contrast to the earlier study, we found no effect of NIPA1 repeat expansions on survival or age at onset. However, we could test this relation in fewer than 50% of the total group of patients, so we cannot confirm or rule out with certainty a relation between NIPA1 repeat expansions and survival or age at onset of ALS.

Over the years, a number of families with ALS have been described that carry mutations in multiple ALS genes. This observation could offer a (partial) explanation for the differences in disease course between ALS patients, and for whether or not carriers of certain genetic abnormalities develop the disease. In chapter 3 we therefore investigated whether, in patients with sporadic ALS, mutations in 10 known ALS genes co-occur more often than expected by chance. Although we found a higher percentage of double mutations in ALS patients than in controls (4.1% versus 1.3%), this difference was not significant. Among ALS patients who carry a C9orf72 repeat expansion (the most common genetic abnormality in ALS, which confers a strongly increased risk of developing the disease), we found that they carry an abnormality in another ALS gene (a repeat expansion in ATXN2, NIPA1 or SMN1) significantly more often than expected. Of these, the combination of C9orf72 and NIPA1 repeat expansions is the most frequent. This could mean that the presence or absence of a NIPA1 repeat expansion contributes to the differences in disease course among patients who carry a C9orf72 repeat expansion. In chapter 2 we attempted to reproduce this observation, and again found that C9orf72 and NIPA1 repeat expansions co-occur more often than expected, although less often than in the study described in chapter 3. There are several possible explanations for this, such as an initial false-positive signal, or a signal that is not present in all populations. The fact that double mutations occur in both familial and sporadic ALS could point to an oligogenic disease model, in which the combination of a few mutations influences the eventual disease presentation.
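To illustrate what "more often than expected by chance" means in this setting, the following minimal sketch compares an observed number of double carriers with the number expected if two variants occurred independently of each other. The cohort size, carrier frequencies and observed count are hypothetical illustration values, and the exact binomial tail is an assumed choice of test, not necessarily the one used in chapter 3.

```python
from math import comb


def expected_double_carriers(n_patients, freq_a, freq_b):
    """Expected number of carriers of both variants if they occur independently."""
    return n_patients * freq_a * freq_b


def binomial_upper_tail(n_patients, p_double, observed):
    """P(X >= observed) for X ~ Binomial(n_patients, p_double)."""
    # Sum the probabilities of seeing fewer than `observed` double carriers,
    # then take the complement.
    below = sum(
        comb(n_patients, k) * p_double**k * (1 - p_double) ** (n_patients - k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical illustration values (not taken from the thesis).
n_patients = 1000
freq_a = 0.08        # assumed carrier frequency of variant A
freq_b = 0.05        # assumed carrier frequency of variant B
observed_double = 10  # assumed number of patients carrying both

expected = expected_double_carriers(n_patients, freq_a, freq_b)
p_value = binomial_upper_tail(n_patients, freq_a * freq_b, observed_double)
print(f"expected {expected:.1f}, observed {observed_double}, p = {p_value:.4f}")
```

A small tail probability would indicate that the two variants are found together more often than independence predicts; it says nothing by itself about why they co-occur.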

The candidate gene studies in this thesis confirm the need for replication studies, and for recording complete clinical characteristics in (genetic) studies.

Genome-wide association studies

Genome-wide association studies (GWAS) are studies in which the presence or absence of mutations is charted at a large number of positions in the DNA. The frequency of each mutation is then compared between patients and controls. In this way new genetic risk factors can be discovered, where the associated variant may directly confer an increased risk of the disease, or may be a marker for a nearby variant that was not itself tested in the study. This distinction between direct and indirect association, respectively, can be difficult to make on the basis of genetic analyses.
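The per-variant comparison described above can be sketched as a test on a 2x2 table of allele counts. This is a minimal illustration under stated assumptions: the allele counts are hypothetical, the function name is invented for this example, and a plain Pearson chi-square test is used here, whereas real GWAS typically use regression models with covariates.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Compare allele counts between cases and controls for one variant.

    Returns allele frequencies, the odds ratio, the Pearson chi-square
    statistic (1 degree of freedom) and its p-value.
    """
    n_case = case_alt + case_ref
    n_ctrl = ctrl_alt + ctrl_ref
    n_alt = case_alt + ctrl_alt
    n_ref = case_ref + ctrl_ref
    n_total = n_case + n_ctrl

    # Shortcut formula for the Pearson chi-square of a 2x2 table.
    chi2 = (
        n_total
        * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
        / (n_case * n_ctrl * n_alt * n_ref)
    )
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return {
        "freq_cases": case_alt / n_case,
        "freq_controls": ctrl_alt / n_ctrl,
        "odds_ratio": odds_ratio,
        "chi2": chi2,
        "p_value": p_value,
    }


# Hypothetical allele counts for a single variant (not taken from the thesis).
result = allelic_association(case_alt=60, case_ref=940, ctrl_alt=40, ctrl_ref=960)
print(result)
```

In a genome-wide setting this comparison is repeated for every tested variant, so the significance threshold has to be corrected for the very large number of tests.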

Before this thesis, three GWAS had been performed in ALS, which together found three gene locations (or loci) related to the disease. Chapter 4 describes one of the largest GWAS in ALS to date, which included a total of 12,577 patients and 23,475 controls. In addition to the known loci, we identified three new loci in this study: C21orf2, MOBP and SCFD1. Additional analyses found multiple mutations in C21orf2 that carry an increased risk of ALS, on the basis of which we conclude that C21orf2 can be regarded as a new ALS gene. For the other loci it is not yet entirely clear whether the association is direct or indirect. Besides identifying new genetic risk factors, chapter 4 offers further insight into the genetic architecture of ALS. About 8.5% of the risk of ALS is explained by variants with an allele frequency above 1%. Of this 8.5%, only 0.2 percentage points are explained by the 6 identified loci. Further investigation of the remaining 8.3 percentage points of this total risk shows that the variants still to be discovered are mainly low-frequency variants (with an allele frequency, or MAF, of 1-10%).
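The arithmetic behind these percentages can be written out as a short worked example. It uses only the figures quoted in the summary and reads the 0.2% and 8.3% as percentage points of total ALS risk, which is the reading under which the numbers add up; the variable names are chosen for this illustration.

```python
# Share of ALS risk explained by variants with allele frequency above 1%,
# in percentage points (figure quoted in the summary).
explained_by_common_variants = 8.5

# Part of that share attributed to the 6 identified loci (percentage points).
explained_by_identified_loci = 0.2

# Share still attributable to variants that have not yet been discovered.
remaining = explained_by_common_variants - explained_by_identified_loci

# Fraction of the explained share that the identified loci account for.
fraction_identified = explained_by_identified_loci / explained_by_common_variants

print(f"remaining: {remaining:.1f} percentage points")
print(f"identified loci account for {fraction_identified:.1%} of the explained share")
```

Under this reading, the identified loci account for only a few percent of the explained share, which is why most of the signal is attributed to variants that remain to be found.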

Chapter 5 examines the role of low-frequency and rare genetic variants in the development of ALS by means of a GWAS targeting these rare variants across the protein-coding part of the DNA. This study included 7,350 ALS patients and controls from 6 different European countries. We found no new variants or genes related to ALS in this study. In addition, ALS does not appear to show an enrichment of rare genetic variants that are absent from the control group (set-unique variants). Such an enrichment was found in an earlier study of schizophrenia, which implies that the genetic background of ALS differs from that of schizophrenia. Since schizophrenia is a strongly polygenic disease (for which more than 100 risk loci have now been identified), this could mean that the genetic background of ALS is less polygenic and/or more region-specific.

Genome-wide sequencing studies

With the rapid changes in genetics, methods that chart complete individual DNA profiles have become increasingly accessible in recent years. This method is called whole genome sequencing (WGS). Whole exome sequencing (WES) is the letter-by-letter reading of the complete protein-coding part of the DNA. Chapter 6 is a study in which a new ALS gene, NEK1, was found by combining several analyses of WGS and WES data. Several methods showed that the NEK1 variant p.Arg261His confers an increased risk of ALS, and also that other mutations that disrupt protein function (loss-of-function variants) are related to ALS. NEK1 had already been recognized as a possible ALS gene in earlier studies; the study described in chapter 6 confirmed this conclusively. Earlier studies have shown that the new ALS genes NEK1 and C21orf2 are interactors. Future research will have to show how these genes contribute to the development of ALS.

Strikingly, NEK1 was also the top hit in the study described in chapter 5, although this signal was not statistically significant. That study's sample was too small to detect the NEK1 variant p.Arg261His, and too few other variants in the gene were tested in chapter 5 to pick up a sufficiently strong signal in the burden analysis. This confirms the potential of future WES and WGS studies in ALS.

Genetic characterization of ALS

The genetic architecture of a disease is defined as the combination of the number, type, frequency, mutual relationship and effect size of the genetic variants that contribute to the condition. Chapter 7 describes how the research in this thesis provides new insights into the genetic architecture of ALS. I discuss heritability (the extent to which genetic variation contributes to phenotypic variation), polygenicity (how many variants contribute to phenotypic variation), the role of pleiotropy (in which one risk gene can influence multiple phenotypes) and the types of genetic variation that contribute to ALS. These results only become clinically relevant once they can be translated to the patient in daily practice. Expanding genetic knowledge can contribute to understanding the pathophysiology of ALS, supporting new prediction models and developing new (personalized) treatments. I also underline the importance of the accessibility of scientific research, that is, of open access science. Finally, I discuss future directions for research into the genetics of ALS, which will search for the background of ALS both more broadly (beyond European populations) and in more detail (by charting complete DNA profiles).

CONCLUSIONS OF THIS THESIS

• NIPA1 repeat expansions confer an increased risk of ALS.
• Rare variants in NEK1 confer an increased risk of ALS.
• Rare variants in C21orf2 confer an increased risk of ALS.
• Variants in MOBP and SCFD1 are associated with ALS.
• A large proportion of the risk factors for ALS that remain to be discovered are low-frequency genetic variants.
• No genetic variants on the Illumina Exome Chip confer a strongly increased risk of ALS.
• Research suggests that ALS is an oligogenic rather than a polygenic disease.

Appendices | Acknowledgements

ACKNOWLEDGEMENTS

This thesis came about with the help and support of many people. I would like to mention a few in particular:

First of all, my deep gratitude goes to all the patients and families who made this research possible. Their willingness to take part in studies time and again, despite this terrible disease, has made a deep impression on me.

Dear Prof. Veldink, dear Jan, I am very grateful for the opportunity to do research in your genetics group. Over the past years I have watched with admiration as the small club of genetics researchers in the Winklerzaal grew into a flourishing international neurogenetics group. I found it remarkable how, despite a busy schedule of clinical and research duties, you always make time for your PhD students. Your enthusiasm, your creativity and the freedom (trust) you grant your researchers make the neurogenetics group unique and an inspiring research environment. Your skills as a master collaborator ensure that the best is brought out of both the researchers and the data. Many thanks for the past years.

Dear Prof. Van den Berg, dear Leonard, it is hugely impressive how you have put ALS research in the Netherlands on the map, and how you lead many international initiatives. All with a great deal of humor and one clear goal: finding a treatment for ALS. The sentence “But what does it mean for the patient?” resonated strongly in my head at every step of my research. Many thanks for your clear and sharp view.

Dear Dr. Van Es, dear Michael, from supervising students and running analyses to writing an article and submitting papers, your way of reasoning and explaining makes everything clear and well organized in no time. It makes you a role model in research, and certainly also as a clinician. I am therefore happy and grateful that over the past years I have been able to work under your supervision on both fronts, and to learn from you on both fronts.

Dear Dr. Pulit, dear Sara, although your Dutch is nowadays probably better than my English, I am writing my thanks to you in your native language. I thank you from the bottom of my heart for being a kick-ass mentor and true inspiration in many aspects. You are a computational genetics guru, and the way you are able to explain (in my opinion) difficult statistical concepts in a comprehensible way reminds me of Einstein every time: “If you can’t explain it simply, you don’t understand it well enough”. Your inspirational quotes (Dory’s “Just keep swimming” or Frozen’s “Let it gooooooo…”) and chats over chai lattes picked me up and refueled me every time. Your positivity and faith in my capabilities helped me grow as a professional, but also as a person. I have no doubt you will turn any adventure into greatness.

I would like to sincerely thank the members of the assessment committee, Prof. Dr. J.K. Ploos van Amstel, Prof. Dr. F.W. Asselbergs, Prof. Dr. R.J. Pasterkamp, Prof. Dr. J.C. Van Swieten and Dr. Y.M. Ruigrok, for their expert and critical assessment of my thesis.

My current residency program directors, Dr. Seute and Prof. Biessels, and former program director Prof. Wokke, dear Tatjana, Geert-Jan and John, many thanks for the opportunity to combine the Neurology residency with scientific research. The room that our training program offers for individual wishes and abilities is, in my view, unique, and I am proud to be part of the Utrecht training group. I am also convinced that my PhD research will make me a better neurologist.

Staff members of neurology at the UMC Utrecht and the Tergooi Ziekenhuis, thank you very much for your guidance during my training so far. I look forward to the future!

It’s teamwork that makes the dream work. All the research in this thesis was carried out as a team effort. Many thanks to all national and international collaborators, and above all my deep gratitude to Wouter, Perry, Meinie, Gijs and Frank: without you this book would not exist.

ALS lab mates from the very beginning: Lotte, Renske, Anne, Marloes, Oliver, Bas (#TeamBas), Henk-Jan (R matey), Renée, Camiel and Marc. And of course the new generation of stars, including Bram, Harold, Kevin, Louise, Susan, Balint and Hannelore. Many thanks for the inspiring discussions, enjoyable conferences, drinks (apparently beer does not belong in the cold room), and the consistently good atmosphere.

Lab heroes Peter, William, Raymond, Jelena and Erwin: from the fasciculations in the ball of my thumb after weeks of pipetting Project MinE DNA, and listening to my frustrations about the beeping of Cybi-Selma, to our shared amazement at DNA shipment issues. Thanks to you I know how much work is needed at the bench to make genetics research a success. Many thanks for your help and all the hard work.

Roommates Kristel, Paul and Suzanne: during our time at the Stratenum quite a few life events have come and gone, as has a small zoo of office pets: little birds (it is rather sad that the Hopster has been declared extinct…), stick insects and of course our axolotls. Many thanks for all the support and the occasional bit of mischief. My newest roombuddies Rick (I already miss our methodological and sometimes philosophical conversations), Anna, Kevin and Tessa, thank you for making me feel so welcome during the last months of my PhD.

Paul and Ramona, I had the honour of supervising each of you separately as students and then together, as TeamMethylation, during the first period of your PhDs. I am sure that the hours of toiling over opposite effects, low intensities and suboptimal randomisation will result in sharp papers. I hope you look back on our teamwork with as much pleasure as I do, and I cannot wait to see all that hard work reflected in your theses.

All the members of the Neurogenetics Group, including Joke, Jurjen, Dick, Mark, Aoife, Brendan, Lindy and Bo Chao, thank you for being an amazing group!

All staff of the ALS Centre and the MND outpatient clinic, thank you for all your support and patience. Your tireless dedication to making clinical care and the various research projects run as smoothly as possible is impressive.

Dear fellow neurology residents, what a great group we have. Nicely varied, but always with a healthy dose of self-mockery and (a touch of) perfectionism. There is always someone to let off steam with after a hectic shift, or to talk through a harrowing case or simply life outside the hospital. In between everything else, but preferably with coffee, or on Friday evenings with pizza or sushi in the emergency department.

Helium: Evelien, Lisanne, Anne Maren, Lindy and Yara. Dearest darlings, how rich we are. How grateful I am that we have been sharing joys and sorrows for 20 years now, whether in Alphen, Amsterdam, Ter Aar, Hardenberg, Hilversum, Terschelling, via the app, in a farmhouse with a cowshed or in a jacuzzi on a rooftop. Let's keep doing that until we are wrinkly little raisins. Dear Anne Maren and Lindy, thank you for your love and support over the past period, despite your own incredibly full (no, rich) lives. I could not wish for better paranymphs / powerhouses!

My medical school ladies Evelien, Anouk, Eelkje and Mirjam. What wonderful adventures we have had together; we could still write that book about them. Combining the schedules of five specialists (in training), with and without families and with or without a PhD trajectory, turns out to be a nightmare, but how lovely it is when we see each other again (as if we were still 21, only now without the bad wine). I am immensely proud of all the paths you have mapped out.

Dear Ralph and Kirsten, dear Katja and Rick: you have witnessed all the ups and downs of my clinical and research years from very close by, and how glad I am to have you near me. You make life more fun, always. Let's celebrate life together many more times. Dear Bart, Fanou and all the other NOU-ers, how ideal that that man of mine brought along such lovely people. By now you have a place in my heart too. Dear Barry, there is no better place to celebrate this occasion than at yours in Walden.

Dear Charles and Syl, dear in-laws: many thanks for the loving welcome into your family. I can no longer imagine life without you.

My big sister, my greatest heroine. Dear Tessa, we are more and more two different versions of the same person. How hard it has been to build a new life in another country, and how well you have done it. The Dekker-group is now in full bloom (yes queen). Thank you for all your help, support and pep talks. And humour. The best humour. Ever. Let's carry on forever with our drinks-and-dinner evenings and bloody Marys the day after, at yours in London or at mine in Amsterdam.

Dear Mum, words fall short to describe how much you mean to me. You saw my future long before I did. Our daily phone calls to spar about work and life are invaluable. From you I learn to be a good person and a committed doctor. My great role model, how proud I am! Dear Oscar, how glad I am that you are here. I have learned an enormous amount from you, and I hope I may continue to do so for a long time.

Dear Dad, from an early age you taught me: “Life isn’t fair”, which has to a large extent shaped my optimistic and pragmatic outlook. Many thanks for your unconditional support and for the room to map out my own path. I hope that, like you, I may follow my passion forever. Dear Peg, the most thoughtful woman I have ever met. Many thanks for your listening ear and valuable advice, from very close by or from the other side of the world.

Finally: dear Youri, wherever you are, I am home. You bring out the best in me. Thank you for your endless patience, unconditional support and jokes that I always end up laughing at, even when I really don't want to. You know me better than I know myself. I want nothing more than to keep discovering life and having adventures with you. To me, you are love.

Annelot

Especially for everyone (and above all for my great love) who has chuckled at my research into genomes (“genomen” in Dutch): it is genomes, not gnomes. Especially for you, a little gnome (a “kabouter”, that is) is hidden in my thesis. Happy hunting!

PUBLICATIONS

* These authors contributed equally to this work. # These authors jointly supervised this work.

In this thesis

Tazelaar GHP*, Dekker AM*, van Vugt JJFA, van der Spek RA, Westeneng HJ, Kool JBG, Kenna KP, van Rheenen W, Pulit SL, McLaughlin RL, Sproviero W, Iacoangeli A, Hübers A, Brenner D, Morrison KE, Shaw PJ, Shaw CE, Povedano Panadés M, Mora Pardina JS, Glass JD, Hardiman O, Al-Chalabi A, van Damme P, Robberecht W, Landers JE, Ludolph AC, Weishaupt JH, van den Berg LH, Veldink JH, van Es MA, on behalf of the Project MinE ALS Sequencing Consortium. Association of NIPA1 repeat expansions with amyotrophic lateral sclerosis in a large international cohort. Neurobiology of Aging 74, 234.e9–234.e15 (2019)

Dekker AM*, Seelen M*, van Doormaal PTC, van Rheenen W, Bothof RJ, van Riessen T, Brands WJ, van der Kooi AJ, de Visser M, Voermans NC, Pasterkamp RJ, Veldink JH, van den Berg LH, van Es MA. Large-scale screening in sporadic amyotrophic lateral sclerosis identifies genetic modifiers in C9orf72 repeat carriers. Neurobiology of Aging 39, 220.e9–15 (2016)

van Rheenen W*, Shatunov A*, Dekker AM, McLaughlin RL, Diekstra FP, Pulit SL, van der Spek RAA, Vosa U, de Jong S, Robinson MR, Yang J, Fogh I, van Doormaal PTC, Tazelaar GHP, Koppers M, Blokhuis AM, Sproviero W, Jones AR, Kenna KP, van Eijk KR, Harschnitz O, Schellevis RD, Brands WJ, Medic J, Menelaou A, Vajda A, Ticozzi N, Lin K, Rogelj B, Vrabec K, Ravnik-Glavac M, Koritnik B, Zidar J, Leonardis L, Dolenc Groselj L, Millecamps S, Salachas F, Meininger V, de Carvalho M, Pinto S, Mora JS, Rojas-Garcia R, Polak M, Chandran S, Colville S, Swingler R, Morrison KE, Shaw PJ, Hardy J, Orrell RW, Pittman A, Sidle K, Fratta P, Malaspina A, Topp S, Petri S, Abdulla S, Drepper C, Sendtner M, Meyer T, Ophoff RA, Staats KA, Wiedau-Pazos M, Lomen-Hoerth C, Van Deerlin VM, Trojanowski JQ, Elman L, McCluskey L, Basak N, Tunca C, Hamzeiy H, Parman Y, Meitinger T, Lichtner P, Blagojevic-Radivojkov M, Andres CR, Maurel C, Bensimon G, Landwehrmeyer B, Brice A, Payan CAM, Saker-Delye S, Durr A, Wood N, Tittmann L, Lieb W, Franke A, Rietschel M, Cichon S, Nothen MM, Amouyel P, Tzourio C, Dartigues J-F, Uitterlinden AG, Rivadeneira F, Estrada K, Hofman A, Curtis C, Blauw HM, van der Kooi AJ, de Visser M, Goris A, Weber M, Shaw CE, Smith BN, Pansarasa O, Cereda C, Del Bo R, Comi GP, D’Alfonso S, Bertolin S, Soraru G, Mazzini L, Pensato V, Gellera C, Tiloca C, Ratti A, Calvo A, Moglia C, Brunetti M, Arcuti S, Capozzo R, Zecca C, Lunetta C, Penco S, Riva N, Filosto M, Muller B, Stuit RJ, PARALS registry, SLALOM group, SLAP registry, FALS Sequencing Consortium, SLAGEN Consortium, NNIPPS Study Group, Blair I, McCann EP, Fifita JA, Nicholson GA, Rowe DB, Pamphlett R, Kiernan MC, Grosskreutz J, Ringer T, Prell T, Stubendorff B, Kurth I, Leigh PN, Casale F, Chio A, Beghi E, Pupillo E, Tortelli R, Logroscino G, Powell J, Ludolph AC, Weishaupt JH, Robberecht W, Van Damme P, Franke L, Pers T, Brown RH, Glass J, Landers JE, Hardiman O, Andersen PM, Corcia P, Vourc’h P, Silani V, Wray NR, Visscher PM, de Bakker PIW, van Es MA, Pasterkamp RJ, Lewis CM, Breen G, Al-Chalabi A#, van den Berg LH#, Veldink JH#. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nature Genetics 48, 1043–1048 (2016)

Kenna KP*, van Doormaal PTC*, Dekker AM*, Ticozzi N*, Kenna BP, Diekstra F, van Rheenen W, van Eijk KR, Jones AR, Keagle P, Shatunov A, Sproviero W, Smith BN, van Es MA, Topp SD, Kenna A, Miller JW, Fallini C, Tiloca C, McLaughlin RL, Vance C, Troakes C, Colombrita C, Mora G, Calvo A, Verde F, Al-Sarraj S, King A, Calini D, de Belleroche J, Baas F, van der Kooi AJ, de Visser M, ten Asbroek ALMA, Sapp PC, McKenna-Yasek D, Polak M, Asress S, Munoz-Blanco JL, Strom TM, Meitinger T, Morrison KE, SLAGEN Consortium, Lauria G, Williams KL, Leigh PN, Nicholson GA, Blair IP, Leblond CS, Dion PA, Rouleau GA, Pall H, Shaw PJ, Turner MR, Talbot K, Taroni F, Boylan KB, van Blitterswijk M, Rademakers R, Esteban-Perez J, Garcia-Redondo A, van Damme P, Robberecht W, Chio A, Gellera C, Drepper C, Sendtner M, Ratti A, Glass JD, Mora Pardina JS, Basak NA, Hardiman O, Ludolph AC, Andersen PM, Weishaupt JH, Brown Jr RH, Al-Chalabi A, Silani V#, Shaw CE#, van den Berg LH#, Veldink JH#, Landers JE#. NEK1 variants confer susceptibility to amyotrophic lateral sclerosis. Nature Genetics 48, 1037–1042 (2016).

Not included in this thesis

Publications as member of the Project MinE Consortium

Project MinE ALS Sequencing Consortium. CHCHD10 variants in amyotrophic lateral sclerosis: Where is the evidence? Annals of Neurology 84, 110–116 (2018).

Project MinE ALS Sequencing Consortium. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. European Journal of Human Genetics 7, 1–10 (2018).

van der Spek RA, van Rheenen W, Pulit SL, Kenna KP, Ticozzi N, Kooyman M, McLaughlin RL, Moisse M, van Eijk KR, van Vugt JFA, Iacoangeli A, Andersen P, Basak AN, Blair I, de Carvalho M, Chio A, Corcia P, Couratier P, Drory VE, Glass JD, Hardiman O, Mora JS, Morrison KE, Mitne-Neto M, Robberecht W, Shaw PJ, Panades MP, van Damme P, Silani V, Gotkine M, Weber M, van Es MA, Landers JE, Al-Chalabi A, van den Berg LH, Veldink JH & Project MinE ALS Sequencing Consortium. Reconsidering the causality of TIA1 mutations in ALS. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration 19, 1–3 (2018).

Nicolas A*, Kenna KP*, Renton AE*, Ticozzi N*, Faghri F*, Chia R*, Dominov JA, Kenna BJ, Nalls MA, Keagle P, Rivera AM, van Rheenen W, Murphy NA, van Vugt JFA, Geiger JT, Van der Spek RA, Pliner HA, Shankaracharya, Smith BN, Marangi G, Topp SD, Abramzon Y, Gkazi AS, Eicher JD, Kenna A, ITALSGEN Consortium, Mora G, Calvo A, Mazzini L, Riva N, Mandrioli J, Caponnetto C, Battistini S, Volanti P, La Bella V, Conforti FL, Borghero G, Messina S, Simone IL, Trojsi F, Salvi F, Logullo FO, D’Alfonso S, Corrado L, Capasso M, Ferrucci L, Genomic Translation for ALS Care (GTAC) Consortium, de Araujo Martins Moreno C, Kamalakaran S, Goldstein DB, ALS Sequencing Consortium, Gitler AD, Harris T, Myers RM, NYGC ALS Consortium, Phatnani H, Musunuri RL, Evani US, Abhyankar A, Zody MC, Answer ALS Foundation, Kaye J, Finkbeiner S, Wyman SK, LeNail A, Lima L, Fraenkel E, Svendsen CN, Thompson LM, Van Eyk JE, Berry JD, Miller TM, Kolb SJ, Cudkowicz M, Baxi E, Clinical Research in ALS and Related Disorders for Therapeutic Development (CReATe) Consortium, Benatar M, Taylor JP, Rampersaud E, Wu G, Wuu J, SLAGEN Consortium, Lauria G, Verde F, Fogh I, Tiloca C, Comi GP, Soraru G, Cereda C, French ALS Consortium, Corcia P, Laaksovirta H, Myllykangas L, Jansson L, Valori M, Ealing J, Hamdalla H, Rollinson S, Pickering-Brown S, Orrell RW, Sidle KC, Malaspina A, Hardy J, Singleton AB, Johnson JO, Arepalli S, Sapp PC, McKenna-Yasek D, Polak M, Asress S, Al-Sarraj S, King A, Troakes C, Vance C, de Belleroche J, Baas F, ten Asbroek ALMA, Munoz-Blanco JL, Hernandez DG, Ding J, Gibbs JR, Scholz SW, Floeter MK, Campbell RH, Landi F, Bowser R, Pulst SM, Ravits JM, MacGowan DJL, Kirby J, Pioro EP, Pamphlett R, Broach J, Gerhard G, Dunckley TL, Brady CB, Kowall NW, Troncoso JC, Le Ber I, Mouzat K, Lumbroso S, Heiman-Patterson TD, Kamel F, Van Den Bosch L, Baloh RH, Strom TM, Meitinger T, Shatunov A, Van Eijk KR, de Carvalho M, Kooyman M, Middelkoop B, Moisse M, McLaughlin RL, Van Es MA, Weber M, Boylan KB, Van Blitterswijk M, Rademakers R, Morrison KE, Basak AN, Mora JS, Drory VE, Shaw PJ, Turner MR, Talbot K, Hardiman O, Williams KL, Fifita JA, Nicholson GA, Blair IP, Rouleau GA, Esteban-Perez J, Garcia-Redondo A, Al-Chalabi A, Project MinE ALS Sequencing Consortium, Rogaeva E, Zinman L, Ostrow LW, Maragakis NJ, Rothstein JD, Simmons Z, Cooper-Knock J, Brice A, Goutman SA, Feldman EL, Gibson SB, Taroni F, Ratti A, Gellera C, Van Damme P, Robberecht W, Fratta P, Sabatelli M, Lunetta C, Ludolph AC, Andersen PM, Weishaupt JH, Camu W, Trojanowski J, Van Deerlin VM, Brown Jr RH, van den Berg LH, Veldink JH, Harms MB, Glass JD, Stone DJ#, Tienari P#, Silani V#, Chio A#, Shaw CE#, Traynor BJ#, Landers JE#. Genome-wide Analyses Identify KIF5A as a Novel ALS Gene. Neuron 97, 1268–1282.e6 (2018)

Cooper-Knock J, Robins H, Niedermoser I, Wyles M, Heath PR, Higginbottom A, Walsh T, Kazoka M, Project MinE ALS Sequencing Consortium, Ince PG, Hautbergue GM, McDermott CJ, Kirby J, Shaw PJ. Targeted Genetic Screen in Amyotrophic Lateral Sclerosis Reveals Novel Genetic Variants with Synergistic Effect on Clinical Phenotype. Frontiers in Molecular Neuroscience. 10, 146–11 (2017)

McLaughlin RL*, Schijven D*, van Rheenen W, van Eijk KR, O’Brien M, Project MinE GWAS Consortium, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Kahn RS, Ophoff RA, Goris A, Bradley DG, Al-Chalabi A, van den Berg LH, Luykx JJ#, Hardiman O#, Veldink JH#. Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nature Communications 8, 14774 (2017)

Publications as collaborator

Meininger V, Genge A, van den Berg LH, Robberecht W, Ludolph A, Chio A, Kim SH, Leigh PN, Kiernan MC, Shefner JM, Desnuelle C, Morrison KE, Petri S, Boswell D, Temple J, Mohindra R, Davies M, Bullman J, Rees P, Lavrov A, on behalf of the NOG112264 Study Group. Safety and efficacy of ozanezumab in patients with amyotrophic lateral sclerosis: a randomised, double-blind, placebo-controlled, phase 2 trial. The Lancet Neurology 16, 208–216 (2017)

Other publications

Westeneng HJ, Debray TPA, Visser AE, van Eijk RPA, Rooney JPK, Calvo A, Martin S, McDermott CJ, Thompson AG, Pinto S, Kobeleva X, Rosenbohm A, Stubendorff B, Sommer H, Middelkoop BM, Dekker AM, van Vugt JFA, van Rheenen W, Vajda A, Heverin M, Kazoka M, Hollinger H, Gromicho M, Korner S, Ringer TM, Rodiger A, Gunkel A, Shaw CE, Bredenoord AL, van Es MA, Corcia P, Couratier P, Weber M, Grosskreutz J, Ludolph AC, Petri S, de Carvalho M, Van Damme P, Talbot K, Turner MR, Shaw PJ, Al-Chalabi A, Chio A, Hardiman O, Moons KGM, Veldink JH, van den Berg LH. Prognosis for patients with amyotrophic lateral sclerosis: development and validation of a personalised prediction model. The Lancet Neurology 17, 423–433 (2018).

Van Doormaal PTC*, Ticozzi N*, Weishaupt JH*, Kenna K, Diekstra FP, Verde F, Dekker AM, Tiloca C, Pensato V, Nurnberg P, Calini D, Altmuller J, Castellotti B, Motameny S, Ratti A, Gellera C, Ludolph AC, van den Berg LH#, Landers JE#, Veldink JH#, Silani V#, Volk AE#. The role of de novo mutations in the development of amyotrophic lateral sclerosis. Human Mutation 38, 1534–1541 (2017)

McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Min Kang H, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink JH, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki AE, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy SL, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, van den Berg JH, van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PIW, Swertz MA, McCarroll S, Kooperberg C, Dekker AM, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R, Abecasis G, Marchini J, for the Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics 48, 1279–1283 (2016)

van Oorschot B, Hovingh S, Dekker AM, Stalpers LJ, Franken NAP. Predicting Radiosensitivity with Gamma-H2AX Foci Assay after Single High-Dose-Rate and Pulsed Dose-Rate Ionizing Irradiation. Radiation Research 185, 190–198 (2016)

Submitted / in preparation

Dekker AM, Diekstra FP, Pulit SL, Tazelaar GHP, van der Spek RA, van Rheenen W, van Eijk KR, Calvo A, Brunetti M, Van Damme P, Robberecht W, Hardiman O, McLaughlin RL, Chiò A, Sendtner M, Ludolph AC, Weishaupt JH, Mora Pardina JS, van den Berg LH, Veldink JH. Exome array analysis of rare and low frequency variants in amyotrophic lateral sclerosis. Submitted

ABOUT THE AUTHOR

Annelot Dekker was born on the 21st of August 1987 in Alphen aan den Rijn, The Netherlands. In 2005 she graduated from secondary school (VWO) with honors at the Ashram College in Alphen aan den Rijn. She started Medical School in 2005 at the University of Amsterdam. During her studies Annelot travelled to Ghana to perform volunteer work at the Oskan Challis Foundation / Church of God Clinic in Asienimpong. Her interest in genetics grew during her scientific internship at the Laboratory for Experimental Oncology and Radiobiology (LEXOR) at the Academic Medical Center in Amsterdam, where she investigated the effect of mutations in DNA damage repair genes on the repair of DNA double-strand breaks induced by ionizing radiation. In 2012 she obtained her medical degree cum laude. After finishing medical school, she worked for one year as a senior house officer (ANIOS) at the Onze Lieve Vrouwe Gasthuis in Amsterdam. In 2013, she started her PhD on the genetic underpinnings of ALS at the University Medical Center Utrecht, the results of which are described in this thesis. During her PhD, she also worked as a trial physician on two trials testing new drugs for ALS, and worked on studies of epigenetic risk factors for ALS. She performed her PhD research under the supervision of prof. dr. L.H. van den Berg and prof. dr. J.H. Veldink (promotors), and dr. S.L. Pulit and dr. M.A. van Es (co-promotors). In 2013, Annelot also started the Neurology Residency program at the University Medical Center Utrecht under the supervision of prof. J.H.J. Wokke, dr. T. Seute and prof. dr. G.J. Biessels.
