Human Genetics (2021) 140:1183–1200 https://doi.org/10.1007/s00439-021-02289-w

ORIGINAL INVESTIGATION

Whole‑genome sequencing identifies functional noncoding variation in SEMA3C that cosegregates with dyslexia in a multigenerational family

Amaia Carrion‑Castillo1,2 · Sara B. Estruch1 · Ben Maassen3,4 · Barbara Franke5,6 · Clyde Francks1,6 · Simon E. Fisher1,6

Received: 16 November 2020 / Accepted: 27 April 2021 / Published online: 2 June 2021 © The Author(s) 2021

Abstract Dyslexia is a common heritable developmental disorder involving impaired reading abilities. Its genetic underpinnings are thought to be complex and heterogeneous, involving common and rare genetic variation. Multigenerational families segregating apparent monogenic forms of language-related disorders can provide useful entrypoints into biological pathways. In the present study, we performed a genome-wide linkage scan in a three-generational family in which dyslexia affects 14 of its 30 members and seems to be transmitted with an autosomal dominant pattern of inheritance. We identified a locus on chromosome 7q21.11 which cosegregated with dyslexia status, with the exception of two cases of phenocopy (LOD = 2.83). Whole-genome sequencing of key individuals enabled the assessment of coding and noncoding variation in the family. Two rare single-nucleotide variants (rs144517871 and rs143835534) within the first intron of the SEMA3C gene cosegregated with the 7q21.11 risk haplotype. In silico characterization of these two variants predicted effects on gene regulation, which we functionally validated for rs144517871 in human cell lines using luciferase reporter assays. SEMA3C encodes a secreted protein that acts as a guidance cue in several processes, including cortical neuronal migration and cellular polarization. We hypothesize that these intronic variants could have a cis-regulatory effect on SEMA3C expression, making a contribution to dyslexia susceptibility in this family.

* Simon E. Fisher [email protected]

1 Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
2 Basque Center on Cognition, Brain and Language, San Sebastian, Spain
3 Centre for Language and Cognition Groningen, University of Groningen, Groningen, The Netherlands
4 School of Behavioral and Cognitive Neurosciences, University Medical Centre Groningen, Groningen, The Netherlands
5 Departments of Human Genetics and Psychiatry, Radboud University Medical Center, Nijmegen, The Netherlands
6 Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands

Introduction

Dyslexia is a prevalent human neurodevelopmental condition, characterized by a difficulty learning to read despite conventional instruction, adequate educational opportunities and IQ, and a lack of sensory impairments (Shaywitz et al. 1992). It shows familial clustering and has been reported to be moderately heritable in a range of samples, with heritability estimates from 0.3 to 0.8 (Peterson and Pennington 2015). Studies thus far indicate a complex multifactorial etiology involving genetic and environmental factors (Bishop 2015). Multiple genetic variants are likely to act as risk factors contributing to the liability of dyslexia. Until recent years, much of the research on the molecular basis of dyslexia focused on a handful of candidate genes (e.g., ROBO1, KIAA0319, DCDC2, and DYX1C1) that were identified through linkage analysis in families, and then followed up via fine-mapping of association with common variants within those genes (see Carrion-Castillo et al. 2013 for a review). In the last decade, genome-wide association scan


(GWAS) studies have tried to identify other loci affecting reading ability by querying common genetic variants across the whole genome for association in a relatively unbiased manner (Luciano et al. 2013; Field et al. 2013; Gialluisi et al. 2014, 2019, 2020; Eicher et al. 2013; Truong et al. 2019; Price et al. 2020). These GWAS efforts have pointed to a number of promising loci. On the other hand, there are some unusual extended families with high frequency of dyslexia, in which it may follow a roughly Mendelian inheritance pattern (Fagerheim et al. 1999; Nopola-Hemmi et al. 2001; de Kovel et al. 2004). Studying these families in which one, or perhaps a few, genetic variants with substantial penetrances could contribute to dyslexia has already proven to be valuable for the identification of several dyslexia candidate genes. For example, a rare haplotype of ROBO1 was found to cosegregate with dyslexia status in the majority of affected relatives of a large family, and the same gene was disrupted by a translocation in an unrelated case (Hannula-Jouppi et al. 2005); also, DYX1C1 was identified as a potential candidate because of a chromosomal rearrangement that cosegregated with dyslexia in multiple members of one family (Taipale et al. 2003).

Next-generation sequencing (NGS) has revolutionized the genetic analysis of human diseases by enabling the systematic and rapid screening of common and rare mutations in the whole genome or whole exome. This has led to the discovery of several disease-causing genes underlying previously unsolved Mendelian diseases (Bamshad et al. 2011). Moreover, the technology has also been used to explore the extent to which rare alleles can explain the heritability of complex diseases and health-related traits. For instance, through family-based approaches for detecting de novo mutations, NGS has identified new genes involved in the genetic architecture of complex and heterogenic disorders such as autism spectrum disorder (Iossifov et al. 2014) and intellectual disability (Gilissen et al. 2014), and is beginning to shed new light on pathways underlying speech apraxia (Eising et al. 2019; Hildebrand et al. 2020; den Hoed and Fisher 2020). Unusual families with a high frequency of dyslexia provide an additional possibility to rank variants for potential causal links to a trait/disorder, according to how well they cosegregate with the phenotype and their likely functional consequences at the molecular level. For example, whole-exome sequencing (WES) in an extended Swedish family identified a two-base mutation (chr3:123264558-9, hg19), which resulted in an amino acid change (p.R229L) within the CEP63 gene, partially cosegregating with dyslexia: seven out of ten of the affected individuals were carriers of the risk variant (Einarsdottir et al. 2015). Another study that reported ten putatively linked loci in a Finnish family used whole-exome and whole-genome sequencing (WGS) in two critical individuals of the family to identify a rare non-synonymous variant within the NCAN gene, on chromosome 19p13.11 (maximum NPL = 1.36) (Einarsdottir et al. 2017). The variant was present in seven out of eight affected individuals and one out of four unaffected family members (Einarsdottir et al. 2017).

In the current study, we took advantage of the multiplex family design and sequencing technologies to study a three-generation Dutch pedigree in which almost half of its 30 members have dyslexia. We searched for chromosomal regions that cosegregate with dyslexia within this family using linkage analysis based on a single-nucleotide polymorphism (SNP) genotype array and used WGS to characterize the genetic variants found in seven critical individuals at single-nucleotide and structural levels. Next, we used the haplotype scaffolding (from the genotype array) for imputation in the remainder of the family members, and focused on rare variation that was shared among individuals with dyslexia, with the aim of identifying likely highly penetrant variants. Finally, we assessed the putative biological role of the identified variants of interest with functional experiments in human cell lines. This strategy can lead to the identification of new genetic variants involved in dyslexia, and promises to yield new clues on genes and pathways that are important for the development of reading abilities.

Materials and methods

Sample

We studied a three-generation family of 30 members (referred to hereafter as family 352, Fig. 1), in which 14 relatives had a positive diagnosis of dyslexia. The inheritance pattern of the trait was consistent with the potential involvement of a single genetic locus, transmitted with an autosomal dominant mode of transmission. Specifically, individuals with dyslexia were observed in every generation, roughly half of the family members were affected, including similar numbers of males (n = 8) and females (n = 6), and there were three instances of male-to-male transmission (arguing strongly against X-linkage). This family, which has not been previously described in the literature, was recruited as part of a multidisciplinary research effort into different aspects of dyslexia (the Dutch Dyslexia Programme) (van der Leij and Maassen 2013), in which families were ascertained when at least two first degree relatives had a school history of reading problems (de Kovel et al. 2004; van der Leij and Maassen 2013). Informed consent was obtained from all participants, and the study was approved by the ethics committee (CWOM) of the University Medical Centre Nijmegen under CWOM-nr 9811-025.


[Figure 1: pedigree diagram of family 352, with generations I (I.1 to I.2), II (II.1 to II.14), and III (III.1 to III.14). The symbol shading and the placement of the WGS and # markers on individual members are not recoverable from the extracted text.]

Fig. 1 Pedigree of family 352. Black symbols represent individuals diagnosed with dyslexia, white symbols represent individuals without dyslexia, and gray symbols are individuals for which the phenotype was not known. #: individuals without DNA sample availability. WGS: whole-genome sequence determined as part of this study
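The argument that male-to-male transmission speaks against X-linkage (a father passes his Y chromosome, not his X, to a son) can be checked mechanically on any pedigree table. The following is a minimal sketch only: the `Person` record, the function name, and the five-person toy pedigree are hypothetical illustrations and do not reproduce family 352 or any software used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    """One pedigree member (illustrative fields, not the study's data model)."""
    pid: str
    sex: str                      # "M" or "F"
    affected: bool
    father: Optional[str] = None  # pid of the father, if he is in the pedigree
    mother: Optional[str] = None  # pid of the mother, if she is in the pedigree


def male_to_male_transmissions(people: List[Person]) -> List[Tuple[str, str]]:
    """Return (father, son) pairs where both are affected and the mother is not.

    Pairs with an affected mother are skipped because the son's risk allele
    could then have come from her X chromosome.
    """
    by_id: Dict[str, Person] = {p.pid: p for p in people}
    pairs: List[Tuple[str, str]] = []
    for child in people:
        if child.sex != "M" or not child.affected:
            continue
        father = by_id.get(child.father) if child.father else None
        mother = by_id.get(child.mother) if child.mother else None
        if father is None or not father.affected:
            continue
        if mother is not None and mother.affected:
            continue  # ambiguous source of the risk allele
        pairs.append((father.pid, child.pid))
    return pairs


# Hypothetical toy pedigree (NOT family 352): affected father, unaffected mother.
TOY_PEDIGREE = [
    Person("G1", "M", True),
    Person("G2", "F", False),
    Person("S1", "M", True, father="G1", mother="G2"),
    Person("D1", "F", True, father="G1", mother="G2"),
    Person("S2", "M", False, father="G1", mother="G2"),
]

pairs = male_to_male_transmissions(TOY_PEDIGREE)
# Any father-to-son transmission is hard to reconcile with an X-linked allele.
x_linkage_unlikely = len(pairs) > 0
print(pairs, x_linkage_unlikely)
```

In this toy example the affected daughter is not informative for the X-linkage question; only the affected father and affected son pair is counted.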

Phenotypic measures and diagnostic criteria

The family members were administered a battery of tests in a single session, as briefly described below.

Word and non‑word reading fluency

Word reading fluency was assessed using the One-Minute-Test (in Dutch, Een-Minute-Test or EMT) (Brus and Voeten 1972; Kuijpers et al. 2003), while non-word reading fluency was assessed using the Klepel test (van den Bos et al. 1994; Kuijpers et al. 2003). Participants were asked to correctly read as many items as possible within 1 min (word reading) or 2 min (non-word reading).

Verbal competence

Verbal competence was assessed as part of the Dutch version of the Wechsler Adult Intelligence Test (Uterwijk 2000), which tests the ability of the individual to express him/herself verbally. The participant is offered two words and is asked to describe as concisely as possible the similarities between them. Examples (in English) are car-aeroplane or courage-cowardice.

Diagnostic criteria

Dyslexia was defined following the criteria of the Dutch Dyslexia Programme, which were based on several reading-related quantitative measures (van der Leij and Maassen 2013). People were defined as affected if they (1) performed below the 10th percentile on a word reading test, or (2) scored below the 10th percentile on a non-word reading test, or (3) scored below the 25th normative percentile on both word and non-word reading tests, or (4) had a word or non-word reading score that was more than 60% points below their normalized score on a verbal competence test (discrepancy criterion). Following these criteria, 14 out of 27 tested family members were identified as having dyslexia. No sensory, developmental, or cognitive deficits that could confound the dyslexia diagnosis were detected among these individuals.

Genotyping and linkage analysis

DNA was available for 26 individuals from family 352, including 13 with dyslexia, 11 without dyslexia, and two with unknown phenotype (i.e., all individuals except I.2, II.7, III.9, and III.11 in Fig. 1). These samples were genotyped using the Illumina Infinium OmniExpressExome-8 BeadChip (Illumina, Illumina Human OmniExpress-Exome, 2016a, b) by the service company ‘Eurofins’ (Germany). Genotypes were called using Illumina GenomeStudio, using the “humanomniexpressexome-8v1-2_a.bpm” manifest file to map and annotate the SNPs to reference genome assembly build GRCh37/hg19.

A total of 7338 SNPs, with a minimum distance of 0.5 centimorgans (cM) between SNPs and an average heterozygosity of 49.7%, were selected for multipoint linkage analysis using LinkDataGen (Bahlo and Bromhead 2009), which also filtered out Mendelian inheritance errors and removed non-polymorphic SNPs within the family. Given the presence of three male-to-male transmissions in the family,

which are not compatible with an X-linked inheritance pattern, chromosome X was not analyzed.

Multipoint parametric linkage analysis was performed using a dominant inheritance model, assuming a disorder allele frequency of 1% and penetrances of 5%, 99%, and 99% for wild-type, heterozygous, and homozygous mutation carriers, respectively. Given the uncertainty in defining a parametric model for this trait a priori, we also conducted non-parametric (NPL) multipoint linkage analysis, which does not rely on assumptions regarding monogenic inheritance, estimates of penetrance levels, and phenocopy rates (Fisher and DeFries 2002). Note that the NPL approach uses asymptotic large-sample assumptions to compute test statistics and approximate p values, so this method may not be optimal for single-pedigree analysis (McPeek 1999; Lange and Lange 2004). The NPL tests only for an increase in Identity-by-Descent allele sharing in affected individuals, without specifying a parametric model, but the lack of specification is accompanied by a loss of power.

Multipoint parametric linkage analyses were performed using the MORGAN (v3.2) programs lm_bayes (parametric) (George et al. 2005) and lm_ibdtests (non-parametric) (Basu et al. 2008), with 30,000 Monte Carlo iterations, and permutations for the non-parametric run. The NPL score was computed as the −log10(permuted p value).

We performed sensitivity analyses to confirm that the results were robust to different analytical options used within the parametric linkage analyses: the specific SNP-map selected (minimum distance between markers 0.3 cM or 0.5 cM) and the level of penetrances (0.05, 0.90, 0.90; 0.05, 0.95, 0.95).

Estimation of genome‑wide significant linkage threshold and maximal possible LOD score

To derive an empirically determined threshold for genome-wide significant linkage, we carried out simulations, using the same SNPs that were included in the real linkage analysis. Permutations were performed using gene-dropping simulations, as implemented in MORGAN (v3.2). 1000 replicates were simulated, and each replicate was analyzed with the same parametric model as the real data using lm_bayes. The significance for each LOD score was assessed by: (1) counting the number of replicates (n) in which the maximum LOD score exceeded the highest observed LOD score and (2) calculating the p value as (n + 1)/1001. The threshold for genome-wide significant linkage, taken to be the 49th highest LOD score of the 1000 replicates, was estimated as 3.44. We also estimated the maximal possible LOD score that could be obtained in this family assuming an etiologic variant that cosegregates perfectly with the trait. To do so, we simulated a set of 10 consecutive biallelic markers (1 cM apart, Minor Allele Frequency, MAF = 0.25) whose allelic status fully segregated with dyslexia status (i.e., inherited by all 13 affected individuals but none of the 13 unaffected/unknown individuals) and ran parametric linkage analyses in an identical manner to the real data. In this scenario, we obtained a maximum LOD score of 5.3, clearly in excess of the empirical threshold for genome-wide significant linkage.

Haplotype analysis

To give further insights into regions showing strongest linkage to the trait, haplotypes were generated from SNPs covering the genomic region that showed multipoint LOD > 2. Haplotypes were created in the haplotype analysis tool simwalk2snp (Lange and Lange 2004) and visualized using Haplopainter (Thiele and Nurnberg 2005).

Whole‑genome sequencing

Genomic DNA samples collected from 7 members (5 affected and 2 unaffected, see Fig. 1) of family 352 were used for WGS by the genomics research organization and service company ‘Novogene’ (Hong Kong) using Illumina’s HiSeq Xten technology (Illumina, Illumina X-ten, 2016). The sample selection was motivated by the individuals’ quantitative trait scores (i.e., taking those most severely affected or obviously unaffected), and it was constrained by cost. One unrelated unaffected individual (Fig. 1, II.14) was also selected for WGS to help with filtering out possible sequencing artifacts. Sequencing was done at 30 times average coverage depth with library insert size of 350 base pairs and reads of 150 base pairs long (paired-end).

Structural variant calling

Several signals in WGS data (e.g., read-pairs, split-reads, and read-depth) can indicate the presence of structural variants (SVs) (Mohiyuddin et al. 2015). Genotyping intensity data from SNP arrays can also be used to detect copy number variants (CNVs). Hence, we screened for different types of SVs using all the available data.

SVs from WGS

WGS data were used to call SVs, including deletions, duplications, and inversions. SVs were identified using three variant calling programs: CNVnator (Abyzov et al. 2011), BreakDancer (Chen et al. 2009), and Lumpy (Layer et al. 2014). Overlapping calls from the three SV detectors were then combined within samples using MetaSV (Mohiyuddin et al. 2015), considering as high-confidence those SVs that were detected by at least two of the tools. The seven WGS samples

had on average 311,500 SV calls (range 310,900–311,900), of which on average 1915 (range 1855–1949) had been called by at least two different callers. SV calls across all samples were combined using the R package intanSV (Jia et al. 2020) by merging the calls that had been made in at least two samples with a reciprocal coordinate overlap larger than 10%. There were a total of 26,799 overlapped calls: 22,954 deletions, 3105 duplications, and 740 inversions.

The SV calls were compared against the Database of Genomic Variants (DGV, release July 2015; downloaded from UCSC genome browser hg19, February 2016) to annotate them as rare and common. Those that overlapped by > 50% of their lengths with five or fewer CNV events in the DGV were considered rare, while if they overlapped by > 50% of their lengths with more than five CNV events in the DGV, they were considered common. Those that did not overlap with any CNV in the DGV were classed as novel.

We excluded SVs as being potentially causative for the phenotype when they were present in the unaffected individual II.14 (married into family 352), or when they were common variants (as defined above) in the DGV database. IGV (Thorvaldsdottir et al. 2013) was used to visually evaluate putative SVs.

SVs from SNP microarrays

PennCNV (Wang et al. 2007) was used to detect CNVs from the signal intensity data. This program uses the normalized intensity data (Log R Ratio, and B Allele Frequencies) for SNP and CNV probes to identify putative CNVs using an HMM. For this analysis, we used default HMM parameters, as well as the PFB (Population Frequency of B allele) and GC files provided with the program (hhall.hg18.pfb, hhall.hg18.gcmodel), since our sample was too small to directly estimate these parameters from it. We used the joint calling option to take advantage of the family structure. Since only a trio-structure can be specified, a separate trio was defined for each non-founder sample. As a result, there were 1947 CNV calls, and each sample had on average 92.7 calls (range 31–211). CNV call coordinates were then lifted to hg19 genome reference to relate them to the rest of the results. We filtered out common CNVs (as defined for the DGV, see above) and CNVs that fell outside of the linked genomic regions (multipoint LOD > 1 or multipoint NPL > 1).

WGS data processing

Alignment and pre‑processing

Raw reads were cleaned by excluding adapter sequences, reads with low-quality bases for more than 50% of their lengths, and reads with unknown bases for more than 10% of their lengths. Clean reads comprised 97% of total reads, and were mapped onto the human reference genome (hg19) using the software Burrows–Wheeler Aligner (BWA) (Li and Durbin 2009). Bam files were sorted using SAMtools (Li et al. 2009) and PCR duplicate reads were marked using Picard (Picard 2016). The mean sequence read length was 150 base pairs, and approximately 97.5% of the genome was covered by at least a ten times sequence read depth.

Re-alignment around indels (insertion/deletions) and base quality control recalibration were performed using the Genome analysis toolkit software (GATK v3.4) (McKenna et al. 2010; DePristo et al. 2011).

Variant calling and annotation

Since more accurate variant calls can be achieved by including data from larger numbers of subjects simultaneously, we ran this process by pooling our data from family 352 together with 54 additional samples from different projects that had been sequenced with the identical protocols and in the same batches.

Genetic variants were called using the HaplotypeCaller (HC) tool of GATK (McKenna et al. 2010; DePristo et al. 2011). HC was run separately per sample using the ‘-ERC GVCF’ mode, and then merged together using the GenotypeGVCFs tool, as recommended by the GATK best practices (GATK v3.4). We performed Variant Quality Score Recalibration (VQSR) to exclude the low-quality variants (phred-scaled Qscore < 30) and to flag the rest into the sensitivity tier they fell into (90, 99, 99.9, and 100). The variant calling of SNPs and indels identified on average 4,523,372 per sample (range: 4,455,342–4,581,631), for a total of 14,980,000 different variants across the 61 samples. These variants were then annotated using Annovar (Wang et al. 2010) and Variant Effect Predictor (VEP v37) (McLaren et al. 2010).

Investigation of novel and rare variants

Imputation

To obtain genotype information on the family members that had not been sequenced, all novel and rare variants from WGS (i.e., having less than 1% frequency in the 1000 Genomes database, total = 306,597) were subjected to pedigree-based imputation. Gl_auto program from the MORGAN (v3.2) framework (Wijsman et al. 2006; Thompson 2011) was used to sample inheritance vectors from a Markov chain Monte Carlo analysis of the multilocus marker data (i.e., the 7338 SNPs that went into the linkage analyses). The inheritance vectors were used to impute genotype calls by GIGI (Cheung et al. 2013). The genotype probability

threshold to call two alleles was set to 80%, and the threshold to call a single allele to 90%. The mean imputation rate was 51.58%. When the imputation resulted in haploid calls (i.e., only one allele imputed), we assumed that the unknown allele was the reference allele, since only rare variants were being imputed (the downstream analysis software to evaluate cosegregation of variants with dyslexia in the family required diploid genotypes).

Filtering of variants as potentially causative

7,805,631 variants had non-reference calls across the seven sequenced family members. The filtering strategy is shown in Fig. S1. First, we excluded variants if they were absent from all individuals with dyslexia, or if they were present in both unaffected members who had WGS data (2,707,476 variants kept). We then excluded common variants by filtering out any with reported MAF > 1% in 1000Genomes database (1000G), Exome Sequencing Project (ESP), and Exome Aggregation Consortium’s (ExAC) European populations (470,596 variants kept). Then, variants were excluded which fell outside genomic regions where linkage analysis found LODs or NPL scores exceeding 1, retaining 8528 variants. The use of a loose threshold for this variant filtering step allows for a thorough assessment that does not prematurely exclude potentially causative variants which imperfectly cosegregate with dyslexia in the family. As a final filter, we retained only those variants that were present in more than 60% of the individuals with dyslexia (2276 variants kept). Imputed genotypes were used for this filtering step. The filtered set of variants was then queried under different hypotheses considering coding and noncoding variation (Fig. S1): (i) We identified two exonic variants, and excluded one annotated as a synonymous variant by both Annovar and VEP. (ii) Then, we identified 2274 noncoding variants, which we evaluated taking into account several scores (GWAVA, CADD, FIRE, see below) that aim to aid interpretation of noncoding regions of the genome. Note that precomputed scores are only available for SNVs and some indels, but not all noncoding indels could be evaluated by these scores. Unscored variants (559/2274) for which neither CADD, GWAVA, nor FIRE metrics were available were filtered out. Among the noncoding variants, we first considered those that were predicted to be likely pathogenic, based on GWAVA (Genome-Wide Annotation of Variants) (Ritchie et al. 2014) and CADD v1.6 (Combined Annotation Dependent Depletion) (Kircher et al. 2014; van der Velde et al. 2015) annotations. These two scores consider information on potential regulation of expression and evolutionary conservation, and each uses different algorithms (and assumptions) to evaluate the pathogenicity or functional importance of variants. The CADD metric measures deleteriousness by contrasting variants that survived natural selection (i.e., became fixed in the human lineage) with simulated mutations (Kircher et al. 2014), in such a way that variants with higher scores are likely to have been selected against (given their annotation pattern). The GWAVA score (Ritchie et al. 2014) uses similar sources of annotation, to discriminate between disease-causing (i.e., pathogenic) and control variants, and to apply this information to weight variants across the genome. We also considered noncoding variants based on the FIRE (Functional Inference of Regulators of Expression) score (Ioannidis et al. 2017), which is predictive of cis expression quantitative trait loci (eQTL), but does not rely on conservation information as GWAVA and CADD do. Thresholds were: GWAVA (unmatched) > 0.5; CADD (phred) > 15; FIRE > 0.6, the mean FIRE score value of autosomal cis-eQTL SNVs (Ioannidis et al. 2017). A full list of all 292 variants within the 7q21.11 linked region is presented (see “Results”), regardless of the availability of these scores.

We further assessed the putative functional impacts of the prioritized variants by checking available databases on regulatory regions of the genome [RegulomeDB (Boyle et al. 2012) and HaploReg (Ward and Kellis 2012)].

Association analysis

We performed a family-based test of allelic association (MQLS) using the MQLS-XM package (Thornton and McPeek 2007; Thornton et al. 2012), which does not assume a model. Rare variants (as defined above) located within genomic regions with multipoint LOD > 1 or NPL > 1 were included. Best-guess imputed genotypes were used for these analyses.

UK Biobank brain phenome scan

To characterize the phenotypic spectrum associated with variation at rs144517871, we conducted a phenome scan on 870 brain MRI phenotypes in the UK Biobank cohort, using PHESANT software (v0.13) (Millard et al. 2017). Analyses were restricted to participants of UK ancestry (UK Biobank specified variable) with brain imaging data. Individuals with high missingness (missing rate > 0.05), heterozygosity (PC corrected heterozygosity > 0.19), gender mismatch, and putative aneuploidies were excluded (Bycroft et al. 2018), as well as one from each pair of related individuals (i.e., with a kinship coefficient > 0.0442 as defined within the UK Biobank relatedness file). Genotype dosage for rs144517871 was converted into best-guess genotypes using PLINK v1.90b3w (Purcell and Chang 2020; Purcell et al. 2007). 33,441 individuals with non-missing genotypes (genotype AA = 32,689, CA = 749, CC = 3) were included in the analysis.

Linear regressions were fitted to test the association between genotypes of rs144517871 (AA = 0, CA = 1,


CC = 2) and continuous brain measures outcomes. Analyses were adjusted for the following covariates: age at imaging (UKB field ID = f.21003_2_0), age², sex (f.31_0_0), genotyping array (f.22000_0_0), 40 genetic principal components (f.22009_0_1 to f.22009_0_40), imaging assessment center (f.54_2_0), total gray matter volume (f.25006_2_0), inverted signal-to-noise ratio in T1 (f.25734_2_0), inverted contrast-to-noise ratio in T1 (f.25735_2_0), scanner lateral (X) brain position (f.25756_2_0), scanner transverse (Y) brain position (f.25757_2_0), scanner longitudinal (Z) brain position (f.25758_2_0), and scanner table position (f.25759_2_0). A conservative Bonferroni threshold was applied accounting for a total of 856 tests performed (p < 5.74 × 10⁻⁵). The UK Biobank data were obtained as part of research application 16066. The data collection for the UK Biobank has been described elsewhere (Sudlow et al. 2015). Informed consent was obtained by the UK Biobank for all participants.

Developmental expression pattern

The spatio-temporal expression patterns of SEMA3C during human development were characterized by querying the BrainSpan database (Miller et al. 2014).

Luciferase reporter assay

An 1814 bp region of the SEMA3C promoter (from −27 bp of the transcription start site to +1244 bp of the first intron) containing the major allele of rs144517871 (A) was amplified from genomic DNA using the following primers: 5′ggtacccacccagcaagttgctcactcc3′ and 5′agatctgagacaggtgtcactgcttc3′. This construct was subcloned upstream of the luciferase reporter gene in the promoterless firefly luciferase vector pGL4.23 (Promega) between KpnI and BglII restriction sites. The rare rs144517871-C variant construct was generated by site-directed mutagenesis using the QuickChange Site-Directed Mutagenesis kit (Stratagene) following the manufacturer’s protocol with the following primers: 5′ttgtgttggtaattttcaaaagcaacagtttcttcctgcttccac3′ and 5′gtggaagcaggaagaaactgttgcttttgaaaattaccaacacaa3′. All constructs were verified by Sanger sequencing.

Luciferase assays were performed as described previously (Estruch et al. 2016). In brief, HEK293T and HeLa cells were seeded in clear-bottomed white 96-well plates and transfected in triplicate with equimolar concentrations of reporter constructs [5 ng of control reporter construct (pGL4.23) or 7.15 ng SEMA3C promoter-reporter construct (common: rs144517871-A or rare: rs144517871-C)], together with 4.96 ng of pRL-TK Renilla luciferase normalization control (Promega). After 48 h, luciferase activity was measured in a TECAN F200PRO microplate reader using the Dual-Luciferase Reporter Assay System (Promega). Relative luciferase expression was normalized to the control and results were presented as relative response ratios compared with the control reporter construct. Experiments were performed in triplicate and were repeated independently.

For each experiment, conditions were statistically compared: first, the minimal-promoter control vs the two allelic conditions, and next the two allelic conditions (common rs144517871-A and rare rs144517871-C) using an independent sample t test and a non-parametric Mann–Whitney–Wilcoxon rank test in R. We cannot verify the assumption of normality for the t test in these small numbers, but note that t tests are in principle robust in small samples if the assumptions are met. The non-parametric test provides an exact p value and is appropriate for relatively small numbers of observations in a given comparison, while the t test can have greater power when its assumptions are met. Results for all performed tests are reported.

Results

We studied a large three-generation family (Fig. 1), in which almost half of the individuals were affected with dyslexia. The inheritance pattern was consistent with autosomal dominant Mendelian transmission, with affected members in every generation, 14 cases of dyslexia among the thirty family members, and three instances of male-to-male transmission. Simulations confirmed that for a family with this structure, the number of informative meioses is more than sufficient to detect genome-wide significant linkage, in the ideal case of a risk allele that perfectly cosegregates with the trait (see “Materials and methods”).

Linked locus in chromosome 7q21

Multipoint linkage analysis using the SNP-chip data that we obtained from the available relatives did not detect a marker or haplotype that perfectly cosegregated with dyslexia, arguing against there being a simple Mendelian explanation for all the cases observed in the family. Indeed, no locus exceeded the genome-wide significance threshold for linkage (LOD of 3.44). The highest observed parametric LOD score was 2.83 on 7q21.11 (approximate 95% CI (Conneally et al. 1985) LODmax − 1: chr7:80197286–83403157, hg19, see Fig. 2), in a region that encompasses several genes (CD36, SEMA3C, LOC100128317, HGF, CACNA2D1, LOC101927356, PCLO). Sensitivity analyses show that this is a robust signal across analysis options (Fig. S2). Analyses of marker haplotypes for 7q21.11 showed that this linked region was shared among 11 of the 13 affected family members, and none of the 13 unaffected/unknown members, for which DNA was available (Fig. 3). All chromosomal

1 3 1190 Human Genetics (2021) 140:1183–1200

(a) 3

2

1 LOD 0

−1

−2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Chromosome (b) 80 mb 82 mb 81 mb 83 mb 2 1

LOD 0 −1 GNAI1 CD36 LOC105369146 PCLO GNAI1 CD36 LOC100128317 PCLO LOC101927269 LOC100128317 SEMA3E GNAT3 HGF LOC101927356 SEMA3E CD36 HGF LOC101927356 CD36 HGF CACNA2D1 CD36 HGF CD36 HGF CD36 CACNA2D1 refGene CD36 CD36 SEMA3C SEMA3C SEMA3C

Fig. 2 a Parametric multipoint linkage analysis across the genome. positions 80–83 Mb on reference genome assembly build GRCh37/ are represented along the X axis in numerical ascend- hg19. The red rectangle highlights regions showing multipoint LOD ing order from left to right, and from their p to q arms. The Y axis scores above 1.8 (LOD­ max − 1). RefGene gene transcripts within the shows the multipoint parametric linkage LOD score. b Genes under region are shown on the lower track the highest linkage peak on chromosome 7q21.11, spanning physical regions showing LOD or NPL scores exceeding 1 are shown (ENST00000360467.5:p.H203L) is within the COL4A2 in Table S1. gene and is predicted to be benign/tolerated across pre- diction tools (Table S3). The number of variant carriers No structural or coding variants cosegregate among the sequenced family members was: 4/5 afected and with dyslexia in the family 0/2 unafected, and in the imputed family members: 8/10 afected and 1/3 unafected, which yields an MQLS associa- Based on analyses of SNP-chip and WGS data, there were tion p value of 0.11 (Table S3). COL4A2 encodes a subunit no high-confdence SV or CNV calls that cosegregated with of type IV collagen, a major structural component of base- dyslexia or fell within regions defned by NPL or LOD > 1 ment membranes. (see Table S2 for a summary of low confdence calls within the suggestively linked regions). Filtering of WGS data on MAF of the variants in pub- Noncoding variants within the 7q21 linked locus lic databases (1000G, ESP, and ExAC), and exclusion of synonymous variants yielded only one coding variant Table S4 shows all variants within the main linked region in (chr13:111082914-A-T rs201716258, Table S3) located chromosome 7q21.11 (292 variants). within the 13q34 NPL > 1 region. This missense variant
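As a rough numerical illustration of the linkage statistics quoted for the 7q21.11 peak, the sketch below converts LOD scores into odds and derives the LODmax − 1 cut-off and the size of the support interval. It relies only on the standard definition of a LOD score as the base-10 logarithm of the likelihood ratio for linkage versus no linkage. The constants are copied from the text; the function and variable names are illustrative assumptions and are not taken from the study's analysis scripts.

```python
# Values reported in the text for the 7q21.11 linkage peak (hg19 coordinates).
LOD_MAX = 2.83
GENOMEWIDE_THRESHOLD = 3.44
INTERVAL_START, INTERVAL_END = 80_197_286, 83_403_157


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score (log10 likelihood ratio) into odds in favor of linkage."""
    return 10.0 ** lod


# LODmax - 1 cut-off that delimits the approximate support interval.
support_cutoff = LOD_MAX - 1.0

# Physical size of the reported interval, in base pairs and megabases.
interval_bp = INTERVAL_END - INTERVAL_START
interval_mb = interval_bp / 1e6

if __name__ == "__main__":
    print(f"Odds at LODmax: {lod_to_odds(LOD_MAX):.0f}:1")
    print(f"Odds at genome-wide threshold: {lod_to_odds(GENOMEWIDE_THRESHOLD):.0f}:1")
    print(f"Support cut-off: LOD > {support_cutoff:.2f}")
    print(f"Interval size: {interval_mb:.2f} Mb")
```

Under these definitions, the observed peak corresponds to odds of roughly 676:1 in favor of linkage, compared with about 2750:1 at the genome-wide threshold, and the cut-off of 1.83 matches the value of 1.8 used for the red rectangle in Fig. 2.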


Fig. 3 Haplotype visualization of the linked genomic region on chromosome 7q21.11. Individuals without a DNA sample are indicated by the # symbol. The putative ‘risk haplotype’ is shown in red. Upper and lower boundaries of the critical recombinations are marked within a box for all members with dyslexia (rs819457–rs6949364). Individuals II.8 and III.4 appear to be phenocopies (i.e., they have a diagnosis for dyslexia, but do not carry the risk haplotype in 7q21). Carrier status for rare minor allele of rs144517871 (confirmed via Sanger sequencing) is indicated above each individual, with −/− denoting wild type; +/− denoting heterozygous carrier
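The cosegregation logic summarized in Fig. 3 can be sketched as a simple comparison of affection status against carrier status. The example below is a toy illustration, not the authors' procedure: only the two phenocopy labels (II.8 and III.4) and the group counts (11 carriers among 13 affected, none among 13 unaffected/unknown) come from the text. All other identifiers are hypothetical placeholders, and relatives of unknown phenotype are treated as not affected for simplicity.

```python
def summarize_cosegregation(individuals):
    """Summarize how carrier status tracks affection status.

    `individuals` maps an ID to a tuple (affected: bool, carrier: bool).
    """
    affected = {i for i, (is_affected, _) in individuals.items() if is_affected}
    carriers = {i for i, (_, is_carrier) in individuals.items() if is_carrier}
    return {
        "affected_total": len(affected),
        "affected_carriers": len(affected & carriers),
        # Affected individuals who do not carry the risk haplotype.
        "phenocopies": sorted(affected - carriers),
        # Carriers who are not affected (would indicate incomplete penetrance).
        "unaffected_carriers": sorted(carriers - affected),
    }


# Hypothetical placeholder IDs; only the counts and the two phenocopies follow the text.
family = {f"affected_{k}": (True, True) for k in range(1, 12)}
family.update({"II.8": (True, False), "III.4": (True, False)})
family.update({f"unaffected_{k}": (False, False) for k in range(1, 14)})

summary = summarize_cosegregation(family)

if __name__ == "__main__":
    print(summary)
```

A haplotype that cosegregates perfectly apart from phenocopies yields an empty `unaffected_carriers` list, which is the pattern described for the 7q21.11 risk haplotype.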

There were 18 noncoding or synonymous variants within putatively linked genomic regions that were likely to be pathogenic based on aggregated annotation scores (CADD or GWAVA) (Table 1). Two of these variants (rs144517871 and rs143835534) fell within the most strongly linked region on chromosome 7q21.11. The two SNVs are 184 bp apart, and in perfect linkage disequilibrium (LD) (r² = 1; 1000G eur RegulomeDB v3, Fig. S3) (Boyle et al. 2012), located within the first intron of the SEMA3C gene (Fig. 4a). The frequencies of both of the observed variants were 0.0031 in the 1000G overall population and slightly higher (0.0099) in the European subsample, as well as in a representative Dutch
population (rs144517871 = 0.010 and rs143835534 = 0.009) (Genome of the Netherlands, http://www.nlgenome.nl/search/) (Francioli et al. 2014). The imputed genotypes predicted that both would cosegregate with dyslexia within the family, except for two putative phenocopies: II.8 and III.4 (i.e., matching the inheritance of the risk haplotype, Fig. 3). We further validated this finding in all the available DNA samples by Sanger sequencing of PCR products spanning rs144517871 (Fig. 4c). Each SNV yielded an MQLS association p value of 0.02. The relationships of rs144517871 with word and non-word reading fluency (Fig. 5) showed that risk-allele carriers performed more poorly overall than non-carriers on both measures, although this is confirmatory and not independent evidence, as these measures were used to define cases and controls in the family (“Materials and methods”).

Table 1 Rare, noncoding variants mapping within genomic regions and predicted by CADD/GWAVA to be pathogenic in family 352. The frequency of carriers is given for the imputed genotypes within each group

Chr  cytoBand  Start     Ref  Alt  Func            Linked region
1    1p31.3    68296119  T    C    intronic        NPL > 1
1    1p31.3    68300052  G    A    ncRNA intronic  NPL > 1
1    1p31.3    68627652  C    G    ncRNA intronic  NPL > 1
1    1p31.3    68627723  T    C    ncRNA intronic  NPL > 1
1    1p31.3    68628015  T    G    ncRNA intronic  NPL > 1
1    1p31.3    68628019  G    A    ncRNA intronic  NPL > 1
1    1p31.3    68628027  G    A    ncRNA intronic  NPL > 1
1    1p31.3    68628031  A    T    ncRNA intronic  NPL > 1
1    1p31.3    68628752  T    A    ncRNA intronic  NPL > 1
1    1p31.3    68628754  T    C    ncRNA intronic  NPL > 1
1    1p31.1    70507620  G    A    intronic        NPL > 1
2    2p25.2    5865630   G    A    intergenic      NPL > 1
2    2p25.2    6485809   T    C    intergenic      NPL > 1
2    2p25.2    6776265   T    C    intergenic      NPL > 1
2    2p25.1    7150786   A    G    intronic        NPL > 1
3    3p25.3    9079126   –    A    intronic        NPL > 1
4    4q22.3    98275548  T    C    intergenic      NPL > 1
7    7q21.11   80547391  A    C    intronic        LOD > 1
7    7q21.11   80547575  T    C    intronic        LOD > 1

Remaining columns, values listed in printed order:
Gene: GNG12, GNG12-AS1 (nine entries), LRRC7, SOX11, LINC01105, SOX11, LOC400940, LINC01246, LINC01246, RNF144A, SRGAP3, PDHA2, STPG2-AS1, SEMA3C, SEMA3C, LINC01247, MIR7515HG
snp138: rs111907541, rs111472994, rs17842969, rs17090698, rs189591558, rs145390307, rs146498852, rs142428415, rs141684587, rs35966925, rs144517871, rs143835534
1000G MAF: 0.006, 0.006, 0.001, 0.005, 0.007, 0.007, 0.007, 0.0099, 0.0099
CADD: 5.78, 6.48, 0.48, 2.97, 3.10, 15.51, 15.67, 15.92, 16.84, 16.78, 15.74, 16.05, 15.63, 16.09, 21.80, 15.32, 15.41, 15.17, 17.96
GWAVA: – (eleven entries), 0.64, 0.84, 0.51, 0.65, 0.63, 0.21, 0.82, 0.74
FIRE: –, 0.41, 0.29, 0.55, 0.56, 0.51, 0.52, 0.55, 0.53, 0.55, 0.56, 0.48, 0.63, 0.36, 0.42, 0.40, 0.08, 0.38, 0.45
MQLS: –, –, 0.07, 0.07, 0.02, 1.00, 0.07, 0.07, 0.07, 0.07, 0.23, 0.23, 0.23, 0.23, 0.23, 0.19, 0.51, 0.02, 0.02
Affected: 0.67, 0.67, 1.00, 1.00, 0.67, 0.67, 0.67, 0.67, 1.00, 1.00, 0.83, 0.70, 0.70, 0.70, 0.70, 0.80, 0.67, 0.90, 0.90
Unaffected: –, 0.00, 0.00, 0.50, 0.00, 0.00, 0.00, 0.00, 1.00, 1.00, 0.50, 0.00, 0.00, 0.00, 0.00, 0.67, 0.00, 0.00, 0.00
Imp Rate: 0.30, 0.30, 0.30, 0.11, 0.30, 0.30, 0.30, 0.30, 0.19, 0.19, 0.30, 0.48, 0.48, 0.48, 0.48, 0.48, 0.41, 0.52, 0.52

Genomic regions of interest are defined in Table S1. Min: minor allele. MAF: minor allele frequency in the 1000G (eur) population. CADD: deleteriousness score. GWAVA: functional impact score. MQLS: association p value. ImpRate: imputation rate across the members of family 352 without sequencing data. Affected, unaffected/unknown: frequency of variant carriers of each type

SEMA3C is most highly expressed during early brain development in cortical regions, with expression levels decreasing after birth (Miller et al. 2014). We investigated potential associations of rs144517871 allelic variation with aspects of brain structure and function in adults from the general population, by performing a phenome-wide scan of 856 magnetic resonance imaging (MRI) phenotypes in the UK Biobank (N = 33,441), as implemented in PHESANT (Millard et al. 2017). No associations were found that were significant after multiple comparison correction; the strongest evidence was found for the superior part of the precentral sulcus in the left hemisphere (“Area of S-precentral-sup-part (left hemisphere)”), with the minor allele of rs144517871 (C) showing association with a decrease in the cortical surface area of this region (beta = −0.018, nominal p = 4.9 × 10⁻⁴).

As a complementary approach to assess putatively functional noncoding variants, we checked variants with a high FIRE score, which predicts cis-eQTL function without making assumptions about pathogenicity. Thirty-four noncoding variants within regions LOD > 1 or NPL > 1 were likely to regulate gene expression, with a FIRE score larger than 0.6 (Table S5). Of these, none fell within the main linked locus (7q21.11), and none was associated with dyslexia status (MQLS p > 0.05, Table S5).

Rs144517871 modulates gene expression in a reporter assay using human cell lines

To determine if allelic variation of rs144517871 has potential to affect the transcriptional regulatory activity of SEMA3C, we performed a luciferase reporter assay in different cell lines. HeLa or HEK cells were transfected with a reporter construct in which expression of the luciferase gene was driven by a region of the SEMA3C promoter containing either the common (A) or rare (C) allele of rs144517871, or a control reporter vector in which the luciferase gene was placed upstream of a minimal promoter.

The promoterless condition showed lower relative luciferase activity compared to the two allele conditions (A and C), indicating that the SEMA3C promoter region is sufficient to increase luciferase gene expression (Tables S6 and S7; HeLa t test p values < 0.001 and Mann–Whitney–Wilcoxon U test p values = 0.02; HEK t test p values = 0.0003 and 0.07 and Mann–Whitney–Wilcoxon U test p values = 0.02 and 0.17). We observed an increase of activity relative to the common variant when the rare rs144517871-C allele was present in HeLa (Fig. 6, Table S6 and S7) and HEK cell lines (Fig. S4, Table S6 and S7) (all t test p values < 0.05; all Mann–Whitney–Wilcoxon U test p values = 0.1).

Discussion

In the present study, we adopted a strategy of combining linkage analysis with WGS to identify genetic variants that may contribute to dyslexia in a large three-generation family in which around half of the relatives are affected.

Parametric multipoint linkage analysis under a dominant mode of inheritance in family 352 identified the strongest evidence for a region on chromosome 7q21.11 (LOD = 2.83). With the exception of two affected members that were putative phenocopies, the identified risk haplotype cosegregated perfectly with dyslexia status in the family: it was found in a heterozygous state in 11 members with dyslexia while being absent from 11 members without dyslexia and two of unknown phenotype. The linked region encompassed several genes (CD36, SEMA3C, LOC100128317, HGF, CACNA2D1, LOC101927356, PCLO). Within this region, there were two rare (MAF < 0.01), noncoding SNVs (rs144517871 and rs143835534) predicted to have functional effects, located in the first intron of the gene SEMA3C, and in high LD with each other. These variants cosegregated perfectly with the risk haplotype. In silico characterization showed that rs144517871 affects an evolutionarily constrained regulatory region in the first intron of SEMA3C, located between exons that are part of the 5′ untranslated region of the transcript. This SNP can act also as a cis-eQTL for SEMA3C (GTEx Consortium 2015). To functionally test whether such an effect on gene regulation exists, we performed luciferase reporter gene assays; in this way, we showed that this region increases gene expression and that its regulatory activity is modulated by the allelic status of rs144517871, with the rare variant resulting in significant increases in expression levels. These putative effects on expression are subtle and should be validated further
using independent in vitro and in vivo systems. Nevertheless, since dyslexia is a complex and relatively subtle phenotype, genetic contributions to it need not have severe consequences at the molecular/cellular level. A regulatory mechanism that affects the abundance of mRNA/protein, rather than disrupting the function of the encoded protein itself, is biologically plausible. These results are in line with the previous studies in which DNA variations potentially associated with dyslexia were shown to affect the expression of proximal genes (Hannula-Jouppi et al. 2005; Dennis et al. 2009). We performed a phenome scan of MRI-based brain measures using available UK Biobank data from healthy adults, but did not observe significant associations that survived correction for the number of tested phenotypes.

The rs144517871 variant is located within a predicted regulatory region according to the Ensembl Regulatory Build (Zerbino et al. 2015) (Fig. 4b), and has high scores for potential functional impact, CADD = 17.96 and GWAVA = 0.82. CADD and GWAVA aggregate annotations on potential regulation of expression and evolutionary conservation to evaluate the pathogenicity or functional importance of variants (with CADD > 15 considered to be deleterious, and GWAVA > 0.5 pathogenic). The FIRE score for rs144517871 was 0.45. Higher FIRE scores (range 0–1) indicate that SNVs are more likely to alter the expression levels of nearby genes (Ioannidis et al. 2017). According to Haploreg (Ritchie et al. 2014), rs144517871 has promoter and enhancer histone marks in multiple tissues including brain (17 and 4 tissues respectively; see Fig. S5). Importantly, the allelic variation at this site is predicted to alter several regulatory motifs [Haploreg: Pou2f2, Pou5f1, ZEB1, Table S8 (Ward and Kellis 2012); Regulome: Pou5f1 (Boyle et al. 2012)] and the site is well conserved across mammals, with a GERP conservation score of 4.73. The other variant in perfect LD within the SEMA3C intron, rs143835534, has lower functional prediction scores: CADD = 6.57 (i.e., not deleterious) and GWAVA = 0.74 (i.e., predicted pathogenic). This variant shares some of the histone marks with rs144517871, and is predicted to alter some regulatory motifs, but does not show evolutionary conservation (GERP = −1.28) (Fig. S3 and S5). We checked whether either of the two variants (or any other variant in high linkage disequilibrium, Fig. S5) was predicted to act as an expression quantitative trait locus (eQTL) in public databases [Braineac and Genotype-Tissue Expression, GTEx, V7 (Ramasamy et al. 2014; GTEx Consortium 2015)]. Both variants predicted the expression of SEMA3C in blood (uncorrected p value 0.0021).

Fig. 4 a Chromosome 7 genome axis and SEMA3C gene transcripts. The red bar indicates the genomic region where rs144517871 and rs143835534 are located. b Detailed annotation of the genomic region around rs144517871 (black square) using the UCSC Genome Browser (hg19). Tracks are included for ENCODE digital DNaseI hypersensitivity clusters, ENCODE/Broad chromatin state segmentation by Hidden Markov Model (HMM) in several cell lines, as well as 100 Vertebrate conservation scores (PhyloP, PhastCons, Conserved elements) and sequence alignment (Multiz Alignments of 100 Vertebrates). c Sanger sequencing chromatograms of rs144517871 for representative wild-type (A/A; individuals III.3, II.10) and risk-allele carrier (A/C; individuals III.2, II.1, I.1) family members

The interpretation of functional annotation of variants outside protein-coding regions is complex. In this study, we used CADD and GWAVA summary scores (Kircher et al. 2014; Ritchie et al. 2014) that integrate information from different sources at the DNA and protein levels and genomic properties to rank variants according to expected deleteriousness (CADD) or pathogenicity (GWAVA). We took advantage of these metrics to rank the variants, as a proxy for affecting gene function, but it should be noted that since dyslexia occurs in otherwise healthy individuals, and is unlikely to reduce the chance of having children, we cannot assume that causal variants have been subjected to negative selection. Nevertheless, a variant that increases the risk of a trait like dyslexia in the dominant form (i.e., when heterozygous), could lead to a more severe disorder in recessive form (i.e., when in a homozygous state or as a compound heterozygote with another deleterious variant of the same gene). Variants may also have pleiotropic effects, impacting multiple traits with differential severity. Both tools predicted that rs144517871 has functional effects, which we were able to validate experimentally by performing luciferase reporter assays in human cell lines.

SEMA3C encodes a class III semaphorin. These secreted proteins bind to plexin and play an important role in the regulation of developmental processes, including providing guidance cues to migrating cortical neurons (Chen et al. 2008; Van Battum et al. 2015). In the embryonic mouse brain, Sema3c is transcriptionally repressed by Bcl11a in radially migrating neurons (Wiegreffe et al. 2015). Wiegreffe et al (2015) found that homozygous mutant Bcl11a mice presented defects in neuronal morphology and neuronal migration, and that the mutant phenotype was rescued when knocking down Sema3c. Of note, a de novo microdeletion in the dyslexia susceptibility locus 3 (DYX3) on chromosome 2p spanning only the BCL11A gene was reported in a proband with severe speech sound disorder (Peter et al. 2014). Furthermore, heterozygous de novo mutations in BCL11A are associated with an intellectual disability syndrome involving delayed speech and language, and haploinsufficiency for Bcl11a in mice also resulted in postnatal upregulation of class III semaphorins in the cortex (Sema3d) and hippocampus (Sema3e) (Dias et al. 2016). Due to its roles in brain development, SEMA3C represents a convincing candidate gene for susceptibility to dyslexia, a cognitive
trait which has been associated with changes in cerebral cortical architecture (Giraud and Ramus 2013).

A previous study identified linkage with dyslexia spanning the same location on 7q21.11 (multipoint LOD = 3.08, microsatellite marker at linkage peak = D7S660) in families diagnosed with Rolandic epilepsy (Strug et al. 2012). SEMA3C was the closest gene to the peak of linkage, and the authors screened the protein-coding and promoter regions of the SEMA3C gene for mutations in one of the families that contributed most to their linkage LOD score in this study. However, they were unable to identify mutations that cosegregated with dyslexia, and suggested that either intronic regions or other genes could be responsible for the signal. In light of our data from the present study, it could be worth further investigating the families studied by Strug et al. (2012) with a focus on potential regulatory variation affecting SEMA3C, to assess whether such variation is relevant to dyslexia in the context of rolandic epilepsy. We could not address this issue in the present study, as family 352 was recruited specifically through a diagnosis of dyslexia.

Fig. 5 Reading scores for carriers (black, n = 11) and non-carriers (gray, n = 13) of the validated rs144517871 risk allele in family 352. a Percentile scores on word and non-word reading. b Relationship of the percentiles of the word reading and non-word reading tests. Triangles indicate individuals with a dyslexia diagnosis

Fig. 6 Luciferase reporter assays in HeLa cells. Luciferase reporter vectors containing a minimal promoter (control) or the SEMA3C promoter with either the common allele (rs144517871-A) or the rare allele (rs144517871-C) were transfected into HeLa cells. Luciferase expression was measured, and data are shown as relative ratio to the control construct. The two panels (1 and 2) indicate independent experimental replicates
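The Mann–Whitney–Wilcoxon p values of 0.1 reported for the allele comparisons can be put in context with the arithmetic of exact rank tests: with few observations per group, only a coarse set of p values is attainable. The sketch below computes an exact two-sided p value for tie-free data by enumerating all possible rank assignments, together with the smallest p value attainable for given group sizes. It assumes three measurements per condition (one triplicate), which is an assumption about the comparison rather than a detail stated for each test, and the luminescence ratios are invented for illustration. The function names are hypothetical, and this is not the R code used in the study.

```python
from itertools import combinations
from math import comb


def exact_mwu_two_sided(x, y):
    """Exact two-sided Mann-Whitney p value for two samples without ties."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # valid only without ties
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(rank[v] for v in x) - offset
    centre = n1 * n2 / 2  # expected value of U under the null hypothesis
    # Count rank assignments at least as far from the centre as the observed one.
    extreme = 0
    for ranks_x in combinations(range(1, n1 + n2 + 1), n1):
        u = sum(ranks_x) - offset
        if abs(u - centre) >= abs(u_obs - centre):
            extreme += 1
    return extreme / comb(n1 + n2, n1)


def smallest_two_sided_p(n1, n2):
    """Smallest attainable two-sided exact p value (complete separation of groups)."""
    return 2 / comb(n1 + n2, n1)


# Invented relative luciferase ratios for one triplicate per allele.
common_allele = [1.9, 2.1, 2.0]
rare_allele = [2.6, 2.4, 2.8]

p_value = exact_mwu_two_sided(common_allele, rare_allele)

if __name__ == "__main__":
    print(p_value, smallest_two_sided_p(3, 3))
```

With three observations per group there are only 20 possible rank assignments, so even complete separation of the two alleles cannot give a two-sided exact p value below 0.1. This is consistent with reporting both the rank test and the t test for each comparison.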

We further examined if any rare coding variants could be contributing to dyslexia in this family. A missense SNV in COL4A2, predicted to be tolerated, was the only coding non-synonymous mutation within regions showing an NPL > 1. This gene has been associated with “Brain small vessel disease 2” (OMIM 614483), an autosomal dominant cerebrovascular disorder characterized by variable neurologic impairment resulting from the disturbed vascular supply that leads to cerebral degeneration (Meuwissen et al. 2015). Although a common SNP in COL4A2 (rs9521789) was associated with comorbid reading disability and language impairment in an early genome-wide association study (Eicher et al. 2013), the SNP showed no association with reading-related quantitative traits in two other samples (Eicher et al. 2013; Carrion-Castillo et al. 2016).

Of note, the variants identified in the present study are not novel mutations, since they are present in the general population at low frequencies. Hence, these variants could be considered risk factors that are particularly penetrant in this family, possibly acting in coordination with other genetic and/or environmental effects. Further exploration of “gene × gene” and “gene × environment” interactions with respect to these variants in large population cohorts in future work may help elucidate such mechanistic aspects.

In summary, a combination of linkage analysis and WGS was used to search for rare DNA variants segregating with dyslexia in an extended Dutch family. This strategy has proven to be effective for finding causative variants involved in several monogenic and more complex traits (Rosenthal et al. 2013; Norton et al. 2013; Corominas et al. 2018). We identified a region on chromosome 7q21.11 linked to dyslexia, which was concordant with linkage findings with dyslexia reported in a previous study of rolandic epilepsy families (Strug et al. 2012). There were no rare or novel exonic variants within this region. Instead, we found that two relatively rare (MAF ~ 0.01) noncoding variants within the first intron of the SEMA3C gene, predicted to have a functional impact, were present in all but two family members with dyslexia, while being absent from all unaffected members. Histone marking, eQTL data, and experiments with luciferase reporter assays, provided evidence that these intronic variants could have a cis-regulatory effect on the expression levels of the gene. We thus propose SEMA3C as a candidate gene for dyslexia susceptibility.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00439-021-02289-w.

Acknowledgements Thanks to all of the study participants. Some of the data in this study were obtained from the UK Biobank cohort, as part of research application 16066, with Clyde Francks as the principal applicant. We are grateful to UK Biobank for making data available.

Author contributions AC-C: conceptualization, methodology, analysis, visualization, writing (original draft), and writing (review and editing). SBE: methodology, analysis, and writing (review and editing). BM: conceptualization, supervision, funding acquisition, and writing (review and editing). BF: conceptualization, supervision, funding acquisition, and writing (review and editing). CF: conceptualization, supervision, writing (original draft), and writing (review and editing). SEF: conceptualization, funding acquisition, supervision, writing (original draft), and writing (review and editing).

Funding Open Access funding enabled and organized by Projekt DEAL. AC-C, SEB, CF, and SEF are supported by the Max Planck Society (Germany). This work was also funded by grant 200-62-305 from the Dutch Organization for Scientific Research (NWO), division Geesteswetenschappen. BF is supported by a personal Vici grant from NWO (016-130-669).

Availability of data and materials Anonymized primary data are deposited at The Language Archive (TLA: https://archive.mpi.nl/), a public data archive hosted by the Max Planck Institute for Psycholinguistics. Data are stored at the TLA and accessible with persistent identifiers: https://hdl.handle.net/1839/2f897bc7-bc62-4b27-9fe8-8d02ec6c822c and https://hdl.handle.net/1839/c908dc8a-ec12-4767-b76d-1933c17be01c. Access can be granted upon request.

Code availability Custom codes for the analyses in this study are deposited at The Language Archive (TLA: https://archive.mpi.nl/), a public data archive hosted by the Max Planck Institute for Psycholinguistics. Scripts and output files are stored at the TLA and accessible with persistent identifiers: https://archive.mpi.nl/mpi/islandora/object/mpi%3A1839_fe2ba831_974d_4ec0_92f1_41b99c82591a (whole-genome sequencing: pre-processing and calling) and https://hdl.handle.net/1839/7ec56abe-a2d9-435d-bd6c-4f1960867b4e (linkage and NGS processed data). Access can be granted upon request.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Ethics approval Informed consent was obtained from all participants, and the study was approved by the ethics committee (CWOM) of the University Medical Centre Nijmegen under CWOM-nr 9811-025.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Abyzov A, Urban AE, Snyder M, Gerstein M (2011) CNVnator: an approach to discover, genotype, and characterize typical and
atypical CNVs from family and population genome sequencing. Genome Res 21(6):974–984
Bahlo M, Bromhead CJ (2009) Generating linkage mapping files from Affymetrix SNP chip data. Bioinformatics 25(15):1961–1962
Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J (2011) Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet 12(11):745–755
Basu S, Di Y, Thompson EA (2008) Exact trait-model-free tests for linkage detection in pedigrees. Ann Hum Genet 72:676–682
Bishop DV (2015) The interface between genetics and psychology: lessons from developmental dyslexia. Proc Biol Sci 282(1806):20143139
Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M et al (2012) Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 22(9):1790–1797
Brus BT, Voeten MJ (1972) Een-minuut-test [one-minute-test]. Swets & Zeitlinger, Lisse
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K et al (2018) The UK Biobank resource with deep phenotyping and genomic data. Nature 562(7726):203–209
Carrion-Castillo A, Franke B, Fisher SE (2013) Molecular genetics of dyslexia: an overview. Dyslexia 19(4):214–240
Carrion-Castillo A, van Bergen E, Vino A, van Zuijen T, de Jong PF, Francks C, Fisher SE (2016) Evaluation of results from genome-wide studies of language and reading in a novel independent dataset. Genes Brain Behav 15(6):531–541
Chen G, Sima J, Jin M, Wang KY, Xue XJ, Zheng W et al (2008) Semaphorin-3A guides radial migration of cortical neurons during development. Nat Neurosci 11(1):36–44
Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS et al (2009) BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 6(9):677–681
Cheung CY, Thompson EA, Wijsman EM (2013) GIGI: an approach to effective imputation of dense genotypes on large pedigrees. Am J Hum Genet 92(4):504–516
Conneally PM, Edwards JH, Kidd KK, Lalouel JM, Morton NE, Ott J, White R (1985) Report of the committee on methods of linkage analysis and reporting. Cytogenet Cell Genet 40:356–359
Corominas J, Klein M, Zayats T, Rivero O, Ziegler GC, Pauper M et al (2018) Identification of ADHD risk genes in extended pedigrees by combining linkage analysis and whole-exome sequencing. Mol Psychiatry 25(9):2047–2057
de Kovel CG, Hol FA, Heister JG, Willemen JJ, Sandkuijl LA, Franke B, Padberg GW (2004) Genomewide scan identifies susceptibility locus for dyslexia on Xq27 in an extended Dutch family. J Med Genet 41(9):652–657
den Hoed J, Fisher SE (2020) Genetic pathways involved in human speech disorders. Curr Opin Genet Dev 65:103–111
Dennis MY, Paracchini S, Scerri TS, Prokunina-Olsson L, Knight JC, Wade-Martins R et al (2009) A common variant associated with dyslexia reduces expression of the KIAA0319 gene. PLoS Genet 5(3):e1000436
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C et al (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43(5):491–498
Dias C, Estruch SB, Graham SA, McRae J, Sawiak SJ, Hurst JA et al (2016) BCL11A haploinsufficiency causes an intellectual disability syndrome and dysregulates transcription. Am J Hum Genet 99(2):253–274
Eicher JD, Powers NR, Miller LL, Akshoomoff N, Amaral DG, Bloss CS et al (2013) Genome-wide association study of shared components of reading disability and language impairment. Genes Brain Behav 12(8):792–801
Einarsdottir E, Svensson I, Darki F, Peyrard-Janvid M, Lindvall JM, Ameur A et al (2015) Mutation in CEP63 co-segregating with developmental dyslexia in a Swedish family. Hum Genet 134(11–12):1239–1248
Einarsdottir E, Peyrard-Janvid M, Darki F, Tuulari JJ, Merisaari H, Karlsson L et al (2017) Identification of NCAN as a candidate gene for developmental dyslexia. Sci Rep 7(1):9294
Eising E, Carrion-Castillo A, Vino A, Strand EA, Jakielski KJ, Scerri TS et al (2019) A set of regulatory genes co-expressed in embryonic human brain is implicated in disrupted speech development. Mol Psychiatry 24:1065–1078
Estruch SB, Graham SA, Deriziotis P, Fisher SE (2016) The language-related transcription factor FOXP2 is post-translationally modified with small ubiquitin-like modifiers. Sci Rep 6:20911
Fagerheim T, Raeymaekers P, Tønnessen FE, Pedersen M, Tranebjaerg L, Lubs HA (1999) A new gene (DYX3) for dyslexia is located on chromosome 2. J Med Genet 36(9):664–669
Field LL, Shumansky K, Ryan J, Truong D, Swiergala E, Kaplan BJ (2013) Dense-map genome scan for dyslexia supports loci at 4q13, 16p12, 17q22; suggests novel locus at 7q36. Genes Brain Behav 12(1):56–69
Fisher SE, DeFries JC (2002) Developmental dyslexia: genetic dissection of a complex cognitive trait. Nat Rev Neurosci 3(10):767–780
Francioli LC, Menelaou A, Pulit SL, van Dijk F, Palamara PF, Elbers CC et al (2014) Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 46(8):818–825
George AW, Wijsman EM, Thompson EA (2005) MCMC multilocus lod scores: application of a new approach. Hum Hered 59:98–108
Gialluisi A, Newbury DF, Wilcutt EG, Olson RK, DeFries JC, Brandler WM et al (2014) Genome-wide screening for DNA variants associated with reading and language traits. Genes Brain Behav 13(7):686–701
Gialluisi A, Andlauer TF, Mirza-Schreiber N, Moll K, Becker J, Hoffmann P et al (2019) Genome-wide association scan identifies new variants associated with a cognitive predictor of dyslexia. Transl Psychiatry 9(1):77
Gialluisi A, Andlauer TF, Mirza-Schreiber N, Moll K, Becker J, Hoffmann P et al (2020) Genome-wide association study reveals new insights into the heritability and genetic correlates of developmental dyslexia. Mol Psychiatry. https://doi.org/10.1038/s41380-020-00898-x
Gilissen C, Hehir-Kwa JY, Thung DT, van de Vorst M, van Bon BW, Willemsen MH et al (2014) Genome sequencing identifies major causes of severe intellectual disability. Nature 511(7509):344–347
Giraud AL, Ramus F (2013) Neurogenetics and auditory processing in developmental dyslexia. Curr Opin Neurobiol 23(1):37–42
GTEx Consortium (2015) Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348(6235):648–660
Hannula-Jouppi K, Kaminen-Ahola N, Taipale M, Eklund R, Nopola-Hemmi J, Kaariainen H, Kere J (2005) The axon guidance receptor gene ROBO1 is a candidate gene for developmental dyslexia. PLoS Genet 1(4):e50
Hildebrand MS, Jackson VE, Scerri TS, Van Reyk O, Coleman M, Braden RO et al (2020) Severe childhood speech disorder: gene discovery highlights transcriptional dysregulation. Neurology 94:e2148–e2167
Illumina (2016a) Illumina Human OmniExpress-Exome. Illumina Human OmniExpress-Exome. http://www.illumina.com/products/infinium_humanomniexpressexome_beadchip_kits.html. Accessed 18 July 2014
Illumina (2016b) Illumina X-ten. Illumina X-ten. http://www.illumina.com/systems/hiseq-x-sequencing-system.html. Accessed 6 Feb 2015
Ioannidis NM, Davis JR, DeGorter MK, Larson NB, McDonnell SK, French AJ et al (2017) FIRE: functional inference of




