DISSECTING THE GENETICS OF COMMUNICATION:

INSIGHTS INTO SPEECH, LANGUAGE, AND READING

by

HEATHER ASHLEY VOSS-HOYNES

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Department of Epidemiology and Biostatistics

CASE WESTERN RESERVE UNIVERSITY

January 2017

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

Heather Ashley Voss-Hoynes

Candidate for the degree of Doctor of Philosophy*.

Committee Chair

Sudha K. Iyengar

Committee Member

William Bush

Committee Member

Barbara Lewis

Committee Member

Catherine Stein

Date of Defense

July 13, 2016

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables
List of Figures
Acknowledgements
List of Abbreviations
Abstract
CHAPTER 1: Introduction and Specific Aims
CHAPTER 2: Review of speech sound disorders: epidemiology, quantitative components, and genetics
    1. Basic Epidemiology
    2. Endophenotypes of Speech Sound Disorders
    3. Evidence for Genetic Basis of Speech Sound Disorders
    4. Genetic Studies of Speech Sound Disorders
    5. Limitations of Previous Studies
CHAPTER 3: Methods
    1. Phenotype Data
    2. Tests for Quantitative Traits
    4. Analytical Methods
CHAPTER 4: Aim I- Genome-Wide Association Study
    1. Introduction
    2. Methods
    3. Sample
    5. Statistical Procedures
    6. Results
    8. Discussion
CHAPTER 5: Accounting for comorbid conditions
    1. Introduction
    2. Methods
    3. Results
    4. Discussion
CHAPTER 6: Hypothesis driven pathway analysis
    1. Introduction
    2. Methods
    3. Results
    4. Discussion
CHAPTER 7: Exploratory pathway analysis
    1. Introduction
    2. Methods
    3. Results
    4. Discussion
    5. Future Directions
CHAPTER 8: General Conclusions and Future Directions
Appendix A- Additional Materials for Chapter 3
    1. Sample Ancestry
    2. Power Calculations
Appendix B- Additional Materials for Chapter 4
    1. Model Selection
    2. Full GWAS Results
Appendix C- Additional Materials for Chapter 5
Appendix D- Additional Materials for Chapter 6
Appendix E- Additional Materials for Chapter 7
Bibliography

List of Tables

Table 2.1 Phonological processes and age at which they decline
Table 2.3 Loci from linkage studies
Table 2.4 Genes associated with SSD
Table 2.5 Copy number variation associated with SSD
Table 2.6 Genes associated with comorbid conditions
Table 2.7 Loci from linkage studies associated with comorbid conditions
Table 3.1 Tests used in current study and the phenotype interrogated
Table 3.3 Basic demographics of all individuals in the cohort as of February 2016
Table 3.4 Transformations of z-scores
Table 3.5 Genotyping data for the current study
Table 3.6 Chip characteristics summarized from Illumina documentation
Table 3.7 SNP quality control summary
Table 3.8 Individual quality control summary
Table 3.9 Significance threshold for HapMap
Table 4.1 Test used in the analyses divided by endophenotype
Table 4.2 Summary statistics for quantitative traits used in the analysis
Table 4.3 Correlation (R²) between the quantitative traits analyzed
Table 4.4 Most significant marker for genes previously associated with SSD or childhood apraxia of speech
Table 5.1 Mean/median z-scores stratified by Language Impairment affection status (Model 1)
Table 5.2 Mean/median score stratified by Reading Disability affection status (Model 2)
Table 5.3 Mean/median scores stratified by all groups except SSD status
Table 6.1 Pathways of interest based on Aim I GWAS results
Table 6.2 Genes included in the FOXP2 and CNTNAP2 sets
Table 6.3 Significance of Aim I based pathways
Table 6.4 p-values for FOXP2 and CNTNAP2 networks
Table 6.5 p-values for Comorbid Condition Gene Sets
Table 7.1 Number of significant pathways for each trait
Table 7.2 Pathways shared by four or more traits
Table 7.3 Pathways significant in GFTA and MSW or NSW
Table A1 Ancestry of the individuals who passed quality control
Table B1 Lambda values for four models
Table B2 Sample sizes with and without parents
Table C1 Top 20 loci for binary outcome after adjusting for LI and RD
Table C2 Top loci for Fletcher Time by Count after adjusting for LI and RD
Table C3 Top 10 loci for Goldman-Fristoe Test of Articulation after adjusting for LI and RD
Table C4 Top 20 loci for Expressive One Word Picture Vocabulary Test after adjusting for LI and RD
Table C5 Top 20 loci for Peabody Picture Vocabulary Test after adjusting for LI and RD
Table C6 Suggestive loci for Wechsler Individual Achievement Test- Listening Comprehension after adjusting for LI and RD
Table C7 Top 20 loci for multisyllabic word repetition after adjusting for LI and RD
Table C8 Top 20 loci for nonsense word repetition after adjusting for LI and RD
Table C9 Suggestive loci for TWS after adjusting for LI and RD
Table C10 Top 20 loci for Word Attack after adjusting for LI and RD
Table C11 Suggestive markers for Word Identification after adjusting for LI and RD
Table C12 Most significant SNP in genes previously associated with SSD
Table D1 Pathway Analysis- User defined pathways
Table E1 Significant pathways for articulation and motor control
Table E2 Significant pathways for language traits
Table E3 Significant pathways for phonology traits
Table E5 Significant pathways for spelling
Table E6 Pathways significant in 3 traits

List of Figures

Figure 2.1 Consonants and age of acquisition
Figure 3.1 Overall study design and workflow
Figure 4.2 Manhattan plot- Fletcher Time by Count
Figure 4.3 Manhattan plot- GFTA
Figure 4.4 Manhattan plot- EOWPVT
Figure 4.5 Manhattan plot- PPVT
Figure 4.6 Manhattan plot- WIATLC
Figure 4.7 Manhattan plot- Shared between EOWPVT and PPVT
Figure 4.8 Manhattan plot- MSW
Figure 4.9 Manhattan plot- NSW
Figure 4.10 Manhattan plot- Shared between MSW and NSW
Figure 4.11 Manhattan plot- WRDATK
Figure 4.12 Manhattan plot- WRDID
Figure 4.13 Manhattan plot- Shared between WRDATK and WRDID
Figure 4.14 Manhattan plot- TWS
Figure 5.1 Conceptual model for the relationship between SNP effect, SSD quantitative trait, language impairment, and reading disability
Figure 5.2 Basic workflow for Aim II
Figure 5.3 Proportion of markers with p < 1×10⁻⁵ in Aim I
Figure 5.4 Effects of adjusting for LI and RD- Fletcher Time by Count
Figure 5.5 Effects of adjusting for LI and RD- Goldman-Fristoe Test of Articulation
Figure 5.6 Effects of adjusting for LI and RD- Expressive One Word Picture Vocabulary Test
Figure 5.7 Effects of adjusting for LI and RD- Peabody Picture Vocabulary Test
Figure 5.8 Effects of adjusting for LI and RD- Wechsler Individual Achievement Test- Listening Comprehension subtest
Figure 5.9 Effects of adjusting for LI and RD- Multisyllabic Word Repetition
Figure 5.10 Effects of adjusting for LI and RD- Nonsense Word Repetition
Figure 5.11 Effects of adjusting for LI and RD- Word Attack
Figure 5.12 Effects of adjusting for LI and RD- Word Identification
Figure 5.13 Effects of adjusting for LI and RD- Test of Written Spelling
Figure 6.1 Workflow for pathway analysis of genome-wide association results
Figure 7.1 Section of the KEGG Calcium signaling pathway
Figure 7.3 Classification of pathways significant in two or more traits
Figure 7.4 Interactions between significant pathways for language traits
Figure 7.5 Pathways significant in both MSW and NSW
Figure 7.5 Shared pathways for reading traits
Figure 7.6 Interactions identified between significant spelling pathways
Figure A1 z-scores for Fletcher Time by Count and Goldman-Fristoe Test of Articulation
Figure A2 z-scores for PPVT and WIATLC
Figure A3 z-scores for MSW and NSW
Figure A4 z-scores for Word Attack and Word Identification
Figure A5 z-scores for Test of Written Spelling
Figure A6 Principal component plots
Figure A7 Power at various minor allele frequencies and effect estimates
Figure A8 Effects of altering various parameters on power for binary outcome
Figure B1 QQ plots for Articulation and Oral Motor Control
Figure B2 QQ plots for language endophenotypes
Figure B3 QQ plots for reading endophenotypes
Figure B4 QQ plots for spelling
Figure B5 Histograms of articulation and language traits
Figure B6 Histograms of phonology, reading, and spelling traits
Figure C1 Manhattan plots for adjusted BT Speech
Figure C2 Manhattan plots for adjusted Fletcher Time by Count
Figure C3 Manhattan plots for adjusted GFTA
Figure C4 Manhattan plots for adjusted EOWPVT
Figure C5 Manhattan plots for adjusted PPVT
Figure C6 Manhattan plots for adjusted WIATLC
Figure C7 Manhattan plots for adjusted MSW
Figure C8 Manhattan plots for adjusted NSW
Figure C9 Manhattan plots for adjusted WRDATK
Figure C10 Manhattan plots for adjusted WRDID
Figure C11 Manhattan plots for adjusted TWS

Acknowledgements

I am grateful to countless individuals for helping me through this process. Thank you to my advisor, Dr. Sudha Iyengar, for involving me in the project that would become this dissertation in my second semester of school. Also, thank you for buying into my ambition and not shooting down my timeline. To my committee members, Drs. Will Bush, Barbara Lewis, and Catherine Stein, thank you for sacrificing your own time to help me by providing feedback, valuable insights, and helpful suggestions. You have all helped me grow as a thinker and questioner; for that, I am eternally grateful.

Thank you to Dr. Ralph O’Brien for encouraging me to pursue this degree and to Dr. Mark Willis for his constant support, reminding me that I should never stop dancing, and that nothing is set in stone. Thank you to Dr. Rob Igo for his help with the QC process and answering my incessant questions. To Jeremy Fondran and Barb Truitt, thank you for introducing me to the data and being sources of information and ideas over the past three years. Thank you to the families who participated in the study and to Lisa Freebairn, Jessica Tag, and others who painstakingly tested and scored each participant.

To all the administrators in the department, we would be lost without you. To Alberto Santana, thank you for maintaining Latitude. Cynthia Moore, thank you for always being available for a chat.

To my peers, especially Jessica, Noémi, and Yana, I will never forget the laughs (and tears) we shared over the past few years. I am delighted that no one was seriously injured or ended up in a security alert, and I wish you all the best of luck in meeting your adulting goals.

To the Cavs, thank you for making June far less miserable than it could have been. #allin216

I also extend a most heartfelt thank-you to my fellow teachers and students at the Murphy Irish Arts Center. My students’ joy and humor provided me invaluable perspective and helped me through some of the most challenging times.

Finally and above all, my most profound gratitude goes to my family. Words fail to express how fortunate I am to have them; without their unwavering love and support, I am certain completing this degree would have been impossible. And Kurt, I will always be a crusader for the humanities.

List of Abbreviations

EOWPVT    Expressive One Word Picture Vocabulary Test
GFTA      Goldman-Fristoe Test of Articulation
LI        Language Impairment
MSW       Multisyllabic Word Repetition
NSW       Nonsense Word Repetition
PPVT      Peabody Picture Vocabulary Test
RD        Reading Disability
SSD       Speech Sound Disorders
TWS       Test of Written Spelling
WIATLC    Wechsler Individual Achievement Test- Listening Comprehension
WRDATK    Woodcock Reading Mastery- Word Attack
WRDID     Woodcock Reading Mastery- Word Identification

Dissecting the Genetics of Human Communication: Insights into Speech, Language, and Reading

Abstract

By

HEATHER A. VOSS-HOYNES

Interpersonal communication is a vital component of everyday life which can be negatively affected by speech sound disorders (SSD). SSD affect articulation and phonological processes, are the most common type of communication disorder, and occur in 16% of three-year-olds. Despite the frequency with which they occur, SSD are relatively understudied compared to other communication disorders such as dyslexia and specific language impairment. SSD can occur due to craniofacial abnormalities, hearing loss, as a symptom of certain syndromes, or due to unknown causes. SSD of unknown cause are heritable, with monozygotic twin concordance rates of 0.95, but the genetic basis is not well defined. Many previous studies have focused upon FOXP2, a gene harboring a causal mutation in one large family, or upon genes and loci associated with language impairment (LI) or dyslexia (RD), which are frequently comorbid conditions. The weakness of these approaches is that they are self-limiting and cannot identify novel loci.

Consequently, it would be beneficial to address the etiology of SSD agnostically to identify novel loci and characterize the genetic architecture of what is likely a multifactorial disorder. To do so, data from the Cleveland Family Speech and Reading Study, a longitudinal study of children with SSD, were used to perform the first known genome-wide association study on traits associated with SSD endophenotypes in a sample ascertained based on speech sound disorder diagnosis. This analysis identified novel loci, replicated previous findings, and informed hypotheses regarding biological pathways that may be involved in SSD. To investigate the impact of LI and RD on genetic association with SSD endophenotypes, the changes in genetic effect estimates after adjusting for the conditions were analyzed. Some effects were unchanged by LI and RD status, suggesting a foundational role of these loci in human communication. Finally, a pathway analysis revealed similarities between SSD and other neuropathologies such as autism spectrum disorders and Alzheimer’s disease. This study represents a thorough examination of the genetic underpinnings of SSD and other communication traits, is the first genome-wide association study for SSD, and supports a multifactorial genetic architecture underlying both typical and atypical communication.

CHAPTER 1: Introduction and Specific Aims

Communication disorders cost an estimated $154 billion¹ annually in lost salaries, special education, and medical care (Ruben, 2000). The most common communication disorders, childhood speech sound disorders (SSD), occur in roughly 16% of preschoolers and persist past six years of age in 3.8% of the population (Shriberg, 2002). SSD include aberrant articulation (the way speech sounds are produced) and disrupted phonological processes, vary from mild to severe, and can be comorbid with specific language impairment and reading disability (American Speech-Language-Hearing Association (ASHA), 2016; Peterson et al., 2009).

Known causes of SSD include hearing loss, otitis media, structural variation of the tongue and teeth, cleft lip and palate (asyndromic and syndromic), cerebral palsy, galactosemia, and syndromes such as Down syndrome (ASHA-Speech Sound Disorders, 2016). However, in most cases the cause of SSD is unknown. Though little is known about SSD of unknown cause, they are heritable (monozygotic twin concordance rate = 0.95-0.97, dizygotic = 0.22) (Lewis & Thompson, 1992; Bishop, 2002), a reality motivating genetic studies of SSD.

While comorbid conditions have been well characterized, there have not been extensive studies of SSD genetics. The most well-known study of SSD genetics was conducted on a family segregating apraxia, a severe form of SSD, and identified a point mutation in FOXP2 as the causal mutation (Lai et al., 2001). Based on those results, FOXP2 became the focus of SSD research (Feuk et al., 2006; Lennon et al., 2007). Studies not focusing on FOXP2 concentrated on regions previously linked to dyslexia and reported linkage with SSD on chromosomes 1, 2, 3, and 15 (Stein et al., 2004; Smith et al., 2005; Miscimarra et al., 2007). More recently, agnostic sequencing studies on small samples identified variants within various genes such as CNTNAP2, KIAA0319, and SETX (Laffin et al., 2012; Worthey et al., 2013).

¹ In 2000 dollars. There have been no follow-up studies since the original study by Ruben in 2000.

Our understanding of SSD genetics remains fragmented and confined to studies of related phenotypes or single case reports. The ultimate motivation for this dissertation is to characterize the genetic architecture of speech sound disorders in a cohesive manner through both agnostic and hypothesis driven approaches. This work represents the first genome wide association study for SSD, the results of which will lead to hypotheses for future research. The aims of this dissertation are:

One (Chapter 4): To conduct the first genome-wide association study of speech sound disorder and identify variants associated with quantitative measures of SSD endophenotypes.

To our knowledge, this will be the first genome-wide association study conducted on individuals ascertained based on SSD affection status. In addition to identifying novel loci, this aim will also generate data for the remaining aims.

Two (Chapter 5): To explore the relationship between commonly comorbid conditions and genetic effects by examining changes in effect estimates after accounting for LI and RD in a genome-wide association study.

In Aim I we will not account for comorbidity affection status. It is possible that the genetic effects from Aim I are confounded by RD and LI affection status, especially given that previous research has identified shared genetic components of SSD and RD and of SSD and LI (Stein et al., 2004; Smith et al., 2005; Rice et al., 2009). If adjusting for these comorbidities does not alter the genetic effect at a given locus, that locus may be a component of communication skills.

Three: To account for variants of marginal significance and perform pathway analysis to

a. Test for enrichment of association signal in pathways based on the results of Aim I as well as gene sets associated with comorbid conditions (Chapter 6). In Aim I, we will identify suggestive loci that we can cluster into potentially biologically meaningful groups. Additionally, we place our results in the context of previous work by testing for enrichment of the FOXP2 network and gene sets previously associated with LI and RD.

b. Classify the spectrum of association signals into biologically meaningful pathways (Chapter 7). This analysis will classify nonsignificant association signals into biologically relevant Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and will be a step toward a more cohesive understanding of the genetic basis of SSD as well as typical speech and communication.

This dissertation marks the first known genome-wide association study of speech sound disorders and will characterize the genetic basis of SSD in a comprehensive manner.
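The comparison at the heart of Aim Two, the change in a variant's effect estimate once LI and RD are accounted for, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy data, the function names, and the use of ordinary least squares solved through the normal equations. It does not reproduce the models or software used in this dissertation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def snp_effect(trait, genotype, covariates=()):
    """OLS coefficient for genotype, with an intercept and optional covariates."""
    columns = [[1.0] * len(trait), [float(g) for g in genotype]]
    columns += [[float(v) for v in cov] for cov in covariates]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(u * y for u, y in zip(ci, trait)) for ci in columns]
    return solve_linear_system(xtx, xty)[1]


# Hypothetical toy data (not study data): additive genotype coding (0, 1, 2)
# and binary LI / RD affection status for six individuals.
genotype = [0, 1, 2, 0, 1, 2]
li = [0, 0, 1, 1, 1, 1]
rd = [0, 1, 0, 1, 0, 1]
# Trait constructed so the true genotype effect is 0.5, with LI and RD lowering the score.
trait = [0.5 * g - 1.0 * l - 0.5 * r for g, l, r in zip(genotype, li, rd)]

beta_unadjusted = snp_effect(trait, genotype)
beta_adjusted = snp_effect(trait, genotype, covariates=[li, rd])
print("unadjusted:", round(beta_unadjusted, 3))
print("adjusted:  ", round(beta_adjusted, 3))
print("change:    ", round(beta_adjusted - beta_unadjusted, 3))
```

In this toy example the genotype is correlated with LI status, so the unadjusted estimate is pulled away from the value used to construct the trait. A locus whose estimate barely moves after adjustment is the kind of signal Aim Two describes as robust to comorbidity.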

CHAPTER 2: Review of speech sound disorders: epidemiology, quantitative components, and genetics

1. Basic Epidemiology

Communication, “the process by which information is exchanged (speaking, writing, semaphore etc.),” is vital to human life, and is disrupted by communication disorders (Williams, 2012). Such disorders cause “impairment in the ability to receive, process, represent, or transmit information…specifically speech, language, or hearing” (Williams, 2012). Prevalence estimates vary, but according to recent data from the National Center for Health Statistics, the prevalence of all communication disorders is 7.7% in children aged 3–17. In general, boys are significantly more likely than girls to be affected with communication disorders (9.6% vs 5.7%), as are non-Hispanic black children compared to non-Hispanic white and Hispanic children (9.6%, 7.8%, and 6.9% respectively) (Black et al., 2015). Of the affected children, speech problems were most common, accounting for 41.8% and 24.4% of all communication disorders in children aged 3–10 and 11–17, respectively (Black et al., 2015).

Speech sound disorders (SSD) are heterogeneous and include disorders of articulation (sound production) and/or phonology (the organization of sounds in a language) (ASHA- SSD Overview, 2016). Articulation and phonology are discussed in detail in Section 2, Endophenotypes of Speech Sound Disorders. SSD of unknown causes occur along a continuum from mild, which resolve, to severe, such as childhood apraxia of speech, which can persist into adulthood (Lewis et al., 2011). An important consideration of speech sound disorders is the temporal component. It should also be noted that SSD are not differences in pronunciation due to dialect.

a. Comorbid conditions

SSD can occur in isolation or with comorbidities. Of children with SSD, 6–21% also have receptive language disorders, 38–62% have expressive language disorders, and 25–30% have a reading disability (Peterson et al., 2009). Miscimarra et al. also reported that the odds of finding LI in individuals with SSD were 10 times greater than the odds of finding isolated SSD (Miscimarra et al., 2007). Though not all children with SSD have comorbid language or reading impairment, the relatively high prevalence of comorbidities has led to discussions of shared etiologies of SSD, specific language impairment (LI), and RD, a possibility that is considered in this work.

b. Causes of Speech Sound Disorders

There are two types of SSD, those with known causes and those with unknown causes. Known causes include craniofacial abnormalities such as cleft lip and palate, malformed teeth, overbite or underbite, and macroglossia. Additionally, cerebral palsy and syndromes characterized by severe intellectual disability can lead to speech sound disorders (Shprintzen, 1997; Shprintzen, 1999). The American Speech-Language-Hearing Association uses the following definition:

    Speech sound disorders is an umbrella term referring to any combination of difficulties with perception, motor production, and/or the phonological representation of speech sounds and speech segments (including phonotactic rules that govern syllable shape, structure, and stress, as well as prosody) that impact speech intelligibility. (ASHA-SSD Overview, 2016)

2. ENDOPHENOTYPES OF SPEECH SOUND DISORDERS

While tests of articulation and phonology are used to diagnose SSD, there are also other testable cognitive skills associated with SSD including receptive and expressive language, reading, and spelling. Analysis of these skills, also known as endophenotypes, allows for refinement of a binary classification to a more precise trait (Gottesman and Gould, 2003). Such refinement is ideal for genetic analyses of complex traits where phenotypic heterogeneity can obscure genetic associations. Phonological memory, phonological awareness, and vocabulary abilities distinguish between SSD severity levels on a phenotypic level (Lewis et al., 2012). Therefore, we leveraged this narrowing of the phenotype in hopes of identifying genetic variants associated with each trait. The same endophenotypes are also relevant to the commonly comorbid conditions (Rvachew, 2007; Lewis et al., 2011; Stein et al., 2014), a fact that will allow us to put our results into a broader context.

a. Articulation

Articulation is the motor component of speech and describes how sounds are made. Complete details regarding articulation are described in Bernthal, Bankson, and Flipsen (2013); a basic, simplified explanation is that the English language consists of 44 phonemes, 18 of which are vowels (Bernthal, Bankson, and Flipsen, 2013). Precise motor control is necessary to position the jaw, tongue, and lips correctly to produce sounds (Figure 1). Vowels are described by the position of the lips, rounded (as in why) or unrounded (as in hi), and the location of the tongue in the mouth. Consonants are described by the placement of the lips and tongue and by the closure of the oral cavity, known as manner. For example, /p/ in pet or /b/ in bat is characterized by complete closure of the oral cavity followed by a release or closure; the sound is a bilabial stop. The word sequence pie, why, vie, thigh, tie, shy, guy, and hi exemplifies the differing places of consonant articulation, from front of the mouth to back, and the motor control necessary to accurately produce the sounds (sequence from Bernthal, Bankson, and Flipsen, 2013). There are ages by which children are expected to master these sounds, and departure from these norms often results in referral to a speech-language pathologist (ASHA, 2016) (Figure 2.1).

Figure 2.1 Consonants and age of acquisition. Figure developed by Williamson, 2010 using data from Sander, 1972; Grunwell, 1981; and Smith et al., 1990. [Plot not reproduced. The horizontal axis is age, marked from 0 to 10; the vertical axis lists consonants with example words: /ʒ/ beige, /ð/ the, /θ/ think, /v/ very, /ʤ/ jam, /z/ zoo, /ʃ/ shop, /ʧ/ chop, /s/ sorry, /-l/ heel, /l-/ long, /r/ red, /j/ yellow, /-f/ leaf, /f-/ fall, /ŋ/ ping, /t/ top, /d/ dot, /g/ go, /k/ car, /b/ book, /w/ win, /n/ not, /h/ hot, /m/ mat, /p/ pop.]

b. Phonology

Phonology is a linguistic study of how speech sounds are organized in a given language and is often considered to be the cognitive component of language. Phonological processes, awareness, and memory are relevant to SSD (Bernthal, Bankson, and Flipsen, 2013).

Phonological processes: As part of the normal process of language acquisition, children use phonological processes (sometimes referred to as phonological patterns) to simplify speech. When using consonant harmony, for example, an individual produces the consonants in a word the same way (dog → dod). As children mature, the use of processes decreases until eventually, children speak like adults (Table 2.1). Children who continue to use the processes past the expected age would likely be referred to a speech-language pathologist (SLP). In addition to using processes past the normal age, children with phonological disorders may use uncommon processes such as deleting initial consonants, backing stops (tub → kub), or any processes involving vowels (vowel backing and lowering, bird → bad) (Bauman-Waengler, 2012).

Table 2.1 Phonological processes and age at which they decline. Adapted from Bernthal, Bankson and Flipsen, 2013 and Grunwell, 1982

Group | Process | Definition | Example | Declines (age, years;months)
Assimilation | Consonant harmony | One sound becomes similar to another in the same word | dog → dod | 3;0
Substitution | Fronting | Velars pronounced as sounds produced farther forward in the mouth | car → tar | 3;6
Substitution | Backing (not common) |  | dog → gog | 3;0
Substitution | Gliding | Liquids /l/, /r/ are replaced by /w/, /j/ | rabbit → wabbit | 5+
Substitution | Depalatalization | Palatal sounds are pronounced as sounds produced further forward | fish → fis | 2;6, 4;0
Substitution | Deaffrication | Affricates pronounced as fricatives | church → sursh | 2;6
Syllable structure (processes that affect syllable structure) | Final consonant deletion | Deletion of the final consonant | dog → do | 3;0
Syllable structure | Cluster simplification or reduction | Deletion of one element of a cluster | plane → pane | 3;6
Syllable structure | Weak syllable deletion | Deletion of an unstressed syllable | banana → nana | 3;6, 4

Phonological awareness is an understanding of, and ability to analyze and manipulate, the sound structure of speech (Bernthal, Bankson, and Flipsen 2013). Measurable components of phonological awareness include the ability to recognize rhyme, recognize and segment syllables, and manipulate phonemes (Williams, 2013; Bernthal, Bankson, and Flipsen 2013). For example, a child should be able to hear /b/ + /ell/ and say /bell/.

Table 2.2 Components of phonological awareness and age at which a related task/interrogation is mastered.

Component | Example | Age | % children mastering skill
Rhyme matching |  | 2–3 | 50%
Rhyme matching |  | 4–5 | 90%
Rhyme production |  | 3 | 35%
Syllable counting | How many syllables are in puppy? | 4 | 50%
Syllable counting | How many syllables are in puppy? | 5 | 90%
Phoneme awareness (segmentation/analysis/elision) | pond → /p/, /α/, /n/, /d/ | 6–7 | >90%
Phoneme awareness (blending/synthesis) | /p/, /α/, /n/, /d/ → pond |  |

Phonological memory is the storage of auditory phoneme information in short-term memory so that it can be manipulated (ASHA-SSD Assessment, 2016). Some children with phonological memory deficits experience impairments in developing written and spoken vocabularies (Gathercole & Baddeley, 1990).

Phonological representations are the mental representations of sounds and their combinations that comprise words.

i. Tests of Phonology

Multisyllabic word repetition (MSW): This test requires children to accurately sequence phonemes by repeating multisyllabic words. Target words include aluminum, thermometer, and sympathize (Catts, 1986). The test is scored by determining the percentage of words repeated correctly.
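The percent-correct scoring described for this test can be sketched as follows. This is a minimal illustration rather than the study's scoring procedure: the function name, the response list, and the use of exact string matching as a stand-in for an examiner's judgment of a correct repetition are all assumptions.

```python
def percent_words_correct(targets, responses):
    """Return the percentage of target words repeated correctly.

    Assumption: a repetition counts as correct when its transcription
    matches the target exactly; in practice an examiner makes this call.
    """
    if not targets or len(targets) != len(responses):
        raise ValueError("targets and responses must be non-empty and the same length")
    correct = sum(1 for target, response in zip(targets, responses) if target == response)
    return 100.0 * correct / len(targets)


# Target words named in the text; the responses are hypothetical.
targets = ["aluminum", "thermometer", "sympathize"]
responses = ["aluminum", "themometer", "sympathize"]

score = percent_words_correct(targets, responses)
print(f"{score:.1f}% of words repeated correctly")
```

A proportion like this is the raw score; any conversion to z-scores for analysis is a separate step not shown here.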

Nonsense word repetition (NSW): This test requires children to encode the information they hear and then repeat it. Phonological encoding results in conversion of what is heard to phonetic representations and formation of an articulatory plan to repeat it; errors in encoding would result in mistakes in word repetition (Levelt, 2002; Kamhi & Catts, 1986). An example of a nonword is rəbesɪt. It is scored by determining the percentage of words repeated correctly.

c. Language

Broadly, language is the use and comprehension of a spoken, written, or other symbol system (e.g. sign language) and is both receptive and expressive (ASHA, 2016). Receptive language is the ability to understand what is said while expressive language is the ability to formulate thoughts into words. Language includes five domains: phonology, morphology, syntax, semantics, and pragmatics (ASHA, 2016). Morphology is a study of the way the smallest meaningful units of language are combined (i.e. grammar), syntax is the study of sentence structure, semantics examines meaning, and pragmatics involves the social component of language (ASHA, 2016). Children with childhood apraxia of speech and SSD+LI score more poorly on vocabulary measures than do unaffected children (Lewis et al., 2012).

d. Reading

Reading is a receptive form of written language. One theory regarding reading is the dual route theory, which stipulates that reading occurs via two mechanisms. In the direct route, words are immediately recognized and understood; in the indirect, phonological route, words must be broken apart or sounded out (Seigel, 2006). The latter requires breaking words into component parts (phonological processing) and associating letters with sounds (phonological awareness) (ASHA; Seigel, 2006). Of children with SSD, 25–30% also have a reading disability (Peterson et al., 2009), and one group proposes that difficulties with phonological representations may contribute to both speech sound disorders and reading deficits (Anthony et al., 2011).

e. Spelling

Like reading, spelling demands phonological awareness because to spell successfully an individual must understand how phonemes (sounds) are represented by symbols (letters).

3. EVIDENCE FOR GENETIC BASIS OF SSD

Prior to embarking on genetic studies of any disorder, it is necessary to establish that there is, indeed, a genetic basis. For speech sound disorders, twin and family aggregation studies support the existence of a genetic etiology. Morley provided the first evidence of a genetic basis of SSD, reporting that of 12 families in which the proband had childhood apraxia of speech, 50% also had parents or siblings with SSD (Morley, 1967). A longitudinal study of development found that the children of individuals who had phonological disorders in elementary school scored more poorly on articulation tests than children of unaffected individuals (Felsenfeld et al., 1995). Lewis et al. report that in a cohort of families ascertained based on SSD (the cohort for this dissertation), 26% of nuclear family members (18.3% of fathers, 40.9% of brothers, 19.4% of sisters, 18.2% of mothers) and 13.6% of extended family members were also affected with SSD (Lewis et al., 1992). A separate study reports concordance rates of 0.95 and 0.22 for monozygotic and dizygotic twins, respectively (Lewis and Thompson, 1992). Bishop reports narrow sense heritability, the proportion of phenotypic variation explained by additive genetic components, of 0.97 (2002). Though the diagnostic criteria for SSD may have changed since the time of these studies, they illustrate that at least some portion of SSD is heritable and validate genetic studies.

4. GENETIC STUDIES OF SSD

For a brief review, see Tables 2.3–2.5. The first genetic study of speech sound disorder was based on a multigenerational family in England, the KE family, affected with apraxia of speech. Following identification of a linkage peak on 7q31.1 by Fisher et al. (1998), Lai et al. (2001) identified a single, causal mutation in FOXP2, leading to a heavy focus upon FOXP2 as the basis of speech sound disorders. The gene is evolutionarily conserved and is highly expressed in the brain during development (Enard et al., 2002). The finding spurred research on other species. In songbirds, early work found that FOXP2 is differentially expressed in the brain during periods of song learning, indicating it may have a role in vocal learning (Webb and Zhang, 2005); later experiments using knockout zebra finch models supported these findings, as knockout animals were unable to learn songs accurately (Hueston and White, 2015). In mice, there is evidence of a role of FOXP2 in communication; homozygous FOXP2 knockout leads to decreased ultrasonic vocalization compared to wild type controls (Shu et al., 2005).

Given these findings, FOXP2-related research dominated the field for a period of time.

a. FOXP2 and SSD

In a 2006 study, Feuk et al. examined the FOXP2 region in 13 individuals with childhood apraxia of speech and identified structural anomalies in all individuals. In their sample, the affected individuals had either maternal uniparental disomy or a deletion of the paternal copy of FOXP2, suggesting a parent-of-origin effect for SSD. It should, however, be noted that 7 individuals had Silver-Russell Syndrome and 2 had autism, which makes it challenging to disentangle the speech phenotype from the other syndromic phenotypes.

An exome sequencing study of 24 individuals with apraxia of speech identified structural anomalies: 16 unique copy number variations (CNVs) were identified in 12 individuals (Laffin et al., 2012). Although the pathogenicity of the CNVs was not clear, the authors explain that the CNVs included gene families (ALG, BAG, CCDC, CDC, EXOSC, MAP, PDE, TMEM, and SFP) that have been associated with neurite outgrowth, making them plausible candidate regions for SSD. A case study of a child with apraxia also identified a CNV on 7q31 including FOXP2. In addition to apraxia, the child had fine and gross motor control deficits (Lennon et al., 2007).

b. Other, non-FOXP2 based studies

Studies not focused upon FOXP2 can be divided into two groups: hypothesis driven linkage analyses using regions associated with dyslexia, and hypothesis-generating agnostic studies. The latter group consists mostly of case reports or case series.

1. Hypothesis driven analyses

Although FOXP2 has been a consistent focus in the SSD literature, studies have also identified other loci. In 2004, Stein et al. conducted a linkage study within a dyslexia candidate region on chromosome 3, DYX5, and identified linkage between the region and a phonological factor score based on the scores of multisyllabic and nonsense word repetition. This finding suggests that dyslexia and SSD may have a shared genetic basis.

In further support of pleiotropy of the region, Stein et al. found independent effects of the region on both multisyllabic word repetition (MSW) and nonsense word repetition (NSW). Also exploring the hypothesis that reading delay and SSD have a shared genetic basis, Smith et al. (2005) tested three dyslexia susceptibility regions, 1p36 (DYX8), 6p22.2 (DYX3), and 15q21 (DYX1), for linkage with SSD. The group reported linkage with GFTA scores and the log odds of affection with a speech disorder on chromosome 6. They also identified linkage on chromosome 15 with nonsense word repetition, GFTA, and percent consonants correct (Smith et al., 2005). This region on chromosome 15 is especially interesting in relation to speech because it is the Prader-Willi/Angelman syndrome region. These syndromes result in poor oral motor skills and poor speech development, respectively (Cassidy and Schwartz, 1998). Additionally, the region has been associated with autism, which is characterized by delayed speech (Pinto et al., 2010). Stein et al. further investigated DYX1 and did not identify linkage with SSD; however, the group did identify linkage slightly upstream at 15q14, both with SSD as a binary trait and with repetition of single syllables (Stein et al., 2006).

In the final study investigating linkage between a dyslexia region and SSD, Miscimarra et al. found suggestive evidence of linkage between DYX8 (1p36) and both verbal short term memory and language comprehension (Miscimarra et al., 2007). These findings suggest a pleiotropic effect of the region.

Most recently, Stein et al. (2014) explored the hypothesis that the neural genes DRD2, a dopamine receptor; AVPR1A, an arginine-vasopressin receptor; and ASPM are associated with SSD. By performing association analyses with genotyped SNPs, the group identified association between AVPR1A and phonological memory (measured by nonsense and multisyllabic word repetition), reading decoding (measured by the Word Identification and Word Attack subtests of the Woodcock Reading Mastery Tests-Revised), and both receptive and expressive vocabulary (measured by the Peabody Picture Vocabulary Test-3rd edition and the Expressive One Word Picture Vocabulary Test-Revised). DRD2 was associated with phonological memory (measured by NSW and MSW), and ASPM was associated with receptive language (measured by the Peabody Picture Vocabulary Test) and reading decoding.

2. Agnostic studies

An exome sequencing study of 10 apraxic children identified potentially pathogenic variants in CNTNAP2, KIAA0319, FOXP1, and SETX (Worthey et al., 2013). There was no single variant shared among all 10 children, so it is difficult to draw any conclusions regarding the pathogenicity of the variants. However, the genes have been associated with related phenotypes (Tables 2.6 and 2.7).

A separate study using an Affymetrix genome-wide copy number variation (CNV) array on 7 children with childhood apraxia of speech free of SLI and on 8 children with specific language impairment alone revealed a deletion of CNTNAP2 in two children with apraxia but not in any children with SLI (Centanni et al., 2015). The authors suggest that these findings indicate that previous associations between CNTNAP2 and LI/dyslexia could have been due to comorbid motor speech problems.

Table 2.3 Loci from linkage studies.

Linkage peak | Study
1p34-p36 | Miscimarra et al., 2007
3p12-q13 | Stein et al., 2004
6p22.2 | Smith et al., 2005
15q21 | Smith et al., 2005
15q14 | Stein et al., 2006

Table 2.4 Genes associated with SSD

Gene | Study type | Mutation | Function / previous associations | Study
ASPM | Targeted genotyping | | | Stein et al., 2014
ATP13A4 | Exome sequencing (n=10) | g.1938A>T, p.Glu646Asp | ATPase highly expressed in language centers of brain | Worthey et al., 2013
AVPR1 | Targeted genotyping | | | Stein et al., 2014
CNTNAP1 | Exome sequencing (n=10) | p.Arg1064Gln, c.3191G>A | Formation and maintenance of neural contact. Previously associated with | Worthey et al., 2013
CNTNAP2 | Exome sequencing (n=10) | p.arg171Cys; 3 nucleotide insertion near splice site | Language delay, intellectual disability, stereotypies of autism, specific language impairment | Laffin et al., 2012; Worthey et al., 2013
DRD2 | Targeted genotyping | | | Stein et al., 2014
FOXP1 | Exome sequencing (n=10) | SNP, p.Ile107Thr | Neural development; previously associated with autism, language delay, | Worthey et al., 2013
FOXP2 | | | Transcription factor | Lai et al., 2001
KIAA0319 | Exome sequencing (n=10) | p.Ala311Thr, c.931G>A | Adhesion between neurons. Previously associated with SLI and dyslexia | Worthey et al., 2013
SETX | Exome sequencing (n=10) | SNP, p.Lys992Arg, g.2975A>G | Previously associated with oculomotor apraxia type 2 | Worthey et al., 2013

Table 2.5 Copy number variation associated with SSD

Locus | Type | Genes | Trait | Original Study
1q25.1 | Deletion | | Delayed language | Centanni et al., 2015
2q31 | Deletion | Deletion of DLX1, DLX2 | Craniofacial patterning and forebrain development | Laffin et al., 2012
 | | ITGA6 | Cell-surface signaling |
 | | RAPGEF, HAT, MAP1D, PDK1, AL157450, CGEF2, ZAK, CDCA7, MLK7-AS1 | Memory retrieval and synapse remodeling |
 | | PDE11A | Regulation of brain function |
2q24 | Deletion | UPP2, CCDC148, PK4P, AK126351 | | Laffin et al., 2012
2p14 | Deletion | SPRED2 | | Laffin et al., 2012
4p15.1 | Duplication | | | Laffin et al., 2012
5q34 | Deletion | | Cleft lip, depressed nasal bridge, microcephaly | Centanni et al., 2015
6p12.1 | Duplication | DST, BEND6, ZNF451, BAG2, RAB23, PRIM2 | Carpenter syndrome | Laffin et al., 2012
7q22-7q36 | Karyotyping of FOXP2 and flanking regions in 13 patients with apraxia | Deletions of the FOXP2 locus or maternal UPD | All include FOXP2 | Feuk et al., 2006
7q31.1-7q31.31 | Case report | Deletion | Also has severe developmental delay | Lennon et al., 2006
7q31.1-7q31.2 | Deletion | Hemizygous maternally inherited deletion | Includes FOXP2 and has | Zilina et al., 2012
7q31.1-q31.31 | Deletion | Hemizygous maternally inherited deletion | | Zilina et al., 2012
12p12.3 | Deletion | | Language delay, dysmorphic features and hypotonia | Centanni et al., 2015
13q13.3 | Duplication | RFXAP, SMAD9, ALG5, EXOSC8, | | Laffin et al., 2012
14q23.2 | Deletion | | | Laffin et al., 2012
15q21.2 | Deletion | | Abnormal facial shape, hypotonia | Centanni et al., 2015
16p11.2 | Deletion | | | Laffin et al., 2012
16p13.2 | Deletion | ABAT, TMEM186, PMM2, CARSHP1, USP7 | ABAT deficiency, psychomotor retardation, hypotonia, hyperflexia, lethargy, seizures | Laffin et al., 2012
17q23.2 | | MS12 | Expressed in neuronal precursor cells | Laffin et al., 2012

c. Studies of comorbid conditions

As previously discussed, and because the deficits are similar, one can hypothesize that genes contributing to dyslexia and reading may contribute to speech sound disorders (or vice versa). Dyslexia is described by four cognitive components: orthographic processing, phoneme awareness, rapid automatized naming, and phonological short term memory (Carrion-Castillo et al., 2013), which are similar to those involved in SSD with the exception of orthographic processing. Children with LI may have delayed phonological development in addition to grammatical, expressive language, and receptive language difficulties (Berkson, Bankson, and Flipsen, 2013). Additionally, the conditions are often comorbid, and regions associated with dyslexia have been shown to be associated with SSD (Stein et al., 2004, 2005; Miscimarra et al., 2007; Smith et al., 2005).

Consequently, because there may be shared etiology apart from that already described, a summary of genetic studies of SLI and RD is provided (Tables 2.6–2.7) but not discussed in detail.

Table 2.6 Genes associated with comorbid conditions. SLI= Specific language impairment

Gene | Phenotype | Study
ACOT13 | Dyslexia | Deffenbacher et al., 2004
ABCC13 | SLI | Luciano et al., 2014
ATP13A4 | SLI, ASD | Kwasnicka-Crawford et al., 2005
ATP2C2 | SLI | Newbury et al., 2009; Newbury et al., 2011
 | Dyslexia | Newbury et al., 2011
BDNF | SLI | Simmons et al., 2010
CCDC136 | SLI | Gialluisi et al., 2014
CFTR | SLI | O’Brien et al., 2003
CMIP | SLI | Newbury et al., 2009; Newbury et al., 2011
 | Dyslexia | Scerri et al., 2011
CNTNAP2 | Dyslexia | Newbury et al., 2011; Peter et al., 2011
 | SLI | Vernes et al., 2008; Newbury et al., 2011
 | Delayed speech | Al-Murrani et al., 2012
CYP19A1 | Dyslexia | Anthoni et al., 2012
DCDC2 | Dyslexia | Deffenbacher et al., 2004; Harold et al., 2006; Schumacher et al., 2006; Newbury et al., 2011; Scerri et al., 2011; Lind et al., 2010; Zhong et al., 2013 (meta analysis)
 | SLI | Rice et al., 2009
DOCK4 | Dyslexia | Pagnamenta et al., 2010
DRD2 | Stuttering | Lan et al., 2009
DYX1C1 | Dyslexia | Taipale et al., 2003; Scerri et al., 2004; Wigg et al., 2004; Brkanac et al., 2007; Marino et al., 2007; Dahdouh et al., 2009; Lim et al., 2011; Paracchini et al., 2011; Mascheretti et al., 2013
 | SLI | Newbury et al., 2011
FOXP1 | SLI | Hamdan et al., 2010
FOXP2 | Dyslexia | Peter et al., 2011
 | SLI | Rice et al., 2009
GCFC2 | Dyslexia | Anthoni et al., 2012 (conflicting evidence)
 | SLI | Scerri et al., 2011
GNPTAB | Stuttering | Kang et al., 2011
GNPTG | Stuttering | Kang et al., 2011
GPLD1 | Reading disability | Meng et al., 2005
KIAA0319 | Dyslexia | Deffenbacher et al., 2004; Cope et al., 2005; Harold et al., 2006; Ludwig et al., 2008; Dennis et al., 2009; Newbury et al., 2011; Scerri et al., 2011; Venkatesh et al., 2013
 | Reading disability | Meng et al., 2005
 | SLI | Rice et al., 2009; Newbury et al., 2011
MRPL19 | Dyslexia | Anthoni et al., 2007
 | SLI | Scerri et al., 2011
NAGPA | Stuttering | Kang et al., 2010
NDST4 | SLI | Eicher et al., 2013
NRSN1 | Dyslexia | Deffenbacher et al., 2004
NOP9 | SLI |
ROBO1 | Dyslexia | Hannula-Jouppi et al., 2005; not replicated by Venkatesh et al., 2013
 | NWR (in unaffected individuals) | Bates et al., 2010
SETBP1 | SLI | Filges et al., 2011; Marseglia et al., 2012
SRPX2 | Oral dyspraxia and seizure | Roll et al., 2006
THEM2 | | Pinel et al., 2012; Cope et al., 2012
TTRAP | | Deffenbacher et al., 2004
TDP2 | Dyslexia | Deffenbacher et al., 2004; Luciano et al., 2007
VMP | Dyslexia | Deffenbacher et al., 2004

*According to Carrion-Castillo et al., 2013 this gene may be related to general cognition rather than specifically reading and language


Table 2.7 Loci from linkage studies associated with comorbid conditions. SLI= Specific language impairment

Location | Gene | Phenotype | Author
1p34-p36 | KIAA0319 | | Grigorenko et al., 2001; Tzenova, Kaplan, Petryshen, & Field, 2004; de Kovel et al., 2008; Rice et al., 2009; Rice et al., 2008
2q36.3 | TM4SF20 | SLI | Wiszniewski et al., 2013
6q11.2–q12 | | Dyslexia | Petryshen et al., 2001
7q31–7q36 | | SLI | Monaco et al., 2007
13q21 | | SLI | Bartlett et al., 2002
16q23-24 | CMIP, ATP2C2 | SLI | SLI Consortium, 2002; SLI Consortium, 2004
18p11.2 | MC5R, DYM, NEDD4L, and VAPA | Dyslexia | Fisher et al., 2002; Bates et al., 2007; Seshadri et al., 2007; Poelmans et al., 2011; Scerri et al., 2010
19q13.13-12.41 | | SLI | SLI Consortium, 2002
Xq27.3 | FMR1 | | de Kovel et al., 2004; Platko et al., 2008; Huc-Chabrolle et al., 2013

5. LIMITATIONS OF PREVIOUS STUDIES

To date, most speech sound disorder related research has focused primarily on FOXP2 or performed linkage analyses on regions previously associated with comorbid conditions. There are shared genetic components between SSD and dyslexia (Stein et al., 2004; Stein et al., 2006; Miscimarra et al., 2007); however, it has yet to be determined, in a cohort ascertained specifically for SSD, whether there are unique genetic components. Additionally, agnostic studies have been limited by small sample sizes, with the maximum being n=24, making it difficult to draw conclusions regarding the pathogenicity of variants.

This dissertation addresses these limitations by performing a genome-wide association study using 721 individuals ascertained for speech sound disorder and by using pathway analyses to make sense of seemingly disparate results.

CHAPTER 3: Methods

This chapter will discuss phenotypic and genetic data collection and quality control methods that are relevant to all remaining chapters, the basic statistical theory for each aim, and the software chosen to address the research questions. Chapters 4, 5, 6, and 7 will address any issues specific to the aim discussed therein. The overall study design and workflow is outlined in Figure 3.1.

Figure 3.1 Overall study design and workflow

1. PHENOTYPE DATA

1.1 Overall Study design

The data are from a longitudinal study of SSD in which 4-6-year-old children with SSD were referred by speech-language pathologists in Northeast Ohio. Families are ascertained through a proband diagnosed with speech sound disorder. Diagnosis is based on a score at or below the 10th percentile on the Goldman-Fristoe Test of Articulation (GFTA) (Goldman and Fristoe, 1986) and on the production of at least three errors on the Khan-Lewis Phonological Analysis test (KLPA) (Khan and Lewis, 1986) (Chapter 2; p. 17). Additionally, to eliminate the possibility of SSD due to other comorbidities, children must have normal hearing, a normal peripheral speech mechanism (z-score within 1 standard deviation of the normative reference on the Total Function and Total Structure subscales of the Oral and Speech Motor Control Protocol) (Robbins and Klee, 1987), and an IQ>80 on the Wechsler Preschool and Primary Scale of Intelligence (methods adapted from the description by Stein et al., 2004). Probands and their siblings are given a battery of tests (Table 3.1) designed to measure endophenotypes of speech sound disorders. Family history of speech, language, and reading disorders as well as psychiatric disorders was also collected. For an example pedigree see Figure 3.2.

Figure 3.2 Example family participating in the study. Colored boxes indicate affection status. 515 is the proband affected with speech, language, and reading impairments. Through him, his siblings 516-518 were recruited into the study. His father, 561, has or had reading disability and his mother is or was affected with speech and reading impairments. A=Affected, U=Unaffected

Table 3.1 Tests used in current study and the phenotype interrogated (Adapted from Lewis et al., 2005)

Articulation
  Goldman-Fristoe Test of Articulation^1,2
  Khan-Lewis Phonological Analysis^1
  Conversational speech sample^1,2
Phonology
  Comprehensive Test of Phonological Processing
  Nonsense Word Repetition Test^1,2
  Multisyllabic Word Repetition^1,2
  Speech Error Phrases^1,2
Semantic/Syntactic Measures
  Test of Language Development-Primary 2^1 (TOLD-P2)
  Clinical Evaluation of Language Fundamentals-3^2 (CELF-P)
Written Language (7-12 years old only)
  Woodcock Reading Mastery Test^2 (WRMT)
  WIAT-Reading Comprehension^2
  Test of Written Spelling^2
Nonverbal intelligence
  Wechsler Preschool and Primary Scale of Intelligence-Revised (WPSI) or Wechsler Intelligence Scale for Children-3rd Edition subtests (WISC)
Oral Motor Measures
  Oral and Speech Motor Control Protocol^1
  Fletcher Time-by-Count^2

^1 Administered to 4-7 year olds
^2 Administered to 7-12 year olds

Table 3.3 Basic demographics of all individuals in the cohort as of February 2016

 | n (%)
Basic Information |
  Total | 1732
  Male | 960 (55)
  SSD=1 | 418 (31.4)
  Language=1 | 607 (36.7)
  Reading=1 | 299 (18.1)
Family Information |
  Number families | 416
  Siblings of proband | 519
  Parents | 702
  Grandparents | 21

2. TESTS FOR QUANTITATIVE TRAITS

2.1 Articulation and motor control

i. Fletcher Time by Count (Fletcher, 1972): Examines the mechanical limit of speech production. According to the developer, the speech mechanism is similar to a machine: it has weights, levers (mandible and hyoid), and devices that produce sound (muscles and nerves), and consequently, there must be a mechanical limit. Fletcher argues that this limit is the rate at which the structures can perform (Fletcher, 1972). Individuals repeat single syllables such as /pʌ/ and multiple syllables such as /pʌtəkə/ as many times as possible in 20 seconds. The test was normed on 384 school aged children, but we use raw scores in the development of our z-scores.

ii. The Goldman-Fristoe Test of Articulation (Goldman and Fristoe, 1986) is a series of pictures with target words that tests children’s ability to produce 39 sounds of the English language in various positions within the word (initial, medial, final). For example, a card may have an image of a yellow duck that says quack, which requires the examinee to produce an initial /j/ (as in yellow), /d/, /kw/, and a final /k/. The test is scored by counting the total number of errors, which can be converted to a standard score developed from the test results of 1,723 females and 1,798 males aged 2;0 to 21;11.

2.2 Language

i. Expressive One Word Picture Vocabulary Test (EOWPVT) (Garner, 1990): Expressive language is the ability to verbalize thoughts. In this test, individuals are required to name a target object, action, or idea that is illustrated in a picture easel. The test was standardized on individuals 2-80 years old.

ii. Peabody Picture Vocabulary Test (PPVT) (Dunn and Dunn, 1997): This is a test of receptive vocabulary that requires the examinee to select, from a group of four, the one image that best matches the stimulus. The test was standardized on individuals 2 to 90 years old.

iii. Wechsler Individual Achievement Test-Listening Comprehension subtest (WIATLC) (Wechsler, 2011): This test assesses receptive language in two ways. The first is a picture easel test like the PPVT. In the second, the examiner tells brief stories and then asks the examinee to explain why something is important, to remember specific details, and to develop hypotheses about the story. This is a more practical test than EOWPVT because it involves syntax.

2.3 Phonology

i. Multisyllabic word repetition (MSW): This test requires children to accurately sequence phonemes by repeating multisyllabic words. In order to complete this task, individuals must encode phonologic information with target words such as aluminum, thermometer, and sympathize (Catts, 1986). The test is scored by determining the percentage of words repeated correctly.

ii. Nonsense word repetition (NSW) (Kamhi & Catts, 1986): This test requires children to encode the information they hear and then repeat it; deficits in encoding would result in mistakes in word repetition. An example of a nonword is /rəbesɪt/. The test is scored by determining the percentage of words repeated correctly. NSW can discriminate adults with resolved SSD from those who never had SSD.

2.4 Reading

i. Woodcock Reading Mastery Test-Revised, Word Attack subtest (WRDATK) (Woodcock, 1987): Individuals must read a list of 45 non-words; the test includes “words” such as ip, din, ceisminadolt, and gnouthe to assess phonetic decoding skills.

ii. Woodcock Reading Mastery Test-Revised, Word Identification subtest (WRDID) (Woodcock, 1997): Individuals must read a list of 106 real words.

2.5 Spelling

i. Test of Written Spelling is similar to a spelling test that would be administered in school and requires subjects to spell dictated words in order of increasing difficulty (Larsen, Hammill, and Moats, 1999).

1.3 Standardizing data

For analyses, we converted all scores to age adjusted z-scores using the procedure below (Equations 3.1–3.3). We chose the first available observation of each trait for every individual within the study, even those without genetic data, to maximize our sample size. Using only the individuals without SSD and their ages, we calculated effect estimates for age and age squared. Age squared is included in the model because we suspect that there is a nonlinear relationship between age and the quantitative trait (Qtrait).

\[ \mathrm{Qtrait} \sim \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{age}^2 \]   Equation 3.1

The beta estimates were then used to calculate z-scores for the affected individuals

\[ \mathrm{Qtrait}_{\mathrm{predicted}} \sim \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{age}^2 \]   Equation 3.2

\[ Z = \frac{\mathrm{Qtrait}_{\mathrm{observed}} - \mathrm{Qtrait}_{\mathrm{predicted}}}{SE_{\mathrm{residuals}}} \]   Equation 3.3

This method has been used elsewhere (Lewis et al., 2011; Wellman et al., 2011; Stein et al., 2014). The scores developed in this manner for each individual were the outcome measures for the analyses described in Chapters 4 and 5. If necessary, z-scores were transformed to satisfy the normality assumptions of basic regression based on Q-Q plots (Table 3.2 distributions in Appendix A). If constants were added, it was to make all original z-scores positive before transforming.
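The standardization procedure can be illustrated with a minimal sketch. It assumes the quadratic age model includes an intercept and that the residual standard error is taken from the fit in unaffected individuals; the function name `age_adjusted_z` and its arguments are hypothetical and are not taken from the study's own scripts.

```python
import numpy as np

def age_adjusted_z(age, score, unaffected):
    """Fit score ~ b0 + b1*age + b2*age^2 in unaffected individuals only,
    then standardize every individual's residual by the residual standard
    error of that reference fit (assumed reading of Equations 3.1-3.3)."""
    age = np.asarray(age, dtype=float)
    score = np.asarray(score, dtype=float)
    unaffected = np.asarray(unaffected, dtype=bool)

    # Design matrix: intercept, age, age squared
    design = np.column_stack([np.ones_like(age), age, age ** 2])

    # Least-squares estimates from the unaffected reference group
    beta, *_ = np.linalg.lstsq(design[unaffected], score[unaffected], rcond=None)

    predicted = design @ beta
    ref_resid = score[unaffected] - predicted[unaffected]

    # Residual standard error with three estimated parameters
    se = np.sqrt(np.sum(ref_resid ** 2) / (unaffected.sum() - 3))
    return (score - predicted) / se
```

Fitting the reference model in unaffected individuals only means that a z-score expresses how far an individual departs from what is expected for a typically developing child of the same age.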

Table 3.4 Transformations of z-scores

Test | λ | Transformation
Articulation and motor control | |
  Fletcher Time By Count | 0.5 | (3+z)^0.5
  Goldman-Fristoe Test of Articulation | log | log(3+z)
Language | |
  Expressive One Word Picture Vocabulary Test | 1.5 | (3+z)^1.5
  Peabody Picture Vocabulary Test | NA |
  Wechsler Individual Achievement Test-Listening Comprehension | NA |
Phonology | |
  Multisyllabic Word Repetition | NA |
  Nonsense Word Repetition | NA |
Reading | |
  Woodcock Reading Mastery-Word Attack | 2 | (8+z)^2
  Woodcock Reading Mastery-Word Identification | 1.5 | (8+z)^1.5
Spelling | |
  Test of Written Spelling | NA |

3. GENETIC DATA

Saliva samples were collected for all willing family members using Qiagen (saliva) collection kits; blood was also collected from some families. DNA was extracted, prepared for genotyping, and genotyped at Case Western Reserve University.

2.1 Genotyping

All genotyping was performed using Illumina Omni Chips. Due to rapidly evolving technology, five chips were used (Table 3.5).

Table 3.5. Genotyping data for the current study. Numbers reflect non-failed samples prior to quality control procedures

Chip | Technical Name | Individuals | Families
Omni 2013 | HumanOmni2.5Exome Chip 8v1_A | 609 | 144
Omni 2014 | HumanOmni2.5Exome Chip 8v1-1-A | 105 | 34
Omni Express | Omni Express | 47 | 28
Omni 8 | HumanOmni 2.5-8 | 40 | 10
Omni4 | HumanOmni 2.5-4 | 16 | 12

Table 3.6 Chip characteristics summarized from Illumina documentation

Chip | Technical Name | # Markers | Exonic | Variation captured, 1000 Genomes (MAF >0.05)
Omni 2013 / Omni 2014 | HumanOmni2.5Exome Chip 8v1_A / HumanOmni2.5Exome Chip 8v1-1-A | ~2.5 million | 240,000 | CEU=0.83; CHB+JPT=0.83; YRI=0.65
Omni 8 / Omni4 | HumanOmni 2.5-8 / HumanOmni 2.5-4 | 2,338,671 | 0 | CEU=0.83; CHB+JPT=0.83; YRI=0.65
Omni Express | Omni Express | 715,000 | 0 | CEU=0.73; CHB+JPT=0.74; YRI=0.40

2.1.2 Genotype quality control

Prior to completing any analyses, a considerable amount of quality control was necessary. We followed the stringent quality control procedures outlined in Guo et al., 2014. Briefly, the markers are filtered for characteristics such as cluster separation, and those not passing the filter are visually inspected and manually reclustered, if possible. This procedure saves SNPs from removal. Following cluster QC and conversion to the plus strand, minor allele frequency dependent call rate filters were applied (Table 3.7). Non-autosomal SNPs (X and Y) were removed prior to analysis because the sex chromosome builds for phasing and imputation were not stable; thus we restricted our analysis to the autosomes. On the X and Y chromosomes there are pseudoautosomal regions that map to analysis. The exclusion of the X and Y chromosomes is a limitation of this study, especially because anecdotal evidence indicates there are more affected males than females.

Table 3.7. SNP quality control summary

Number SNPs | Omni 2013 | Omni 2014 | Omni 8 | Omni 4 | Omni Express
Total SNPs, pre QC | 2503734 | 2583315 | 2344747 | 2442829 | 716356
0.01<MAF<0.05, call rate >0.99; 0.05<MAF<0.1, call rate >0.97; MAF>0.1, call rate >0.95 | 1654341 | 1653089 | 1432120 | 1211593 | 666880
Failed Cluster QC in GenomeStudio | 63773 | 7730 | 4581 | 15801 | 2951
Non Autosomal | 38010 | 39107 | 31105 | 26668 | 15789
Duplicate Markers or triallelic | 27222 | 27603 | 2380 | 2062 | 3
Hardy Weinberg Equilibrium (p<1x10-20)* | 9 | 0 | 0 | 0 | 0
Mendelian Error** | 0 | 0 | 0 | 0 | 0
Total # SNPs | 1594290 | 1586379 | 1398635 | 1182863 | 651088

* Only removed extreme violation of HWE
** SNPs with Mendelian errors were zeroed for correct individuals. If removal of the SNPs for individuals resulted in low call rate for the SNP, it was subsequently removed.

Table 3.8 Individual quality control summary

Number individuals removed | Omni 2013 | Omni 2014 | Omni 8 | Omni 4 | Omni Express
Total, pre QC | 651 | 112 | 40 | 16 | 48
Call rate >0.98 | 38 | 7 | 0 | 0 | 1
Twins | 3 | 0 | 1 | 0 | 0
Planned replicates | 10 | 0 | 0 | 0 | 0
Relationship errors (non-resolvable) | 14 | 0 | 0 | 0 | 0
Excess heterozygosity* | 51 | 5 | 2 | 1 | 4
Sex mismatch | 0 | 0 | 0 | 0 | 0
Total | 598 (313 Male) | 105 (53 Male) | 40 (26 Male) | 16 (7 Male) | 47 (26 Male)
Total families | 148 | 60 | 8 | 4 | 45

*Did not remove because there was no evidence of sample contamination and doing so would further reduce an already small sample

3.3 Phasing

Phasing increases the speed and accuracy of imputation by estimating the haplotypes of the genotyped data. We used SHAPEIT2 to check for strand congruity between our data and hg19 (reference genome) and individually phased each chip prior to imputation (Delaneau et al., 2012; Delaneau et al., 2014).

3.4 Imputation

Data were imputed to the Phase 3, mixed reference option, of the 1000 Genomes Project using the University of Michigan Imputation server which implements minimac3 (Howie et al., 2012). Following imputation, all markers with R2<0.7 and MAF<0.05 in our population were removed. Following imputation, the genotypes are continuous values between 0 and 2 known as dosage because there is uncertainty in the imputation. These continuous values do not reflect biology because it is impossible to carry non whole number alleles.
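To make the notion of dosage concrete, the sketch below shows how a dosage is conventionally derived from imputed genotype probabilities, and how a minor allele frequency can be estimated from dosages. It assumes each marker comes with three posterior probabilities (zero, one, or two copies of the counted allele); the function names are hypothetical and do not correspond to the imputation server's output format.

```python
import numpy as np

def dosage_from_probs(probs):
    """Expected number of copies of the counted allele.

    probs: array of shape (..., 3) holding P(0 copies), P(1 copy), P(2 copies).
    """
    probs = np.asarray(probs, dtype=float)
    return probs[..., 1] + 2.0 * probs[..., 2]

def maf_from_dosage(dosage):
    """Minor allele frequency estimated from the dosages at one marker."""
    freq = np.mean(dosage) / 2.0
    return min(freq, 1.0 - freq)

# Two individuals at one marker: one almost certainly carrying no copies,
# one most likely carrying two copies
example = dosage_from_probs([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])
```

A dosage near a whole number indicates a confidently imputed genotype, whereas intermediate values reflect uncertainty in the imputation rather than biology.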

3.5 Finalizing data

All genetic data were reduced to the overlapping imputed markers across all five chips, 5,078,482 markers, and merged for all subsequent analyses. Additionally, due to small sample size and the reduction in power when adjusting for PCs, all non-Caucasian individuals were removed from the sample (for ancestry principal components see Appendix A, Fig 6).

4. ANALYTICAL METHODS

4.1 Aim 1: Genome wide association study

We hypothesize that there is a common variant captured by the GWAS arrays or imputed markers that is associated with SSD endophenotypes without confounding comorbidities.

4.1.1 Statistical model

Genome wide association studies (GWAS) test each marker present in a data set individually for association with either a continuous or dichotomous outcome. They grew in popularity in the early 2000s but have been criticized for failing to replicate, not explaining enough phenotypic variance, and not providing any meaningful biological insights (Visscher et al., 2012). Despite these criticisms, GWAS have been successful in identifying loci for dyslexia (Roeske et al., 2011); therefore this analytical method will be used. Importantly, this is the first known genome wide association study for SSD.

In general, the model follows Equation 3.4.

\[ y = \beta_0 + G\beta_G + X\beta_X + e \]   Equation 3.4

Where

y: quantitative trait

$\beta_0$: intercept (interpretability dependent upon centering)

G: matrix of genetic component. Because the data are imputed, this is dosage instead of allele copy number

$\beta_G$: matrix of regression coefficients for genotype

X: matrix of covariates such as age, sex, comorbidities

$\beta_X$: matrix of regression coefficients for covariates

e: residual error

The hypothesis test is simply:

H0: $\beta_G = 0$

H1: $\beta_G \neq 0$

This simple equation is then extended to all SNPs and multiple covariates.
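As a rough illustration of the single-marker test, the sketch below fits an ordinary least-squares regression of a trait on one marker's dosage plus covariates and returns the genotype effect, its standard error, and the t statistic. It deliberately ignores relatedness, so it is not the mixed model used in this study; the function name `snp_association` is hypothetical.

```python
import numpy as np

def snp_association(y, dosage, covariates):
    """Ordinary least-squares test of one marker under an additive model.

    y: trait values, length n
    dosage: imputed dosages for a single marker, length n
    covariates: array of shape (n,) or (n, k), e.g. age and sex
    Returns (beta_G, standard error, t statistic) for the genotype term.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]

    # Columns: intercept, genotype dosage, then covariates
    X = np.column_stack([np.ones(n), dosage, covariates])

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Unbiased residual variance and the coefficient covariance matrix
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)

    se = np.sqrt(cov[1, 1])
    return beta[1], se, beta[1] / se
```

In a genome wide scan this calculation would be repeated for every marker, with the resulting statistics compared against a multiple-testing corrected threshold.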

4.1.2 Inheritance model

An important consideration in GWAS and any genetic study is selecting an inheritance model: additive, recessive, dominant, or co-dominant. As expected, power to detect variants is maximized when the designated model is the true model (Lettre et al., 2007). For SSD, however, the inheritance model is unclear. Currently, it is thought that additive variance accounts for most of the genetic variance of traits (Hill et al., 2008; Zhu et al., 2015). Previous work in reading and language traits also used an additive model (Gialluisi et al., 2014). In this analysis, we use an additive model, but it is necessary to remember that determining the mode of inheritance may be an important component of understanding SSD.

4.1.3 Accounting for family structure

This study is based on family data, thus we must account for the covariance among family members. There are two possible methods to account for relatedness; they are generalized estimating equations (GEE) and linear mixed models (LMM). In LMM, the random effect is family relationship, and the fixed effects are effect of interest and any covariates (general form Equation 3.5).

\[ y = X\beta + Zu + e \]   Equation 3.5

Where

y: continuous outcome

X: matrix of covariates

$\beta$: column vector of regression coefficients

Z: random effects

u: coefficients for the random effects

In practice u is assumed to be distributed as $N(0, G)$, where G is a variance covariance matrix, in this case the kinship matrix. The kinship matrix is specified using actual genetic data and not just expected values. LMM was chosen over GEE because LMM, as operationalized in the GCTA software (Yang et al., 2011; Yang et al., 2014), more precisely accounts for the correlation within families than GEE.
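The idea of estimating relatedness from observed genotypes rather than from pedigree expectations can be sketched as follows. The code uses a commonly cited standardized estimator (each marker centered by twice its allele frequency and scaled by its expected variance); this is an assumption for illustration, and the GCTA implementation may differ in details such as the treatment of diagonal elements. The function name is hypothetical.

```python
import numpy as np

def genetic_relationship_matrix(dosage):
    """Estimate pairwise relatedness from a genotype matrix.

    dosage: array of shape (n_individuals, n_markers) with values in [0, 2].
    Returns an (n_individuals, n_individuals) relationship matrix.
    """
    x = np.asarray(dosage, dtype=float)

    # Allele frequency of the counted allele at each marker
    p = x.mean(axis=0) / 2.0

    # Monomorphic markers carry no information and would divide by zero
    keep = (p > 0) & (p < 1)
    x, p = x[:, keep], p[keep]

    # Center by 2p and scale by the binomial standard deviation
    w = (x - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

    # Average the cross-products over markers
    return w @ w.T / x.shape[1]
```

Entries of the resulting matrix are high for pairs who share more alleles than expected by chance, which is how realized relatedness within families enters the variance covariance structure of the mixed model.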

4.1.4 Multiple testing correction

Because over two million individual tests will be performed, it is necessary to correct for multiple testing. Common corrections include Bonferroni, α/n_independent tests, and Šidák, 1-(1-α_desired)^(1/n) (Šidák, 1967). False discovery rate (FDR) is an alternative method that controls the expected proportion of false positives among the results declared significant, rather than the probability of any false positive (Benjamini and Hochberg, 1995).

First it is necessary to determine the number of independent tests; however, due to linkage disequilibrium, each individual test is not truly independent. SNPs in close proximity are likely to be inherited together, whereas those many bases apart may be separated by crossing-over events. Li et al. (2012) addressed this issue by calculating the number of independent SNPs within each LD block in the genome. The group also determined the significance threshold for HapMap data (Table 3.9). Because we will be imputing using HapMap and our sample includes African American individuals, the most conservative cut-off for genome-wide significance will be 3.44x10^-8 (Table 3.9).

Table 3.9. Significance threshold for HapMap. Based on the work of Li et al., 2012

Population | # SNPs | Independent | Significant p | Highly significant p
CEU | 2,776,528 | 820,888.14 | 6.09x10^-8 | 1.22x10^-9
YRI | 3,114,362 | 1,452,799.72 | 3.44x10^-8 | 6.88x10^-10
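The thresholds in Table 3.9 follow from dividing α by the number of independent tests. The sketch below reproduces that arithmetic and shows the Šidák alternative for comparison. The α levels of 0.05 and 0.001 are assumptions inferred from the tabulated values, and the function names are illustrative.

```python
import math


def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test threshold under the Bonferroni correction: alpha / n."""
    return alpha / n_independent_tests


def sidak_threshold(alpha, n_independent_tests):
    """Per-test threshold under the Sidak correction: 1 - (1 - alpha)^(1/n).

    Written with log1p/expm1 so the result stays accurate when n is very large.
    """
    return -math.expm1(math.log1p(-alpha) / n_independent_tests)


# Number of independent tests per HapMap population, as listed in Table 3.9.
INDEPENDENT_TESTS = {"CEU": 820888.14, "YRI": 1452799.72}

for population, n_tests in INDEPENDENT_TESTS.items():
    print(
        population,
        "Bonferroni (alpha=0.05): %.3g" % bonferroni_threshold(0.05, n_tests),
        "Bonferroni (alpha=0.001): %.3g" % bonferroni_threshold(0.001, n_tests),
        "Sidak (alpha=0.05): %.3g" % sidak_threshold(0.05, n_tests),
    )
```

With this many tests the two corrections give nearly identical thresholds, with the Šidák value always slightly less stringent than the Bonferroni value.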

4.1.5 Software

We performed our association study using MLMA in GCTA. GCTA is a suite of software designed for analysis of complex traits and is computationally efficient (Yang et al., 2011; Yang et al., 2014). MLMA-GCTA performs a mixed model association analysis and accounts for family structure using the Genetic Relationship Matrix (GRM). The GRM is estimated from SNPs present within the sample, which provides greater accuracy than using expected kinship coefficients. GCTA was selected over GWAF (Genome Wide Association Studies in Families), which has been used previously by Lewis, Iyengar and Stein, because of its computational speed.
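To illustrate what estimating relatedness from observed SNPs means, the sketch below computes a genetic relationship matrix from a small genotype table. It assumes hard-call genotypes coded 0/1/2 and allele frequencies estimated from the sample, and it uses the standardized cross-product form described by Yang et al. (2011) for every pair, including the diagonal, where GCTA applies a slightly different expression. The function name and toy data are illustrative; this is not the GCTA implementation.

```python
def genetic_relationship_matrix(genotypes):
    """Estimate a GRM from genotypes[individual][snp] coded as 0, 1, or 2.

    Entry (j, k) is the average over SNPs of
    (x_j - 2p)(x_k - 2p) / (2p(1 - p)), where p is the alternate allele frequency.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    # Alternate allele frequency of each SNP, estimated from the sample itself.
    freqs = [
        sum(person[i] for person in genotypes) / (2.0 * n_individuals)
        for i in range(n_snps)
    ]
    # Monomorphic SNPs carry no information and would divide by zero.
    usable = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not usable:
        raise ValueError("no polymorphic SNPs available")
    grm = [[0.0] * n_individuals for _ in range(n_individuals)]
    for j in range(n_individuals):
        for k in range(j, n_individuals):
            total = 0.0
            for i in usable:
                p = freqs[i]
                total += (
                    (genotypes[j][i] - 2 * p)
                    * (genotypes[k][i] - 2 * p)
                    / (2 * p * (1 - p))
                )
            grm[j][k] = grm[k][j] = total / len(usable)
    return grm


# Toy data: individuals 0 and 1 share identical genotypes, individual 2 differs.
toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
for row in genetic_relationship_matrix(toy):
    print(["%.2f" % value for value in row])
```

Pairs with identical genotypes receive the same value as the corresponding diagonal entry, while dissimilar pairs receive negative values, which is the information an expected kinship coefficient from a pedigree cannot capture.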

4.2 Aim II- Pathway analysis

4.2.1 Introduction

Genome wide association study results are not always easily interpreted and may miss biologically relevant associations because stringent multiple testing corrections require highly significant results (Manolio et al., 2008). Furthermore, even if there are markers that reach genome-wide significance, it is important to consider that genes do not act in isolation but rather are part of complex, regulated networks and pathways. One way to address these concerns is to use the results from a GWAS to perform pathway-based analyses. By definition, pathways are distinct from networks because pathways have a clear directionality and can be divided into molecular pathways (e.g. glycosphingolipid synthesis), cellular pathways (e.g. axon guidance), and system pathways (e.g. immune response) (Ramanan et al., 2012). Pathway-based approaches attempt to fill the genotype-phenotype gap by using GWAS results, even those not reaching genome-wide significance, to identify biologically meaningful pathways that are enriched for association with a phenotype (Kim and Przytycka, 2013).

4.2.2 Considerations

Pathway analysis can be an informative tool, but in order to understand the results, it is necessary to consider the biases and limitations (Wang et al., 2010). There are multiple biases inherent in pathway analysis due to the structure of the genome, the size and coverage of a pathway, and the database(s) being used to generate pathways, which are themselves subject to publication bias. If a pathway is large and well covered, it is more likely, by chance alone, to appear enriched than a smaller pathway. Pathway analysis relies on databases such as KEGG, and consequently on what is known and thus what is published; therefore, there is information bias inherent in this type of analysis. Describing negative results must be done cautiously. For example, what does it mean if a pathway is not enriched? Lack of enrichment may be due to an incomplete picture of the underlying pathway and not a lack of biological relevance. With these considerations in mind, this strategy was used to provide insight into SSD that is not possible by looking at single GWAS hits.

4.2.3 Software selection

There are countless software packages and web portals available for pathway analysis, but PARIS (Pathway Analysis by Randomization Incorporating Structure) was used in this work. A complete theoretical explanation of the software is available in Yaspan et al., 2011 and Butkiewicz and Cooke Bailey, 2016, but a summary of the key details is provided here. PARIS achieves pathway analysis by defining features within the genome and classifying pathways by their features. Features are either linkage disequilibrium (LD) blocks or regions of linkage equilibrium (LE). A feature is considered significant if it contains a marker with p<0.05. After determining the number of significant features within a pathway, the entire genome is permuted by randomly selecting LD and LE features to match the distribution in the target pathway. This permutation method is notable because it accounts for the often ignored underlying LD structure, and in so doing is intended to generate the correct null distribution (Wang et al., 2010; Yaspan et al., 2011). The number of permuted pathways with more significant features than the actual pathway determines the p value of a given pathway (Yaspan et al., 2011). The same permutation approach is used to assess the significance of individual genes. PARIS does not explicitly account for family structure because it uses GWAS p values. The GWAS itself adjusts for the presence of families, but if a single family or a few families were driving a GWAS signal, they may also be driving the association between a pathway and a given trait.
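The permutation logic summarized above can be sketched as follows. This is a simplified illustration, not the PARIS implementation: features are represented as hypothetical dictionaries with a "kind" ("LD" or "LE") and a list of marker p values, and permuted pathways are matched to the target pathway only on the number of LD and LE features, not on any further property the software may match on.

```python
import random


def count_significant(features, threshold=0.05):
    """Count features containing at least one marker with p below the threshold."""
    return sum(1 for feature in features if min(feature["pvalues"]) < threshold)


def pathway_empirical_p(pathway_features, genome_features,
                        n_permutations=1000, seed=0):
    """Empirical p value: share of permuted pathways with more significant
    features than the observed pathway."""
    rng = random.Random(seed)
    observed = count_significant(pathway_features)
    ld_pool = [f for f in genome_features if f["kind"] == "LD"]
    le_pool = [f for f in genome_features if f["kind"] == "LE"]
    n_ld = sum(1 for f in pathway_features if f["kind"] == "LD")
    n_le = len(pathway_features) - n_ld
    exceed = 0
    for _ in range(n_permutations):
        # Draw a random pathway with the same number of LD and LE features.
        permuted = rng.sample(ld_pool, n_ld) + rng.sample(le_pool, n_le)
        if count_significant(permuted) > observed:
            exceed += 1
    return exceed / n_permutations


# Toy genome in which no feature is significant, and a pathway with two hits.
genome = ([{"kind": "LD", "pvalues": [0.5]} for _ in range(10)]
          + [{"kind": "LE", "pvalues": [0.9]} for _ in range(10)])
pathway = [{"kind": "LD", "pvalues": [0.001, 0.4]},
           {"kind": "LE", "pvalues": [0.01]}]
print(pathway_empirical_p(pathway, genome, n_permutations=200))
```

Because the p value is a proportion of permutations, its resolution is limited by the number of permutations run, so a reported value of 0 should be read as "less than 1 divided by the number of permutations".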

CHAPTER 4: Aim I- Genome Wide Association Study

1. Introduction

A comprehensive understanding of the genetic architecture of childhood speech sound disorders (SSD) remains to be developed. Although SSD have been studied for well over 50 years (Morley, 1957), the body of research is not as comprehensive as for other communication disorders such as dyslexia, especially in relation to genetics.

The first genome wide association study was reported over a decade ago in 2005, and since then over 1,800 GWAS have been published, but to our knowledge, there has never been a genome wide association study on individuals ascertained for SSD (Zeng et al., 2015).

Through the GWAS approach, we aimed to gain insight into the genetic architecture of the disorder and, as a result, of normal communication, and to develop hypotheses for future work. Because this is the first GWAS of SSD, we expect to identify novel loci.

2. Methods

We performed a genome wide association study using genotyped and imputed data for individuals from the Cleveland Family Study, a study of childhood speech sound disorders. All individuals were genotyped on Illumina Omni arrays (see Chapter 3 for detail), and genotyping was performed at Case Western Reserve University in Cleveland, Ohio. Quality control was performed at both the SNP and individual level, resulting in a final sample of 601 individuals and 5,078,482 markers.

2.1 Trait Selection

We selected quantitative traits that capture the endophenotypes, the measurable components of speech sound disorder (Table 4.1). For a full description see Chapter 2. We also analyzed the binary trait of SSD diagnosis, but it will not be discussed in detail. The binary trait is subject to recall bias: parents may not remember their own or their children's medical history and so may not report SSD. Additionally, SSD is quite heterogeneous; endophenotypes are a more precise measure, which is preferable for genetic studies.

Table 4.1 Tests used in the analyses divided by endophenotype. For sample characteristics see Table 4.2

Endophenotype | Test | Abbreviation
Articulation and motor control | Fletcher Time By Count |
Articulation and motor control | Goldman-Fristoe Test of Articulation | GFTA
Language-expressive | Expressive One Word Picture Vocabulary Test | EOWPVT
Language-receptive | Peabody Picture Vocabulary Test | PPVT
Language-receptive | Wechsler Individual Achievement Test- Listening Comprehension | WIATLC
Phonology | Multisyllabic Word Repetition | MSW
Phonology | Nonsense Word Repetition | NSW
Reading | Woodcock Reading Mastery- Word Attack | WRDATK
Reading | Woodcock Reading Mastery- Word Identification | WRDID
Spelling | Test of Written Spelling | TWS

3. Sample

The individuals included in each analysis and the sample size vary by data availability (Table 4.2). There is correlation between the test scores, which may help explain genetic similarities (Table 4.3). We removed parents from the analysis for all traits except MSW and NSW (Appendix B).

Table 4.2 Summary statistics for quantitative traits used in the analysis (1=affected)

Endophenotype | Test | Abbreviation | Sample size | SSD=1 (%) | LI=1 (%) | RD=1 (%) | Female (%) | Mean/median Z | Age Range | Median
Articulation and motor control | Fletcher Time By Count | | 315 | 194 (61.8) | 102 (33.0) | 102 (33.0) | 130 (41.3) | 1.81 (0.34)* | 5-36.5+ | 11.67
Articulation and motor control | Goldman Fristoe Test of Articulation | GFTA | 334 | 214 (64.3) | 110 (33.8) | 75 (23.1) | 135 (40.4) | 0.61 (0.60)* | 2.4-20 | 9.33
Language | Expressive One Word Picture Vocabulary Test | EOWPVT | 347 | 220 (63.6) | 114 (33.7) | 80 (23.7) | 144 (41.5) | 7.60 (3.07)* | 2.75-36.50 | 10.43
Language | Peabody Picture Vocabulary Test | PPVT | 353 | 223 (63.4) | 116 (33.7) | 83 (24.2) | 145 (41.1) | -0.03 (1.06) | 2.75-36.50 | 10
Language | Wechsler Individual Achievement Test- Listening Comprehension | WIATLC | 266 | 161 (60.8) | 88 (33.6) | 73 (27.9) | 103 (38.7) | -0.17 (1.10) | 5.25-28.33 | 11.04
Phonology | Multisyllabic Word Repetition | MSW | 438 | 245 (56.2) | 126 (29.4) | 98 (23.0) | 245 (56.2) | -0.74 (1.50) | 1-52.08 | 13.44
Phonology | Nonsense Word Repetition | NSW | 438 | 245 (56.2) | 126 (29.4) | 98 (23.0) | 245 (56.2) | -0.52 (1.23) | 1-52.08 | 13.44
Reading | Woodcock Reading Mastery- Word Attack | WRDATK | 320 | 194 (61.2) | 103 (33.0) | 81 (26.0) | 130 (40.9) | 60.69 (17.58)* | 5.25-36.50 | 11.67
Reading | Woodcock Reading Mastery- Word Identification | WRDID | 318 | 196 (61.4) | 104 (33.1) | 82 (26.1) | 131 (40.9) | 21.82 (5.43)* | 5.25-36.50 | 11.66
Spelling | Test of Written Spelling | TWS | 300 | 182 (60.9) | 97 (33.0) | 78 (26.5) | 123 (41.0) | 21.76 (5.29)* | 5.25-18 | 11.25

* z- scores were transformed to meet normality assumptions. The theoretical mean and SD were not calculated, but the empirical SD in the table can be used to roughly interpret the effect estimates

Table 4.3 Correlation (R2) between the quantitative traits analyzed. Hue indicates the strength of relationship; the darker the box, the stronger the correlation.

Domain | Trait | Fletcher | GFTA | MSW | NSW | EOWPVT | PPVT | WIATLC | WRDATK | WRDID | TWS
Motor | Fletcher | 1 | 0.1 | 0.07 | 0.11 | 0.16 | 0.07 | 0.05 | 0.11 | 0.12 | 0.08
Motor | GFTA | | 1 | 0.22 | 0.18 | 0.04 | 0.04 | 0.03 | 0.05 | 0.02 | 0.03
Phonology | MSW | | | 1 | 0.64 | 0.07 | 0.15 | 0.11 | 0.23 | 0.17 | 0.25
Phonology | NSW | | | | 1 | 0.07 | 0.14 | 0.09 | 0.18 | 0.16 | 0.26
Language | EOWPVT | | | | | 1 | 0.44 | 0.2 | 0.13 | 0.16 | 0.11
Language | PPVT | | | | | | 1 | 0.26 | 0.21 | 0.27 | 0.27
Language | WIATLC | | | | | | | 1 | 0.24 | 0.29 | 0.28
Reading | WRDATK | | | | | | | | 1 | 0.62 | 0.61
Reading | WRDID | | | | | | | | | 1 | 0.64
Spelling | TWS | | | | | | | | | | 1

5. Statistical Procedures

For a complete discussion of model selection, see Appendix B. Briefly, all models were linear mixed models adjusted for sex, and all quantitative traits were transformed to ensure normality. We used MLMA in GCTA for all association analyses (Yang et al., 2011; Yang et al., 2014); the model is below.

QuantitativeTrait ~ Chip + Sex + Dosage + g + e        Equation 4.1

Where Dosage is the predictor of interest and a fixed effect, and g is the polygenic effect, a random effect with covariance based on the genetic relationship matrix (Yang et al., 2014).

Inflation of test statistics was measured using the genomic control procedure (Devlin et al., 1999). All λ values were 1 or below, so chi square statistics were not corrected. Results were annotated with ANNOVAR, using phastConsElements46way to annotate variation (Wang et al., 2010).
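For reference, the genomic control inflation factor λ is commonly computed as the median of the observed 1-df chi square statistics divided by the median of the chi square distribution with 1 df (approximately 0.455). The sketch below assumes the test statistics are recovered from two-sided p values; the function name is illustrative and this is not the code used in the analysis.

```python
from statistics import NormalDist, median

# Median of the chi square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(p_values):
    """Genomic control lambda from two-sided p values in the interval (0, 1]."""
    standard_normal = NormalDist()
    # A two-sided p value maps to z = inverse_cdf(p / 2); z squared is a
    # chi square statistic with 1 degree of freedom.
    chi_square = [standard_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi_square) / CHI2_1DF_MEDIAN


# Evenly spaced p values mimic a well-behaved null distribution (lambda near 1).
null_p_values = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation_factor(null_p_values), 3))
```

A λ close to 1 indicates that the bulk of the test statistics follow the null distribution, whereas values well above 1 would suggest residual population structure or cryptic relatedness.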

6. Results

There are no loci that reach our genome-wide significance threshold (p<5x10^-8); however, there are loci that are suggestive (p<10^-5). Within the suggestive loci, there are no exonic SNPs, and following annotation with ANNOVAR (Wang et al., 2010), none of the top markers had any conservation-based annotation (phastCons, SIFT, PolyPhen, FATHMM, GERP, PhyloP, PROVEAN, VEST3, MutationTaster, etc.), nor were they in transcription factor binding sites. Minor allele frequencies are reported as the frequency of the alternate allele in Phase 3 of the 1000 Genomes Project.

The most significant locus for each endophenotype will be described, but full results are listed in Appendix B. SNPs/loci that are associated with lower quantitative trait scores are referred to as having a negative effect, while those associated with higher scores are referred to as having a positive effect. Fletcher Time by Count is the exception: high scores correspond to poor performance. Risk/protective is not used because we do not necessarily have causal alleles.

6.1 Replication of genes previously associated with SSD

Only ATP2C2 had markers that were genome-wide suggestive. The most significant marker for each gene is reported below (Table 4.4).

Table 4.4 Most significant marker for genes previously associated with SSD or childhood apraxia of speech that did not replicate

Gene | Marker | P value | Phenotype
ASPM | chr1:197067131 | 0.026 | MSW
ASPM | rs10801589 | 0.01 | Binary trait- Speech
ATP13A4 | rs2280476 | 0.002 | WIATLC
ATP2C2 | rs193704 | 3.63x10^-7 | WRDID
AVPR1A | rs1969160 | 0.0001 | TWS
CNTNAP2 | rs2538976 | 1.39x10^-5 | EOWPVT
CNTNAP1 | rs3826427 | 0.054 | Fletcher
CYP19A1 | rs11070842 | 0.002 | GFTA
DRD2 | chr11:113292326 | 0.006 | PPVT
FOXP2 | rs9969232 | 0.01 | MSW
FOXP2 | rs2030915 | 0.008 | Binary trait- Speech
KIAA0319 | rs2744605 | 0.005 | NSW
SETX | rs514279 | 0.001 | EOWPVT

6.2 Articulation and Motor Control

Fletcher Time by Count- The most significant locus is an intergenic region on chromosome 1q43 (chr1:242764961) that is 76.9 kb downstream of PLD5, a phospholipase, and 454 kb upstream of LINC01347, a long noncoding RNA (rs10926785, p=1.73x10^-6, β=0.158, MAF=0.24) (Figure 4.1). The top SNP is a risk locus, meaning that for each 1 unit increase in dosage (that is, every alternate allele a person carries), the transformed Fletcher score increases by about half a standard deviation (note this is a transformed score so the expected SD is not equal to 1). It is in LD with SNPs that have an effect of the same size but in the opposite direction.

GFTA- The most significant locus is on chromosome 9q33.1 and is located in PAPP-A, pregnancy-associated plasma protein A (chr9:119093923, rs2273977, p=4.38x10^-6, β=0.242, MAF=0.294) (Figure 4.2). The top SNP has a positive effect, a 0.4 standard deviation increase in transformed GFTA per alternate allele, but it is in LD with risk SNPs.
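The standard deviation statements for the transformed traits can be checked by dividing each effect estimate by the empirical SD of the trait, as the footnote to Table 4.2 suggests. The sketch below does this for the two articulation traits, using the β values reported above and the SDs from Table 4.2; the function name is illustrative.

```python
def effect_in_sd_units(beta, empirical_sd):
    """Express a per-allele effect estimate in units of the trait's empirical SD."""
    return beta / empirical_sd


# (beta from the Results text, empirical SD from Table 4.2)
top_snps = {
    "Fletcher (rs10926785)": (0.158, 0.34),
    "GFTA (rs2273977)": (0.242, 0.60),
}

for label, (beta, sd) in top_snps.items():
    print(label, round(effect_in_sd_units(beta, sd), 2))
```

These ratios are only a rough guide, because the empirical SDs describe the transformed scores in this sample rather than a theoretical distribution.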

Figure 4.2

7.2. Language

EOWPVT- The most significant locus is a broad peak on chromosome 5q15 (chr5:97929333), 923 kb from LINC01340 and 176 kb from RGMB, a repulsive guidance molecule (rs191730, p=4.44x10^-7, β=-1.458±0.289, MAF=0.75) (Severyn et al., 2009) (Figure 4.3). This locus has a negative effect, a 1/3 standard deviation decrease in EOWPVT score for each alternate allele. The alternate allele is common, with a frequency of 0.75.

Figure 4.3

PPVT- The most significant marker is rs847926, which is on 7p21.3 (chr7:12527486), 83.6 kb downstream of VWDE and 82.7 kb upstream of SCIN (p=5.52x10^-6, β=0.381±0.084, MAF=0.43) (Figure 4.4). It is a protective locus that increases PPVT z-score by less than a standard deviation per alternate allele.

Figure 4.4

WIATLC- The most significant marker is an intronic SNP, rs11725311, in SLC39A8, which encodes ZIP8, a plasma membrane transporter (p=3.76x10^-7, β=-0.514±0.101, MAF=0.34) (Rivera-Mancía et al., 2011) (Figure 4.5). The top SNP is a risk locus, as are the SNPs in LD, and each additional alternate allele at this locus decreases WIATLC scores by half a standard deviation.

Figure 4.5

Shared Locus- The correlation between EOWPVT and PPVT test scores is 0.44 (Table 4.3), and they share a risk locus on chr1:62.1-62.2 Mb which includes TM2D1, a beta-amyloid peptide-binding protein (Kajkowski et al., 2001). The most significant shared SNP is in the 3' UTR of TM2D1 (rs1286628, pEOWPVT=1.12x10^-6, βEOWPVT=-1.38±0.277; pPPVT=6.24x10^-6, βPPVT=-0.427±0.1, MAF=0.27) (Figure 4.6).

Figure 4.6

7.3. Phonology

MSW- The most significant marker, rs2327825, is in an intergenic region on chromosome 6p23 between LINC01108 (308 kb upstream) and JARID2 (652 kb downstream), which encodes a transcriptional repressor active during embryonic development (p=2.85x10^-6, β=-0.685±0.146, MAF=0.835) (Bergé-Lefranc et al., 1996) (Figure 4.7).

Figure 4.7

NSW- The most significant locus is on 10q23.1 and includes ANXA11, DYDC1/DYDC2, FAM213A, and MAT1A (Figure 4.8). The most significant marker is rs3120977 (p=4.06x10^-7, β=-0.46±0.09, MAF=0.69). The wide region of association may suggest a CNV, but there is no CNV in this region based on a GenomeStudio CNV analysis, which uses signal intensity to estimate copy number.

Figure 4.8

Shared Locus- MSW and NSW share a suggestive locus on 13q12.2, which is an intergenic region between POLR1D and GSX1. The most significant shared SNP is rs1231021 (pMSW=3.55x10^-6, βMSW=0.50±0.10; pNSW=4.51x10^-7, βNSW=0.50±0.09, MAF=0.39) (Figure 4.9).

7.4. Reading

WRDATK- There are two strong loci. The first is on 14q32.2, located in SETD3, a histone methyltransferase, with the most significant SNP in the 3' UTR (rs1047351, p=3.80x10^-7, β=7.54±1.48, MAF=0.47) (Eom et al., 2011) (Figure 4.10). Half of the SNPs have a negative effect. The other locus is on 17q12 in introns of ARHGAP23, a Rho GTPase (rs12949691, p=2.48x10^-7, β=-7.55±1.46, MAF=0.34) (Katoh and Katoh, 2004) (Figure 4.11). Each of these markers has a negative effect.

Figure 4.9

Figure 4.10

Figure 4.11

WRDID- The most significant SNP is located on 16q24.1 within ATP2C2, a calcium transporter (rs193704, p=3.63x10^-7, β=2.466±0.485, MAF=0.44) (Micaroni, 2012) (Figure 4.12).

Figure 4.12

Shared Locus- The reading traits share a suggestive risk locus on 7p22.3 in IQCE, a ciliary protein (pWRDATK=4.53x10^-6, βWRDATK=-6.56±1.43; pWRDID=2.53x10^-6, βWRDID=-2.093±0.445, MAF=0.67) (Harris et al., 2014) (Figure 4.13).

Figure 4.13

7.5. Spelling

TWS- The most significant SNP is rs28812505, which is located on 7q31.32 in SPAM1, which encodes an enzyme that cleaves hyaluronic acid (p=6.73x10^-7, β=-0.803±0.162, MAF=0.13) (Reese et al., 2011) (Figure 4.14). The most significant SNP and those in LD with it have a negative effect, and each alternate allele reduces TWS scores by almost a whole standard deviation.

Figure 4.14

8. Discussion

To better understand the genetic architecture of speech sound disorders, we performed a genome wide association study and identified intronic and intergenic loci suggestively associated with endophenotypes of SSD. In relation to the reference allele, these loci have a positive effect for language (PPVT), phonology (MSW and NSW shared), and reading (WRDATK, WRDID), and conversely, they have a negative direction of effect for articulation (GFTA), language (EOWPVT, WIATLC), phonology (MSW and NSW), and spelling (TWS). Highly correlated traits shared loci, which does not necessarily indicate a shared genetic basis but may simply reflect the correlation. The findings corroborate past research, point to neural development and other neuropathies, and highlight the complex genetic architecture of SSD.

8.2 Relationship with candidate genes

Because this is the first GWAS on a sample ascertained for SSD, there is not a wealth of loci available to check for replication; however, there are genes that have been associated with SSD which we did not replicate (Table 2.4 and Table 4.4). There is no definite reason why we did not replicate genes previously associated with SSD or childhood apraxia of speech; however, our lack of replication supports a heterogeneous, multifactorial inheritance model. Our approach was distinct in two ways: previous studies used the binary outcome rather than endophenotypes, and they used linkage or sequencing. Because we used quantitative traits, we were asking a more refined question than previous studies. Moreover, because we have a family sample, the individuals in our study may be more severely affected than individuals from standard case-control studies. Especially as it relates to sequencing, our method identified statistical association, whereas the sequencing studies only identified variants that were unique to affected individuals in small samples. Finally, and perhaps most importantly, it is unlikely that SSD are caused by a single gene; thus each family may have unique variants that affect the same underlying network or pathway, and replication may be challenging.

While no genes that have been associated with SSD replicated, our results support findings from other communication disorders. ATP2C2 was previously associated with nonsense word repetition in a cohort ascertained for specific language impairment (Newbury et al., 2009). Here, ATP2C2 was associated with the Woodcock Reading Mastery Test- Word Attack and Word Identification subtests with the same negative direction of effect as Newbury but with different SNPs. In this cohort, performance on phonological processing and reading tasks is only slightly correlated (R2=0.17). Correlations between poor phonological skills and reading deficits are well-documented, so it is not surprising that there is evidence of a common genetic basis (Tomkins and Binder, 2003; Dillon and Pisoni, 2006; Kalashnikova et al., 2016).

CNTNAP2 has been associated with orthographic coding and non-word repetition in a dyslexic sample (Peter et al., 2011), with non-word repetition, reading, and spelling in a cohort ascertained based on SLI (Vernes et al., 2008; Newbury et al., 2011), and with autism spectrum disorder (Alarcón et al., 2008). In the present study, the SNPs in CNTNAP2 do not reach our suggestive cutoff (p<1x10^-5) but are close (most significant marker p=1.39x10^-5) for Expressive One Word Picture Vocabulary Test, a language related trait. In a study of typically developing Australian children that assessed the association between CNTNAP2 variants and language skills, two SNPs that were significantly associated with low early language scores were also associated with low Expressive One Word Picture Vocabulary Test scores in our cohort (rs759178, pAUS=0.0248, pCleveland=6.58x10^-5; rs2538976, pAUS=0.0535, pCleveland=4.39x10^-5) (Whitehouse et al., 2011). Though intronic, these variants appear to influence typical language development.

8.2.1 Recapitulation of risk factors

In our study, SLC39A8 is a risk locus for low scores on the listening comprehension task and is related to a known risk factor for SSD, galactosemia (Shriberg et al., 2011). The commonalities between disorders caused by deficient SLC39A8 and galactosemia may provide insight into SSD etiology. Moreover, because both are treatable conditions, it is possible that some cases of SSD can be prevented.

SLC39A8 encodes ZIP8, a plasma membrane transporter of manganese (primarily) and zinc, ions which are cofactors in many metabolic processes (Rivera-Mancía et al., 2011). Deficiencies in ZIP8 result in glycosylation disorders which ultimately cause severe intellectual disability, hypotonia, strabismus, and skeletal abnormalities (Boycott et al., 2015; Park et al., 2015). Transferrin, the serum glycoprotein iron transporter, is a biomarker for glycosylation disorders, plays a role in oligodendrocyte differentiation through an unknown mechanism (Franco et al., 2015), and is required for myelin synthesis (Espinosa et al., 1999). Mn2+, the solute transported by ZIP8, activates β-1,4-galactosyltransferase I and II, which add a galactose group to transferrin (Park et al., 2015). Mutations that disrupt manganese transport may lead to decreased functional plasma transferrin due to reduced galactosylation. Supplementation with galactose, which is converted to UDP-galactose, the substrate of β-1,4-galactosyltransferase, increases normally glycosylated transferrin (Park et al., 2015).

Galactosemia also affects glycosylation (Waggoner et al., 1990; Nelson et al., 1991). In galactosemia, GALT, the enzyme that converts galactose into UDP-galactose, is not functional, leading to a build-up of galactose and hypoglycosylated transferrin (McCorvie et al., 2016). Galactosemia is generally diagnosed within the first week of life and, in contrast to disorders of glycosylation, is treated by total galactose restriction. UDP-galactose can be synthesized from UDP-glucose by epimerases, so a galactose-free diet prevents pathogenic build-up of galactose and returns transferrin to normal.

The commonality between SLC39A8 and galactosemia is transferrin abnormalities, a fact which suggests that perhaps transferrin, iron, and related pathways are important in SSD pathogenesis.

8.3 Novel findings

Of the novel results, some variants have small effect sizes and minor allele frequencies of 0.40 or greater, indicating that common variants likely contribute to SSD. Granted, with our sample size, we were not powered to detect rare variants. Furthermore, our results are intronic and intergenic and, other than having rsIDs, are not annotated, so attributing pathogenicity to the variants is challenging. Nevertheless, to gain a more cohesive understanding of our results, we gathered the results into biologically similar groups that will be tested in future pathway analyses.

8.3.1 Neuron differentiation and axon migration- Results from language, reading, and spelling endophenotypes provide evidence that neuronal cell differentiation and axon migration play an important role in speech sound disorder etiology.

IQCE, the risk locus shared between Word Attack and Word Identification, could be necessary for adult neurogenesis in a brain region associated with plasticity. If this brain region is not functional, learning may be impaired. The gene encodes a ciliary protein necessary for murine spermiogenesis but is also involved in cilia-mediated sonic hedgehog (Shh) signaling (Harris et al., 2014; Pusapati et al., 2014). IQCE has been studied in bone development, where the encoded transmembrane protein localizes to the base of the cilia, an activity which leads to the Shh cascade (Pusapati et al., 2014; Uhlen et al., 2015); however, both the RNA and the protein are universally expressed, so it is possible that IQCE regulates Shh through ciliary action in other developing tissues, too, such as the brain. Without cilia-mediated Shh signaling, granule neural precursors fail to develop into neural stem cells in the subgranular zone of the dentate gyrus (Breunig et al., 2008; Han et al., 2008). The subgranular zone of the dentate gyrus is the site of adult neurogenesis, and it is thought to play an important role in learning and memory (Squire et al., 2004; Jonas and Lisman, 2014). This area also has a role in visual processing, and since reading is a visual task (Lee et al., 2012), insults to the dentate gyrus caused by deficits in Shh signaling could cause the observed deficits in reading.

SPAM1, the spelling risk locus, also reflects the involvement of neuronal differentiation in SSD. SPAM1 encodes PH20, an enzyme best known for digesting the layer of cells surrounding the oocyte, but it is also a receptor in hyaluronic acid-induced cell signaling (NCBI GeneID:6677). The biological function of hyaluronic acid (HA) and hyaluronidases is not extensively characterized, but recently both HA and hyaluronidases have been identified in the nervous system. HA, the substrate and ligand of PH20, is involved in cell migration during development, inflammatory response, wound repair, and cancer pathogenesis; thus deregulation of its breakdown or signaling pathways could result in multiple pathologies (Chen et al., 1999; Toole, 2000; Slomiany et al., 2009). In the CNS, HA both establishes a stem cell microenvironment, or niche, where stem cells are found prior to differentiation, and is involved in cell differentiation and maturation (Preston and Sherman, 2011). It follows, then, that SPAM1, a receptor for HA, may contribute to cell differentiation.

Finally, illustrating the possible importance of dendrite outgrowth in SSD etiology, ARHGAP23, the gene associated with Word Attack, encodes a protein necessary for axon migration. The gene encodes a Rho GTPase; Rho GTPases regulate processes related to the cytoskeleton, cell polarity, microtubule dynamics, gene transcription, cell cycle progression, and vesicular transport (Etienne-Manneville and Hall, 2002; Katoh and Katoh, 2004). Through regulation of cytoskeleton dynamics, Rho GTPases regulate neurite outgrowth, retraction, and axon guidance; mutations in the Rho GTPases MEGAP and OPHN1 are associated with severe intellectual disability (Govek et al., 2005).

8.4 Signaling- synapse function and myelination-

In addition to differentiation, suggestive loci for language and reading imply that nervous system signal transmission at synapses and through axons is involved in SSD. First, ATP2C2, the gene previously reported by Newbury et al. (2009), transports calcium and manganese into the Golgi. Calcium plays an important role in neurotransmitter release and thus in synapse function (Micaroni, 2012).

SETD3, a transcription activator that is important in muscle cell differentiation and was associated with low Word Attack scores, may also play a role in synapse function (Eom et al., 2011; The UniProt Consortium). Homozygous SETD3 knockout mice display decreased prepulse inhibition (PPI), exhibiting a startle response even after priming with a prepulse, which indicates aberrant synaptic function specifically in sensorimotor contexts (IPMC-SETD3). Decreased PPI is observed in autism spectrum disorders and may mediate hypersensitivity to sensory information (Madsen et al., 2015).

While neuronal signals are transmitted at synapses, they are conducted by axons; functional axons are vital for a functional nervous system. Myelin insulates axons, allowing signals to be conducted quickly and without excessive attenuation; a lack of proper myelination causes neurological disorders such as multiple sclerosis (MedlinePlus). In relation to SSD, any aberrant myelination could result in the observed phenotypes. SPAM1, the risk locus for low spelling scores discussed above, is also associated with myelination. The encoded enzyme, PH20, a hyaluronidase, has been identified in demyelinating MS lesions, and its products inhibit re-myelination. In preterm infants, a similar process occurs in hypoxia-induced tissue damage; the metabolites of PH20 inhibit regeneration of myelinating oligodendrocytes, leading to aberrant myelination and to the observed neurological disabilities of preterm infants, which will be discussed subsequently (Back et al., 2015). Another locus potentially related to myelination is SLC39A8, which was associated with listening comprehension. Transferrin, the protein that is suboptimally produced if SLC39A8 is deficient, is required for myelin synthesis (Espinosa et al., 1999) and plays a role in oligodendrocyte differentiation, although the exact mechanism is not known (Franco et al., 2015).

The previous discussion highlighted loci that implicate synaptic and axonal signal transduction; however, a key component of all behaviors is plasticity in signal transduction. Neurons make new connections and synapses change responses over time. The GFTA locus, pregnancy-associated plasma protein A (PAPPA), is associated with one form of plasticity, habituation. Behaviorally, habituation is a simple type of learning characterized by a decrease or complete cessation of response to a certain, repeated stimulus; the nervous system filters irrelevant information. Zebrafish with a knockout of pappaa, the zebrafish analogue of PAPPA, exhibited a lack of habituation to an acoustic stimulus and displayed a persistent startle response (Wolman et al., 2015). Wolman et al. propose that the effect is due to changes in insulin-like growth factor bioavailability and signaling, a process that regulates neural plasticity (Fernandez and Torres-Alemán, 2012). As it relates to articulation, PAPPA may mediate one's ability to attend to relevant signals, such as an adult speaker, and to ignore environmental noise, such as a dog barking, while individuals are learning to speak.

Alternatively, PAPPA may have more general effects on learning and development through cleavage of IGFBP-4 and IGFBP-5, which releases IGF; recent work describes roles for IGF in neural development, neural plasticity, and recovery from brain injury (for full review see Dyer et al., 2016). In relation to development, PAPPA mRNA is highly expressed in the facial nucleus of prenatal brains and the left facial motor nucleus in adult brains (Harmonizome, Rouillard et al., 2016; Allen Brain Atlas). These regions in the brainstem are composed of nerves that innervate the muscles of facial expression, including key articulators, the lips. Damage to the facial motor nucleus can result in Bell's palsy, which is characterized by slurred speech (McCaffery, 2014). These observations suggest that a dysfunctional PAPPA-IGFB pathway may lead to aberrant neuronal development, suboptimal innervation of key articulators, and ultimately, the observed deficits in articulation abilities.

8.5 Evidence of a relationship between embryonic development and speech sound disorders

Development, from fertilization to birth, is a complex, precisely regulated process that includes implantation of the embryo into the uterine wall. If implantation fails completely, a miscarriage results, while if invasion is suboptimal, low birth weight, intrauterine growth restriction, and preeclampsia can result (Dugoff et al., 2004). The loci for GFTA and Word Attack were associated with implantation in the uterine wall. First, PAPP-A was a risk locus for low Goldman-Fristoe Test of Articulation scores, and the encoded protein is secreted by fetal trophoblast cells. Through interactions with insulin-like growth factor 4, PAPP-A mediates trophoblast invasion into the uterus (Handschuh et al., 2006; Christians and Beristalin, 2016) and regulates fetal size (David and Jauniaux, 2016). Though secreted by fetal cells, the protein is found circulating in maternal blood and is a biomarker of pregnancy; low levels of this protein are associated with preterm (<37 weeks) birth (Dugoff et al., 2004; Pummara et al., 2015). Pummara et al. suggest that the observed preterm births were due to poor invasion causing poor placentation.

PAPPA levels are also used as a biomarker for trisomy 21 (Aldred et al., 2015).

GFTA specifically tests articulation, and Croatian children born prior to 34 gestational weeks exhibited more total and more varied articulation errors than did peers born at term when tested at the age of 7-8 years (Kolundzic, 2008). It should be noted that in our sample, there is one individual reported to have been born prematurely with a GFTA z-score in the lower 25% of observations, but it is unlikely that she alone is driving this association. Moreover, we did not explicitly collect data regarding premature births, so it is possible that there are other premature children in our group.

Next, the most significant locus for the Woodcock Reading Mastery- Word Attack subtest is in ARHGAP23, which may be involved in implantation in addition to the previously described function in dendritic outgrowth. The encoded Rho GTPase-activating protein is highly expressed in trophoblast cells, whose differentiation is mediated by the Rho GTPases (Uhlen et al., 2015; Parast et al., 2001). During trophoblast development, there is extensive development of the actin cytoskeleton (stress fibers) and an increase in cell motility to allow for invasion into the uterus (Parast et al., 2001). Just as with the GFTA locus, here there is evidence of the importance of adequate implantation.

There is only one identified premature child in the study, but we did not explicitly collect this information. Regardless, the mechanism leading to associations with trophoblast invasion is not entirely clear. Implantation in the uterine wall is a necessary step in establishing the placenta and nutrient transfer; perhaps even slight anomalies may result in a sub-optimal fetal environment and subclinical phenotypes at birth that do not fully manifest until later ages, as when learning to speak, for example. The protein encoded by the TWS locus, PH20, is involved in aberrant myelination in preterm children (Back et al., 2015). In these children, damage is due to hypoxia caused by immature lungs, but it is possible that poor implantation can lead to poor placentation (placental insufficiency), which causes hypoxia in utero (Huppertz and Peeters, 2005; Morgan, 2014).

8.8 Shared architecture with other neuropathologies

TM2D1, the signal common to Expressive One Word Picture Vocabulary Test and Peabody Picture Vocabulary Test, encodes a beta-amyloid binding protein that mediates apoptosis through a G protein cascade (Kajkowski et al., 2001). The substrate of TM2D1, a beta-amyloid protein, is best known for its role in Alzheimer's disease pathogenesis, but its non-disease-related functions are becoming more apparent. The precursor, amyloid precursor protein, has been implicated in neural cell differentiation and proliferation and neurite outgrowth (Nalivaeva et al., 2013; Dawins and Small, 2014). Receptors of metabolic products of amyloid precursor protein catabolism, including TM2D1, could also be involved in differentiation and proliferation.

IQCE, which was associated with Word Identification, is important in cilia-mediated sonic hedgehog signaling. Ciliopathies, conditions in which cilia do not function properly, are characterized by phenotypes relevant to SSD: hypotonia, ataxia, psychomotor delay, and intellectual disability (Waters and Beales, 2011). The pathway mediated by cilia, sonic hedgehog signaling, is not fully understood but is important in the embryonic development of the brain, skeleton, muscles, lungs, and gastrointestinal tract (Matise and Wang, 2011), as well as in the maintenance of adult tissues (Alvarez-Buylla and Ihrie, 2014).

8.6 Intergenic regions

While some of our loci fit easily into biologically relevant subgroups, those in intergenic regions are less straightforward to categorize. With the growth of the ENCODE project and other research on supposed "junk DNA," there is increasing evidence that the noncoding regions are important elements of the genome, especially for regulation (for example, Ecker et al., 2012; Neph et al., 2012). However, the exact function of the loci we identified is not known. The intergenic regions for EOWPVT, for example, may be involved in regulating nearby RGMb, which is involved in axon guidance and neural survival (Severyn et al., 2009). Similarly, the MSW region is 650 kb upstream of JARID2, a gene that encodes a transcriptional repressor that is expressed in the nervous system during embryonic development (Bergé-Lefranc et al., 1996). It could, therefore, play a role in developmental regulation.

9. Conclusions and future directions

We report findings of the first genome-wide association study of speech sound disorders. We have replicated other researchers' findings for ATP2C2 and identified novel suggestive loci that may participate in neuronal cell differentiation, neurite outgrowth, and fetal growth. One especially intriguing gene is PAPPA, which was associated with poor articulation scores and, through regulation of IGF bioavailability, may mediate learning.

Just as in GWAS of ADHD and autism (ADHD reviewed by Yang et al., 2015; autism reviewed by Gaugler et al., 2014), our top loci are common variants, supporting the notion that genetic background (i.e., the unique collection of polymorphisms each individual carries) establishes liability and rare variants are modifiers that lead to disease status (Gibson, 2012).

In addition to replicating the initial findings in different samples, future work will employ hypothesis driven pathway analysis to test for enrichment of association signal in pathways associated with neuron differentiation, axon migration, and synapse function.

Additionally, because we did not adjust for comorbid conditions, future work should also examine the effect of adjusting for language impairment and reading disability. Delving into the epigenetic implications of the intergenic regions will also be important.


10. Limitations

This analysis is limited by sample size; our largest sample was 438, which is not especially large for a genome-wide association study. In relation to interpreting the results, it is always challenging to attribute function to genes that have not been well studied in relation to a given phenotype. Similarly, many of our loci were intergenic; while it is becoming increasingly apparent that non-coding regions of the genome are functional, precisely defining these functions is still not possible.

CHAPTER 5: Accounting for comorbid conditions

1. Introduction

The previous Aim described genes and biological themes that may be relevant to SSD. However, it did not consider the effects of reading disability (RD) and language impairment (LI), two commonly comorbid conditions. Children with speech sound disorders do not always exhibit later language and/or reading impairment; however, 25–30% of children with SSD also have a reading disability (Peterson et al., 2009). In a subset of this cohort with moderate to severe SSD, 53% have comorbid language impairment (Sices et al., 2007). Pennington proposed the multiple deficits model to explain comorbid developmental disorders that have both shared and non-shared cognitive deficits, theorizing that commonly comorbid disabilities have shared risk factors that explain the common cognitive deficits (Pennington, 2006). Figure 5.1 below is a visual description of the possible relationship between genetics, SSD, LI, and RD and explains at a basic level why it may be important to account for LI and RD status.

[Diagram with nodes "SNP Effect", "Quantitative Trait", and "Language Impairment and/or Reading Disability"; two links are marked "?".]

Figure 5.1- Conceptual model for the relationship between SNP effect, SSD quantitative trait, language impairment, and reading disability.

For LI, RD, and SSD, one well-described shared deficit is related to phonology. Behaviorally, the ability to produce speech sounds, to master language, and to read involves phonological skills. In part, these skills are built upon phonological representations, which involve understanding the sound structure of the target language and, in the case of written language, the graphemes that correspond to a given phoneme (Goswami, 2012). LI, RD, and SSD impact performance on phonological tasks. Those with SSD perform more poorly on a phonological memory task, NSW, than do unaffected individuals (Lewis et al., 2011). Apart from deficits in syntax, morphology, and vocabulary, SLI can be characterized by poor performance on non-word repetition tasks (Ramus et al., 2013; Gathercole and Baddeley, 1990; Graf Estes et al., 2007). Finally, children with reading disabilities exhibit low scores on phonological awareness tasks (Gathercole et al., 2006, for example). These phonological deficits may be based on genetic similarities for both typical and atypical development.

There is growing evidence of genetic similarities among LI, RD, and SSD. For example, a dyslexia locus was linked to NSW and MSW in individuals ascertained for SSD (Stein et al., 2004). Also using linkage analysis, Rice et al. (2009) demonstrated that a reading locus on chromosome 6p22 was also associated with SLI and that KIAA0319, a dyslexia candidate gene, is also associated with language phenotypes.

Given these findings, it follows that accounting for the comorbid conditions may alter some of our original association signals while leaving others unaffected. Through this analysis we will gain insight into the genetic relationship among the comorbid conditions. If adjusting for LI and RD has no effect on the strength of genetic relationships, it would suggest that the locus has a foundational genetic effect.

Finally, speech, language, and reading are acquired in tandem (Leisman et al., 2015), so this analysis is one way to account for the developmental component of cognitive phenotypes. If genetic variation negatively impacts the acquisition of one skill, it may also affect a skill that is acquired at the same time.

We aimed to identify genetic loci that may be a component of the shared etiology of SSD, RD, and SLI through simple regression methods. By adjusting the previous models for LI, RD, and LI+RD, we will identify loci for which the relationship between test score and allele dosage is relatively consistent across comorbidity affection status.

2. Methods

Figure 5.2 Basic workflow for Aim II. First, we performed a GWAS accounting for LI, RD, and LI+RD affection status; then we extracted the new effect estimates (β) of the markers that were significant in the unadjusted GWAS (Chapter 4). Finally, we compared the new effect estimates to the old ones to determine the Δβ.

Model 1: Trait~Chip+Sex+Dosage+LI

Model 2: Trait~Chip+Sex+Dosage+RD

Model 3: Trait~Chip+Sex+Dosage+LI+RD

It should be noted that these models are an attempt to explain biology and, in reality, are never completely correct because it is not possible to fully capture the complexities of nature, especially for cognitive phenotypes. In the adjusted models, we attempted to create a more complete picture of the trait of interest by adding covariates for the comorbid conditions because the conditions may explain some of the variation in the quantitative trait. Speech, language, and reading are acquired simultaneously, so variants that affect one trait may also affect another. Moreover, language and reading share phonological endophenotypes.
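The nesting of the baseline and adjusted models can be illustrated with a minimal sketch. It assumes ordinary least squares on a tiny, fabricated toy data set (all values are hypothetical and are not study data), and the helper names `solve` and `ols` are introduced here only for illustration; the software used for the actual analysis is not implied. The toy trait is constructed so that LI lowers the score and is correlated with allele dosage, so the dosage coefficient shrinks once LI enters the model.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data for eight individuals (not study data).
chip = [0, 1, 0, 1, 0, 1, 0, 1]      # genotyping chip indicator
sex = [0, 0, 1, 1, 0, 0, 1, 1]
dosage = [0, 0, 1, 1, 1, 2, 2, 2]    # allele dosage
li = [0, 0, 0, 1, 0, 1, 1, 1]        # language impairment status
# Toy trait: dosage and LI both lower the score; no noise is added.
trait = [1.0 - 0.5 * d - 1.0 * l for d, l in zip(dosage, li)]

# Baseline model (Aim I form): Trait ~ Chip + Sex + Dosage
beta_baseline = ols([chip, sex, dosage], trait)[3]
# Model 1: Trait ~ Chip + Sex + Dosage + LI
beta_adjusted = ols([chip, sex, dosage, li], trait)[3]

print(f"dosage beta, baseline: {beta_baseline:.3f}")
print(f"dosage beta, adjusted for LI: {beta_adjusted:.3f}")
```

In this toy example the adjusted dosage coefficient is closer to zero than the baseline coefficient because part of the baseline signal was carried by LI status; Models 2 and 3 would be fit the same way by swapping or adding the RD column.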

To identify markers most affected by LI, RD, and LI+RD, we took a standard, classical epidemiology approach used to identify confounding. Confounding is defined as an absolute change in effect estimate (β) of greater than or equal to 10% (Gordis, 2004).

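Here we calculated the change using

$$\Delta\beta = \frac{|\beta_{\text{adjusted}} - \beta_{\text{unadjusted}}|}{|\beta_{\text{unadjusted}}|} \times 100$$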

It should be noted that when using this type of analysis, it is not possible to differentiate between mediation in which LI and/or RD are along the causal pathway and confounding in which LI and/or RD are associated with both the predictor and outcome independently.
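As a rough illustration of the Δβ screen, the following sketch computes a percent change in the effect estimate relative to the unadjusted model and applies the 10% threshold. The function names and the marker values are hypothetical. The sign convention (negative values indicating attenuation toward zero) is an assumption made here to mirror how Δβ values are reported in the results; when both estimates share a sign, its absolute value matches the absolute percent change.

```python
def percent_change(beta_unadjusted, beta_adjusted):
    """Signed percent change in the magnitude of beta after adjustment.

    Negative values mean the adjusted estimate is closer to zero (attenuated).
    """
    if beta_unadjusted == 0:
        raise ValueError("unadjusted beta must be non-zero")
    return (abs(beta_adjusted) - abs(beta_unadjusted)) / abs(beta_unadjusted) * 100.0


def is_confounded(delta_beta_pct, threshold=10.0):
    """Flag a marker when the absolute change meets or exceeds the threshold."""
    return abs(delta_beta_pct) >= threshold


def proportion_confounded(pairs, threshold=10.0):
    """Share of (unadjusted, adjusted) beta pairs flagged as confounded."""
    flags = [is_confounded(percent_change(b0, b1), threshold) for b0, b1 in pairs]
    return sum(flags) / len(flags)


# Hypothetical (unadjusted, adjusted) effect estimates for four markers.
markers = [(-0.50, -0.46), (-0.50, -0.40), (0.80, 0.60), (0.30, 0.31)]

for b0, b1 in markers:
    change = percent_change(b0, b1)
    print(f"{b0:+.2f} -> {b1:+.2f}: {change:+.1f}% confounded={is_confounded(change)}")
print("proportion confounded:", proportion_confounded(markers))
```

The per-trait proportions summarized later in the chapter correspond to the last function applied to the markers that were suggestive in the unadjusted analysis.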

A greater emphasis was placed upon the changes in effect estimates than upon p-values because effect estimates allow us to assess confounding. Moreover, a whole family of analyses (gene set enrichment, pathway, and network analysis) is founded on the notion that markers that do not have exceedingly small p-values may also be important (Kim and Przytycka, 2013). Focusing on only p-values may obscure interesting results.

3. Results

3.1 General summary

Tables 5.1-5.3 present the mean/median test scores for each outcome separated by model: (1) adjusting for LI; (2) adjusting for RD; (3) adjusting for LI and RD. Due to the relatively non-normal distributions of GFTA, MSW, and NSW even after adjustment, a Wilcoxon Mann-Whitney test was used (the results are the same using a t-test). These summaries are presented separately to reflect the groups for each model.
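For readers unfamiliar with the rank-based comparison, the sketch below computes the Mann-Whitney U statistics for two groups using average ranks for ties. It is a simplified illustration with hypothetical scores and hypothetical function names; it returns only the U statistics, and a p-value would in practice come from a statistical package or a normal approximation.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y) from the pooled rank sums."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical z-scores for two small groups (not study data).
li_scores = [-1.9, -1.4, -0.6]
no_li_scores = [-0.2, 0.1, 0.7]
print(mann_whitney_u(li_scores, no_li_scores))
```

A U statistic of zero for one group means every score in that group falls below every score in the other, which is the most extreme separation possible for the given group sizes.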

Table 5.1- Mean/median z-scores stratified by Language Impairment affection status (Model 1)

                                  NoLI (n=465)            LI (n=136)
                                  Mean (SD)               Mean (SD)               p-value
Articulation and Motor Control
  Fletcher t                      1.77 (0.33)             1.91 (0.36)             <0.001
  GFTA* t                         0.64 [-0.73, 1.58]      0.49 [-0.65, 1.87]      <0.001
Language
  EOWPVT t                        8.21 (2.77)             6.34 (3.24)             <0.001
  PPVT                            0.22 (0.91)             -0.54 (1.16)            <0.001
  WIATLC                          0.09 (1.05)             -0.70 (1.04)            <0.001
Phonology
  MSW *                           -0.02 [-4.17, 2.36]     -1.97 [-4.32, 2.73]     <0.001
  NSW *                           -0.08 [-2.77, 2.05]     -1.40 [-3.54, 1.18]     <0.001
Reading
  WRDATK t                        65.53 (16.57)           50.68 (15.30)           <0.001
  WRDID t                         23.35 (5.01)            18.73 (4.97)            <0.001
Spelling
  TWS                             0.06 (1.08)             -0.92 (1.14)            <0.001

*median and range presented due to distribution; Wilcoxon test used. t – z-scores transformed

Table 5.2 Mean/median score stratified by Reading Disability affection status (Model 2)

                                  No RD (n=478)           RD (n=121)
                                  Mean (SD)               Mean (SD)               p
Articulation and Motor Control
  Fletcher t                      1.78 (0.34)             1.90 (0.35)             0.009
  GFTA* t                         0.58 [-0.73, 1.58]      0.68 [-0.65, 1.87]      0.075
Language
  EOWPVT                          7.86 (3.10)             6.65 (2.78)             0.002
  PPVT                            0.14 (1.04)             -0.59 (0.95)            <0.001
  WIATLC                          0.02 (1.09)             -0.67 (0.99)            <0.001
Phonology
  MSW *                           -0.30 [-4.17, 2.36]     -0.62 [-4.32, 2.73]     0.06
  NSW *                           -0.26 [-3.07, 2.05]     -0.55 [-3.54, 1.53]     0.008
Reading
  WRDATK t                        64.92 (16.65)           48.40 (14.16)           <0.001
  WRDID t                         23.23 (5.09)            17.83 (4.32)            <0.001
Spelling
  TWS                             0.07 (1.10)             -1.17 (0.93)            <0.001

*median and range presented due to distribution; Wilcoxon test used. t – z-scores transformed

Table 5.3 Mean/median scores stratified by all groups except SSD affection status. Differences tested with ANOVA or Kruskal-Wallis NoLI, LI+RD+ LI +SSD RD+SSD NoRD SSD n 407 71 57 64 Mean Mean (SD) Mean (SD) Mean (SD) p (SD) Articulation and Motor Control 1.97 <0.001 1.74 (0.33) 1.88 (0.37) 1.78 (0.24) Fletchert (0.37) 0.77 [- 0.65 [-0.65, 0.62 [-0.73, <0.001 0.21 [-0.57, 1.43] 0.13, 1.87] 1.58] GFTA*t 1.43] Language 6.02 <0.001 8.39 (2.96) 6.49 (3.58) 8.08 (2.67) EOWPVT (2.72) -0.76 <0.001 0.22 (0.95) -0.43 (1.25) -0.36 (0.66) PPVT (1.07) -0.97 <0.001 0.12 (1.05) -0.51 (1.11) -0.08 (0.96) WIATLC (0.94) Phonology -1.40 [- -0.30 [-4.17, <0.001 -0.62 [-4.32, 2.73] -0.01 [-2.59, 4.32, 2.36] MSW * 2.27] 2.73] -1.13 [- -0.26 [-3.07, <0.001 -0.55 [-3.54, 1.53] -0.23 [-2.72, 3.54, 2.05] NSW * 1.53] 1.18] Reading 67.43 46.80 <0.001 54.97 (14.15) 53.97 (12.60) WRDATK t (15.28) (14.61) 17.12 <0.001 23.79 (4.59) 20.57 (4.72) 19.83 (3.38) WRDID t (4.31) Spelling

-1.35 <0.001 0.21 (1.06) -0.46 (1.10) -0.84 (0.69) TWS (1.01) *median and range presented due to distribution. Wilcoxon test ; t – z-scores transformed

In each of the 10 quantitative traits, roughly 50% of the individuals are affected with language impairment, 33% with reading deficit, and 20% with both LI and RD (Tables 5.1-5.3). Moreover, there is a significant difference for all traits between NoLI and LI individuals. When stratifying by reading deficit, there is no significant difference between the groups for GFTA (p=0.08) or MSW (p=0.06) (Table 5.2).

Figure 5.3. Proportion of markers with p<1×10⁻⁵ in Aim I that have Δβ greater than or equal to 10% in Aim II. Fletcher Time by Count had no confounded markers.

In general, there was less evidence of confounding when adjusting for reading than for language. When adjusting for language, the lowest proportions of confounded markers were found for Fletcher Time by Count (0%), NSW (7.6%), and GFTA (38%); the highest were for WRDID (93%) and TWS (91%) (Fig. 5.3). When adjusting for RD, the least affected scores were Fletcher, GFTA, Expressive Vocabulary, and MSW, which exhibited no confounding, whereas WRDID and Listening Comprehension exhibited the most. Adjusting for both RD and LI revealed the most confounding for PPVT (97%), WRDID (93%), and WRDATK (99%) (Fig. 5.3).

3.2 Trait specific results

QQ plots, Manhattan plots, tables for all markers with suggestive p-values, and the most significant p-values for genes previously associated with SSD are in Appendix C. Any changes in the top locus are reported, but rather than repeating Chapter 4 with new plots, the primary focus of this chapter will be to describe the effects of adjusting for comorbid conditions on the top markers from Aim I.

Figures 5.4-5.13 illustrate the effects of accounting for comorbid conditions for each trait. The top gray portion of each figure is from the original Manhattan plot, and the bottom half illustrates the % change in β for all markers with p<1×10⁻⁵ from the Aim I analysis. Panel C illustrates the β estimates and p-values for each model at a given locus.

3.2.1 Articulation and Motor Control

Fletcher Time by Count- Of the markers that were initially genome-wide suggestive, none is confounded based on the ∆β approach (∆βLI=-2%, ∆βRD=-4%, ∆βLI+RD=-3.8%). Moreover, the p-values from the baseline model and adjusted models are consistent (Fig. 5.4C).

Figure 5.4 Effects of adjusting for LI and RD – Fletcher Time by Count. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1; C- Zoomed in look at the top locus from Aim I.

The most significant locus following adjustment for LI and RD is an intergenic region between CBX7 and PDGFB on chromosome 22q13.1 (β=-0.232 ± 0.044, p=1.82×10⁻⁷, MAF=0.11) (Appendix C, Fig. C2).

GFTA- The PAPP-A locus is not affected by adjustment for LI, RD, or LI+RD (∆βLI=-7.0%, ∆βRD=-5.8%, ∆βLI+RD=-8.2%) (Fig. 5.5), and it is also the most significant locus following adjustment for both LI and RD.

Figure 5.5 Effects of adjusting for LI and RD – Goldman-Fristoe Test of Articulation. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1; C- Top locus from Aim I.

3.2.2 Language

EOWPVT- The strength of the relationship between the broad intergenic peak on chromosome 5 and EOWPVT scores is attenuated by adjustment for language (Δβ=-14.2%) and LI+RD (Δβ=-15.6%) but not by reading (Δβ=-6%). The significance is also reduced by each adjustment (Fig. 5.6).

Top locus- The new top locus is on 2p16 between MIR217HG (107 kb) and LOC100129434 (18 kb) (β=1.125, p=3.25×10⁻⁶, MAF=0.46) (Appendix C, Fig. C4).


Figure 5.6 Effects of adjusting for LI and RD – Expressive One Word Picture Vocabulary Test. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Model is indicated by shapes; C- Top locus from Aim I.

PPVT- The locus on chromosome seven is confounded by reading (Δβ=-13%) and both (Δβ=-11%) but not by language (∆βLI=-3%). The top marker in TM2D1 shared with EOWPVT is not significantly affected by adjustment (∆βLI=-2.0%, ∆βRD=-3.5%, ∆βLI+RD=-5.8%), but the remaining markers appear to be (Fig. 5.7B).

Top locus- Following adjustment for LI+RD, rs7247941 (β=-0.677 ± 0.14, p=4.54×10⁻⁶, MAF=0.83) in IL12RB1 is the most significant marker (Appendix C, Fig. C5).


Figure 5.7 Effects of adjusting for LI and RD – Peabody Picture Vocabulary Test. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by shapes; C- Zoomed in look at shared locus from Aim I; D- Top locus from Aim I.

WIATLC- The top marker in the SLC39A8 locus is confounded slightly by LI (Δβ=-15.2%) and LI+RD (Δβ=-18.5%) but not by RD alone (Δβ=9.3%). However, there does appear to be confounding due to reading at other markers in the association locus (Fig. 5.8B). The SLC39A8 locus remains the most significant after adjustment.


Figure 5.8 Effects of adjusting for LI and RD – Wechsler Individual Achievement Test- Listening Comprehension subtest. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by shapes; C- Zoomed in look at shared locus from Aim I; D- Top locus from Aim I.

3.2.3 Phonology

MSW- The strength of association between the top locus on chromosome 6q23 and MSW scores is reduced by accounting for LI (Δ=31.8%) and LI+RD (Δ=32.8%) but not for RD (Δ=3.6%) (Fig. 5.9). The trend is similar for the entire peak.

Top locus- Following adjustment, the locus on chromosome 13q12.2 shared by MSW and NSW is the most significant (β=0.560 ± 0.113, p=7.83×10⁻⁷, MAF=0.24) (Fig. 5.9) (Appendix C, Fig. C7).

Figure 5.9 Effects of adjusting for LI and RD – Multisyllabic Word Repetition. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by shapes; C- Top locus from Aim 1; D- Shared locus from Aim I.

NSW- The top marker in the peak on chromosome 10q23.1 is not confounded by LI (Δ=-6.7%), RD (Δ=6.1%), or LI+RD (Δ=6.1%); however, the effect of some loci is reduced by greater than 10% (Fig. 5.10B). The effect of the locus shared with MSW is attenuated (∆βLI=-14.4%, ∆βRD=-13.0%, ∆βLI+RD=-13.7%). There are some markers that exhibit a greater decrease in effect (Fig. 5.9B).

Top locus- The peak is still the most significant peak after adjustment; however, rather than being intergenic, the most significant marker is now within ANXA11 (rs9645553, β=-0.462 ± 0.088, p=1.40×10⁻⁷, MAF=0.26) (Appendix C, Fig. C8).

Figure 5.10 Effects of adjusting for LI and RD – Nonsense Word Repetition. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by point shapes; C- Top locus from Aim 1; D- Shared locus from Aim I.

3.2.4 Reading

WRDATK- The locus on 14q32.2 in SETD3 is confounded by LI (Δ=-19.4%), RD (Δ=-12%), and RD+LI (Δ=-18%) (Fig. 5.11). By comparison, adjustment more profoundly decreases the genetic influence in ARHGAP23 (∆βLI=-25%, ∆βRD=-26.0%, ∆βLI+RD=-31%). The locus in IQCE shared with WRDID is also attenuated (∆βLI=-21.9%, ∆βRD=-28.6%, ∆βLI+RD=-36.8%).


Figure 5.11 Effects of adjusting for LI and RD – Word Attack. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by shapes. Zoomed in look at C- shared locus from Aim I; D- Top locus from Aim 1; E- shared locus from Aim I.

Top locus- Even in the presence of confounding, the SETD3 locus is the most significant.

WRDID- The strength of the relationship between markers in ATP2C2 and reading scores is slightly reduced when accounting for LI and RD affection status (∆βLI=-11.5%, ∆βRD=-8.9%, ∆βLI+RD=-13.6%), but the marker remains genome-wide suggestive (Fig. 5.12B&C). As in WRDATK, the effects of the shared locus decreased and are no longer genome-wide suggestive following adjustment (∆βLI=-22.9%, ∆βRD=-30%, and ∆βLI+RD=-38%) (Fig. 5.12C).

Top locus- ATP2C2 is still the most significant locus even after adjusting for comorbid conditions.


Figure 5.12 Effects of adjusting for LI and RD – Word Identification. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by shapes. Zoomed in look at C- shared locus from Aim I; D- Top locus from Aim 1.

3.2.5 Spelling

TWS- After accounting for language, the top marker in SPAM1 is confounded (∆βLI=-20%) (Fig. 5.13C), and after adjusting for RD and for both RD and LI, the most significant marker is severely attenuated and no longer genome-wide suggestive (∆βRD=-34% and ∆βLI+RD=-38%) (Fig. 5.13C).

Top locus- rs193110168, in an intergenic locus on chromosome 11q22.1, 97 kb from TRPC6 and 210 kb from ANGPTL5, was most significant (p=5.40×10⁻⁶, β=-0.690 ± 0.152, MAF=0.084) (Appendix C, Fig. C11).

Figure 5.13 Effects of adjusting for LI and RD – Test of Written Spelling. A- Aim 1 Manhattan plot; B- Δβ for markers that were significant in Aim 1. Adjustment is indicated by shapes; C- Zoomed in look at the top locus from Aim 1.

4. Discussion

We have explored the effects of language impairment and reading disability on the associations between genetic loci and SSD endophenotypes and describe inconsistent results. Based on changes in beta values, Fletcher Time by Count scores, a measure of motor control, are least affected by adjustment for LI and RD; in contrast, measures of spelling and reading, especially the Word Identification test, are affected most by adjustment for comorbid conditions. In spite of changes in effect estimates, the most significant loci remained consistent for six traits. Our results support the concept that there is a shared genetic component of communication skills, independent of LI and RD affection status.

First, we will briefly discuss traits for which the most significant locus changed; for PPVT and NSW, the most significant loci now fall within IL12RB1 and ANXA11, respectively. The top loci also changed for EOWPVT and TWS, but to intergenic regions; discussion will be limited to genic results. IL12RB1 encodes a cytokine receptor that is involved in Salmonella and mycobacteria infection (NCBI, Gene ID: 3594). The relationship between the gene and receptive language is not clear, but cytokine signaling has pleiotropic effects, including roles in central nervous system development (reviewed in Viviani et al., 2007 and Mousa and Bakhiet, 2013). There is evidence of cytokine involvement in neural cell differentiation (Mehler and Kessler, 1995), synapse formation (Pickering and O'Connor, 2007), and long-term potentiation (Viviani et al., 2007). Long-term potentiation is critical for memory, which is also necessary for receptive language development; IL12RB1 may exert its effect through involvement in long-term potentiation.

ANXA11, the new NSW locus, is also implicated in immune function; it encodes a calcium-regulated phospholipid binding protein, is expressed in all cell types, and has been implicated in exocytosis, cytokinesis, and sarcoidosis susceptibility (Wang et al., 2014). Although ANXA11 is involved in vesicle trafficking, there is no evidence of its involvement in neurotransmitter release, which would be the most plausible biological connection.

4.1 Endophenotype specific results

We attempted to account for variance in the quantitative traits by including LI and RD in our association model. After adjusting for the comorbid conditions, if the β estimates and p-values remained consistent, we can postulate that variation in the quantitative trait is not explained by the comorbid conditions, suggesting a general genetic effect of the locus. In other words, if we stratified the group between individuals with the given comorbid condition and those without, we would expect the same genetic effect.

For all the test scores, there is a significant difference between LI and NoLI groups with LI groups having consistently lower scores than NoLI groups (Table 5.1). In this context, we may expect that accounting for LI explains the majority of the variance, minimizing the genetic effect. However, if the genetic effect is general and independent of LI status, and the SNP still explains some variance in the test scores, the genetic effect estimate will not be significantly reduced. We can apply the same logic for RD.

4.1.1 Articulation and motor control

The observed genetic effects are not reduced by inclusion of LI or RD in the model. For Fletcher Time by Count, the most significant locus is intergenic; however, whatever the function of the locus may be, it has a consistent effect. Similarly, the GFTA results suggest that PAPPA has a role in articulation skills regardless of comorbidity affection status. It is interesting that these traits are least affected by accounting for comorbid conditions; developmentally, they are among the first acquired and may be genetically distinct from the other, more cognitive traits.

4.1.2 Language

The genetic effects and significance of all language associated loci are reduced by adjustment for LI, a fact which indicates that the tests used to interrogate the language endophenotype are good at discriminating between individuals with and without LI. The receptive language loci are also confounded by reading, and in the case of PPVT, there is a greater reduction in the genetic effect estimate due to RD adjustment than LI.

Vocabulary knowledge, as measured by the Peabody Picture Vocabulary Test, is a key component of reading performance (Mitchel and Brady, 2013); based on the observations in our data, reading deficits and receptive vocabulary performance are proxies for each other, reducing the importance of genetics.

4.1.3 Phonology

Because phonological processes are implicated in LI and RD, we expected the effects of the phonology loci to remain consistent. However, for MSW, the effect and significance of the top locus was reduced by accounting for LI but not for RD; language impairment explained some variance in the MSW scores but reading disability did not.

The trends for NSW are more consistent with expectation, with Δβ being about 6% for all adjustments. The 10q23.1 locus may have a general effect on phonological encoding.

4.1.4 Reading

As expected, adjusting for LI and RD reduced the effect estimates and significance of loci in SETD3, ARHGAP23, and IQCE. The effects of these genes/loci on the Word Attack measure are not independent of LI and RD affection status.

In contrast to the Word Attack loci, the effects of the Word ID locus in ATP2C2 were only mildly attenuated by adjustment for LI and RD (∆βLI = -11.5%, ∆βRD = -8.9%, ∆βLI+RD = -13.6%). There is a significant difference in scores between LI/NoLI and RD/NoRD (Table 5.2); however, the affection status is not sufficient to explain these differences in scores, and the SNP association still accounts for variance in the phenotype. ATP2C2 was previously associated with phonological measures in a sample ascertained for specific language impairment, and phonological skills are necessary for reading (Newbury et al., 2009; Newbury et al., 2011). In conjunction, these findings imply that ATP2C2 is important in a fundamental aspect of phonological skills.

4.1.5 Spelling

The spelling loci demonstrate a greater reduction in genetic effect when adjusting for reading alone than when adjusting for language alone; the genetic effects are not independent of reading abilities. These findings reflect the high behavioral correlation between reading and spelling; the reported correlation between word identification and spelling ranges from 0.60 to 0.80 (Fayol, Zorman, and Lété, 2009), and in this sample r² = 0.78. Moreover, single deficits in reading are not reported as frequently as double deficits in reading and spelling (reviewed in Bar-Kochva and Amiel, 2016). There is also evidence that reading and spelling share brain regions. When typically developing individuals were asked to silently spell and read (tasks slightly different from those in our study), the researchers described overlapping activation of the fusiform gyrus (Rapp and Lipka, 2011). These researchers demonstrated that spelling and reading are not independent at the brain network level, and our findings indicate that they may not be independent at a genetic level, either.

5. Conclusions, limitations and future directions

In this analysis we accounted for language impairment and reading disability status when analyzing quantitative traits that capture speech sound disorder endophenotypes. In some cases, comorbid conditions attenuated the genetic signal, but in others they had little effect. The latter provide evidence of a fundamental genetic effect, which we observed for the Fletcher Time by Count and GFTA loci as well as for the ATP2C2 locus. Conversely, for reading and spelling scores, traits that require more formal learning and teaching compared to motor speech and language skills, the genetic effects are not strong when also considering RD and LI. In these situations there is a complex relationship between genetics and performance on a given task. These results also reflect the reality that reading and spelling are acquired together developmentally, so loci that affect one trait may also affect the other.

While the results do help elucidate the impact of LI and RD on estimated genetic effects, there are some shortcomings. The ideal way to address the shared and unique genetic components of SSD, LI, and RD would be genome-wide complex trait analysis (GCTA), but we have family data and a small sample, so such an analysis was not possible. Next, one potential explanation for our observation that LI had a more pronounced effect than RD is that there were more individuals with LI than with RD. A major limitation is that we did not have any measures of model fit (e.g., Mallows' Cp, AIC, BIC), nor did we obtain effect estimates for the other covariates. Granted, there are upwards of 5 million models and thus 5 million statistics, but future work could develop and implement more robust measures to assess goodness of fit and aid in model selection for GWAS. As for the latter, there would be at least 5 million estimates for each covariate, but at the top loci it would be helpful to understand the relative contributions of the covariates to the outcome of interest.

In order to better control for comorbidity affection status, future studies should attempt to collect large enough samples to allow for stratification by comorbidity status. Stratifying is preferable because we will not lose power due to an additional predictor in the model.

A final thought is that the z-scores were calculated based on SSD affection status. It is not known what the effect would be if these scores were recalculated based on LI and RD status. Alternatively, we could calculate z-scores based on individuals who are unaffected by SSD, LI, and RD. If the estimated genetic effects truly are independent of comorbidity status, we should be able to replicate our results using z-scores calculated based on RD and LI status. However, the cohort was ascertained based on SSD affection status, so the proposed analysis may not be methodologically sound.


CHAPTER 6: Hypothesis driven pathway analysis

1. Introduction

In Aim I, we identified specific loci that may be associated with speech sound disorder (SSD) and clustered those loci into groups based on possible functions. In Aim II, we provided further evidence that SSD, language impairment (LI), and reading disability (RD) share a common genetic basis. We further explored the ideas developed in Aims I and II using hypothesis driven pathway analysis.

We co-opted pathway analysis methods to test for enrichment of the groups defined in Aim I (axon migration, glycosylation, neuron differentiation, and synapse function) and of sets of genes described in the literature as associated with RD and LI. The latter explores the genetic relationship among communication traits. Additionally, for thorough comparisons of our results with FOXP2, the only gene causally associated with SSD, we tested for enrichment of GWAS signal within a FOXP2 interaction network (Fisher et al., 1998; Lai et al., 2001). We performed a similar analysis for CNTNAP2 because deletions in this gene have been associated with apraxia of speech, dyslexia, SLI, and autism (Centanni et al., 2015; Laffin et al., 2012; Poot et al., 2015). There were neither genome-wide significant nor suggestive results for these genes, FOXP2 and CNTNAP2, but the networks in which they participate may carry a high burden of marginally significant variation.

First, we hypothesized that the biological groups based on Aim I results would be significant in the endophenotypes where the group originated. Second, we hypothesized that gene sets comprised of genes previously associated with RD, SLI, and SSD would be significant in reading, language, and articulation-motor control endophenotypes, respectively.

2. Methods

We used GWAS results from the fully adjusted model (Trait~Sex+Chip+Dosage+SLI+RD) because adjustment for the comorbid conditions ensures the least confounded genetic effect (see Chapter 5) and reduces the probability of biased results. We took a multistep approach, which included hypothesis testing and hypothesis generating analyses; in this chapter I will discuss the left arm of the diagram (Figure 6.1).


Figure 6.1 Workflow for pathway analysis of genome-wide association results. This chapter covers the left portion of the diagram.

All pathway analyses were completed in PARIS, and the parameters were kept at the PARIS default: p-value cutoff = 0.05, Gene±50kb, CEU-based LD map, 1,000 permutations (Yaspan et al., 2011). To test our hypotheses, we used both the built-in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and the “User defined” pathway feature in PARIS. User defined pathways do not need to be directional biological pathways.
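As a rough illustration of the Gene±50kb setting, the sketch below assigns a SNP to every gene whose boundaries, padded by 50 kb on each side, contain the SNP position. It is a simplification and not the PARIS implementation: PARIS works with LD-based features, which are ignored here, and the function name, gene names, and coordinates are hypothetical.

```python
WINDOW_BP = 50_000  # gene boundaries +/- 50 kb, the default reported in the text


def genes_for_snp(chrom, pos, genes, window=WINDOW_BP):
    """Return the names of genes whose padded interval contains the SNP.

    `genes` maps a gene name to (chromosome, start, end) in base pairs.
    """
    return [
        name
        for name, (g_chrom, start, end) in genes.items()
        if g_chrom == chrom and start - window <= pos <= end + window
    ]


# Hypothetical genes and coordinates, for illustration only.
genes = {
    "GENE_A": ("7", 1_000_000, 1_200_000),
    "GENE_B": ("7", 1_230_000, 1_300_000),
}
hits = genes_for_snp("7", 1_215_000, genes)
print(hits)
```

Because the padded intervals of neighboring genes can overlap, a single SNP may contribute to more than one gene, as it does in this example.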

2.1. Biological groups based on Aim I results

In Aim I, we clustered significant loci into biologically meaningful groups. Here we explore the significance of those groups using pathways that already exist in KEGG (Table 6.1). The Amyotrophic lateral sclerosis (ALS) pathway was the best match for myelination because demyelination is part of disease progression. It should be noted that a multiple sclerosis (MS) pathway would be preferable to ALS because demyelination occurs early in MS disease progression; an MS pathway may capture the processes of typical myelination better than the ALS pathway.

Table 6.1 Pathways of interest based on Aim I GWAS results

| Biological group from Aim I | Relevant KEGG pathway |
|---|---|
| Galactose metabolism | Galactose metabolism |
| | N-glycan biosynthesis |
| Fetal development | None |
| Neuron differentiation and Axon Migration | Axon Guidance |
| | Neurotrophin signaling pathway |
| Signal transduction (synapse) | Glutamatergic synapse |
| | GABAergic synapse |
| | Cholinergic synapse |
| | Dopaminergic synapse |
| | Serotonergic synapse |
| | Long-term potentiation |
| | Long-term depression |
| Signal transduction (myelination) | Best match-Amyotrophic lateral sclerosis (ALS) |

There are no KEGG pathways that encompass fetal development, placentation, and preterm birth, and knowledge is, in general, limited; studies of fetal contributions to preterm birth have not yielded significant results (Parets et al., 2015).

2.2. Gene based networks

While we relied on KEGG pathways for the above analyses, we used user defined pathways to test for enrichment of association signal in CNTNAP2 and FOXP2 interaction networks. We built networks of experimentally validated interaction partners from STRING, IntAct, Vernes et al. (2011), and Estruch et al. (2016).


Table 6.2 Genes included in the FOXP2 and CNTNAP2 gene sets FOXP2 CNTNAP2 PIAS1 HSP90AB1 KCNA2 PIAS2 FOXP3 CNTN2 PIAS3 FOXP1 CASK PIAS4 ACVR2A EPB41L3 SUMO1 EFNB2 UBC SUMO2 ETV1 CTR9 SUMO3 NFAT5 MACF1 UBC9 NPTN RANBP9 NFATC2 NRN1 IQCB1 CTBP1 NRP2 FBXO21 FOXP4 SEMA6D FOXP1 WASF1 SP4 YWHAH RPIA SDCBP CCNC CTBP2 AES CTPB1 PIN1 FAM124A TSACC RPIA SP4

2.3 Comorbidity based gene sets

We also generated lists of genes previously associated with specific language impairment, reading/dyslexia, and syndromes that are known to affect speech (Tables 2.4 and 2.6, Appendix D Table D1).

2.4 Top genes

PARIS uses the same permutation approach to assess the significance of individual genes. For the comorbidity based gene sets, we examined which genes were significant for each gene set.

3. Results

We defined pathways with p<=0.001 as significantly enriched with GWAS signal for a given trait. This threshold indicates that in 1000 permutations of the genome to match the distribution of features within the target pathway, at most 1 permutation was more significant than the real pathway.
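The logic of an empirical permutation p-value can be sketched as follows. This is a simplified illustration, not the PARIS algorithm: PARIS draws random feature sets matched to the size distribution of the features in the pathway, whereas this sketch samples features uniformly, and the function name and the toy data are hypothetical.

```python
import random


def permutation_p_value(observed_hits, n_features, genome_flags, n_perm=1000, seed=0):
    """Share of random feature sets with at least as many significant features
    as the real pathway.

    `genome_flags` holds one 0/1 flag per genomic feature (1 = significant at
    the p-value cutoff).
    """
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_perm):
        random_set = rng.sample(genome_flags, n_features)
        if sum(random_set) >= observed_hits:
            as_extreme += 1
    return as_extreme / n_perm


# Toy genome: 1,000 features, 5% of them significant at the 0.05 cutoff.
genome_flags = [1] * 50 + [0] * 950
p_enriched = permutation_p_value(observed_hits=12, n_features=40, genome_flags=genome_flags)
p_null = permutation_p_value(observed_hits=0, n_features=40, genome_flags=genome_flags)
print(p_enriched, p_null)
```

With 1,000 permutations the smallest non-zero value this estimator can return is 0.001, which is why p<=0.001 is the strictest threshold available at the default setting.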

3.1. Enrichment in groups derived from Aim I

The significance of the pathways representing biological groups identified in Aim I varies by trait, and when pathways are significant, it is not necessarily in the trait from which the pathway was derived (Table 6.3). Notably, Axon Guidance is significant in Expressive One Word Picture Vocabulary, Listening Comprehension, Peabody Picture Vocabulary, Test of Written Spelling, and Word Identification. At least one signal transduction related pathway is significant in all traits except GFTA, MSW, and TWS. Four of the signal transduction related pathways are significant in EOWPVT.


Table 6.3 Significance of Aim I based pathways for each trait. Endophenotypes: Articulation/motor (Fletcher, GFTA); Language (EOWPVT, PPVT, WIATLC); Phonology (MSW, NSW); Reading (WDATK, WRDID); Spelling (TWS).

| Group | Pathway | Fletcher | GFTA | EOWPVT | PPVT | WIATLC | MSW | NSW | WDATK | WRDID | TWS | BTSpeech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Galactose metabolism | Galactose metabolism | 0.249 | 0.007 | 0.623 | 1 | 0.412 | 0.01 | 0.998 | 0.237 | 0.776 | 1 | 0.762 |
| | N-glycan biosynthesis | 1 | <0.001 | 0.001 | 0.978 | 0.983 | <0.001 | 0.015 | <0.001 | 0.983 | 0.011 | 0.975 |
| Neuron differentiation and Axon Migration | Axon Guidance | 0.448 | 0.899 | <0.001 | <0.001 | <0.001 | 0.994 | 0.633 | 0.004 | <0.001 | <0.001 | 0.003 |
| | Neurotrophin signaling pathway | 0.708 | 0.999 | <0.001 | <0.001 | <0.001 | 0.535 | 0.531 | <0.001 | 0.007 | 0.109 | 1 |
| Signal transduction (synapse) | Cholinergic synapse | 0.535 | 0.863 | 0.899 | <0.001 | 1 | 1 | 0.999 | <0.001 | 0.001 | 0.853 | 0.249 |
| | Dopaminergic synapse | 0.531 | 0.3 | 0.994 | 0.706 | 1 | 1 | 0.989 | 0.598 | 1 | 0.463 | 0.007 |
| | GABAergic synapse | <0.001 | 0.004 | 0.633 | 0.908 | 0.003 | 0.626 | 0.078 | 0.71 | <0.001 | 0.512 | 0.01 |
| | Glutamatergic synapse | <0.001 | 0.432 | <0.001 | <0.001 | <0.001 | 1 | 0.985 | 0.913 | 0.015 | 0.415 | 1 |
| | Long-term potentiation | <0.001 | 1 | <0.001 | <0.001 | 0.855 | 0.985 | <0.001 | 0.834 | 0.978 | 1 | 1 |
| | Long-term depression | 0.009 | 1 | 0.004 | 0.659 | <0.001 | 1 | 1 | 0.222 | 0.011 | 0.985 | 0.412 |
| | Serotonergic synapse | 0.477 | 0.084 | <0.001 | 0.022 | 1 | 1 | 1 | 0.974 | 0.267 | 1 | 0.003 |
| Signal transduction (myelination) | Best match-Amyotrophic lateral sclerosis (ALS) | 0.71 | 0.997 | 0.598 | 0.222 | <0.001 | 0.913 | 0.834 | 0.974 | 0.892 | 0.996 | <0.001 |


3.1.2. Gene based pathways

The FOXP2 network was not significant for any of the quantitative traits analyzed. It obtained a permutation p-value of 0.002 for the binary trait and 0.004 for Test of Written Spelling. The CNTNAP2 network was not significant for any traits, either (Table 6.4).

Table 6.4 p-values for gene based networks

| Endophenotype | Trait | CNTNAP2 | FOXP2 |
|---|---|---|---|
| Motor/articulation | Fletcher Time by Count | 0.999 | 0.38 |
| | Goldman-Fristoe Test of Articulation | 0.992 | 0.775 |
| Language | Expressive One Word | 0.791 | 0.292 |
| | Peabody Picture Vocabulary | 0.277 | 0.946 |
| | Listening Comprehension | 0.407 | 0.997 |
| Phonology | MSW | 0.374 | 0.611 |
| | NSW | 0.982 | 0.107 |
| Reading | Word Attack | 0.551 | 0.999 |
| | Word Identification | 0.935 | 0.78 |
| Spelling | Test of Written Spelling | 0.992 | 0.004 |
| Binary trait | BT Speech | 0.976 | 0.002 |

3.1.3. Comorbid condition based networks

The reading disability/dyslexia pathway was significant for the binary trait, Listening Comprehension (language) (p<0.001), Word Attack (reading) (p<0.001), Word Identification (reading) (p=0.001), and Test of Written Spelling (p=0.001).

The SLI gene group was significant for the binary trait, Expressive One Word Picture Vocabulary, and Listening Comprehension. The group of genes previously associated with SSD was significant for Fletcher Time by Count, Expressive One Word Picture Vocabulary, and Word Attack (Table 6.5).


Table 6.5 p-values for gene groups previously associated with comorbid conditions

| Endophenotype | Trait | RD | SLI | SSD | Syndromes |
|---|---|---|---|---|---|
| Motor/articulation | Fletcher Time by Count | 0.137 | 0.676 | 0.001 | <0.001 |
| | Goldman-Fristoe Test of Articulation | 0.972 | 0.93 | 0.928 | 0.883 |
| Language | Expressive One Word | 0.051 | <0.001 | <0.001 | 0.008 |
| | Peabody Picture Vocabulary | 0.882 | 0.007 | 0.831 | <0.001 |
| | Listening Comprehension | <0.001 | 0.001 | 0.907 | 0.934 |
| Phonology | MSW | 0.995 | 0.998 | 0.999 | 0.983 |
| | NSW | 0.478 | 0.981 | 0.984 | 0.978 |
| Reading | Word Attack | <0.001 | <0.001 | 0.003 | 0.324 |
| | Word Identification | 0.001 | 0.095 | 0.114 | 0.025 |
| Spelling | Test of Written Spelling | 0.001 | 1 | 0.328 | 0.663 |
| Binary trait | BT Speech | <0.001 | <0.001 | 0.008 | 0.79 |

3.1.4. Genes within RD, SLI, SSD, and syndrome groups

For the traits with significant pathways in Table 6.5, we further investigated the pathways by identifying the significant genes. In the reading pathway, ROBO1 had a p-value less than 0.001 for Listening Comprehension, Word Identification, and Test of Written Spelling. Although the RD group was not significant for WIATLC, ROBO1 was. For the language impairment group, there was no consistency: for Listening Comprehension, GCFC2 and CMIP were significant, while for Expressive One Word Picture Vocabulary, ATP2C2 was significant. For the binary trait, GCFC2, SETBP1, CMIP, and CNTNAP2 were all significant.

4. Discussion

We performed hypothesis driven pathway analysis on GWAS results from 10 quantitative traits that capture endophenotypes of SSD. Support for our initial hypotheses varies, but our findings do provide further evidence of a shared genetic architecture with LI and RD and suggest a polygenic model for SSD.

The Aim I based pathways were enriched for association signal in some instances but not in others, a situation which calls our attention to the interpretation of pathway analyses. Just because a pathway is not significant does not mean that the pathway is not biologically relevant; KEGG (and other pathway databases) are proxies for the biological pathways, and it is nearly impossible for them to be exhaustive.

The N-glycan biosynthesis pathway, included based on a potential association between SSD and disorders of glycosylation, was enriched for signal in GFTA, EOWPVT, MSW, and Word Attack, but not in Listening Comprehension, the original endophenotype of interest. In relation to our original hypothesis, even though a gene implicated in disorders of glycosylation, SLC39A8, is suggestively associated with Listening Comprehension scores, the pathway is not. Interpreting this result is challenging because we do not know the impact of the variants in SLC39A8; glycosylation may or may not have an impact on SSD.

Although many of the synapse pathways were enriched in the language traits, only one of the pathways is enriched in WRDID, which was the impetus for the group. These results do not necessarily signify that signal transduction pathways are unimportant in reading, but rather that the curated pathways are not enriched for association signal.

The axon guidance pathway is significant for language, reading, and spelling. It is not surprising that a process involved in establishing neural connections is associated with many phenotypes.

Comorbid pathways

While the Aim I results are challenging to interpret, examining potential etiological overlap is more straightforward. We predicted that the SLI gene set would be enriched for association signal in language traits, and it is, except in PPVT, where the p-value is 0.007; additional permutations may lead to a significant association. The hypothesis that the RD gene set would be enriched in reading traits is also supported. The most interesting results, though, are that the RD set was significant in Listening Comprehension and Test of Written Spelling; the SLI set was significant in Word Attack (reading); and the SSD set was significant in Expressive One Word Picture Vocabulary. These results imply that there is a shared basis of SSD and RD. It is difficult to extrapolate any further because the gene sets were based upon single SNP associations, whereas in this analysis we examined SNP sets.

For the traits with significant pathways in Table 6.5, we further investigated the pathways by identifying the significant genes. In the RD gene set, ROBO1 was significant for Listening Comprehension, Word Identification, and Test of Written Spelling. ROBO1 has previously been associated with dyslexia, language, and math abilities (Mascheretti et al., 2014; Wang et al., 2015; Massinen et al., 2016), and it mediates axon guidance (NCBI). The gene has previously been implicated in language but not in relation to spelling, a fact which further demonstrates the genetic similarities between language and spelling abilities. The gene was not significant in either MSW or NSW, though, suggesting that if phonological skills are the common thread between communication phenotypes, it is not through ROBO1.

While FOXP2 and CNTNAP2 have been associated with apraxia of speech (Lai et al., 2001; Centanni et al., 2015) and subsequently studied with animal models, in this cohort, there is no evidence of significant association within the genes or their immediate interaction partners (Table 4.3). These findings highlight the heterogeneity of the genetic basis of SSD; causal loci in one individual or family may not be important in another. It follows that a type of threshold model in which common variation increases disease risk and rare or de novo variation causes disease may be applicable to SSD.

Overall these results indicate that there is enrichment of GWAS signal within some of the pathways identified in Aim I and provide further evidence of a shared genetic basis of SSD, RD, and SLI. Additionally, gene based tests may be informative for SSD because when we accounted for SNPs that were not highly significant, we identified genes that were significantly enriched for association signal.

CHAPTER 7: Exploratory pathway analysis

1. Introduction

Complex diseases are not caused by a single mutation in a single gene; SSD is no exception. In Aim I, we identified variants that may be associated with SSD; in Aim II, we further explored the effects of LI and RD on those variants; and in the first part of Aim III, we tested hypotheses based on results from Aim I and published findings.

However, we have yet to explicitly address the reality that SSD is not a Mendelian disorder. On the contrary, it is generally accepted that the genetic basis of SSD is multifactorial and heterogeneous, just like the phenotype (Newbury and Monaco, 2010); we should expect (and have found) evidence of variants of small effect that are not necessarily rare being associated with the disease (Chapters 4 and 6). It follows that analytical endeavors to disentangle the genetic architecture of SSD should account for this assumed inheritance model.

One method to address the complex, multifactorial nature of SSD is exploratory pathway analysis. In this analysis, biologically relevant pathways are tested for enrichment of GWAS signals. Such an approach is an ideal secondary analysis for our results due to the heterogeneity and lack of genome-wide significant loci.

Recent work in autism spectrum disorders (ASD), a heterogeneous collection of brain development disorders, validates the approach. In ASD, enrichment analysis has successfully grouped apparently incongruous de novo exonic copy number variation (CNV) and loss of function single nucleotide variants (SNV) into networks related to neuronal development and signaling, synapse function, and chromatin regulation (Pinto et al., 2014). In another highly heterogeneous disorder, schizophrenia, a pathway approach identified enrichment in , vesicle trafficking, and MAP Kinase pathways (Crisafulli et al., 2012). Because network analysis on disorders which are just as, if not more, heterogeneous than SSD yielded biologically plausible results, it is a viable approach for SSD.

We performed the first exploratory pathway analysis of GWAS results from an SSD cohort in order to characterize the landscape of variation associated with the disorder. In so doing, we provided general insights into SSD, placed the disorders in a broader context, and finally developed hypotheses for future work.

2. Methods

The basic analytical methods are the same as in Chapter 6; we used PARIS with the p-values from the fully adjusted GWAS (Trait~Sex+Chip+Dosage+LI+RD) (Yaspan et al., 2011). The definable parameters were kept at the PARIS default: p-value cutoff = 0.05, gene = Gene±50kb, CEU-based LD map, 1,000 permutations (Yaspan et al., 2011). For a full description of PARIS methods, see Chapter 3.

2.1 Identifying significant pathways and redundancy control

We defined pathways with p<=0.001 as significant because this threshold indicates that in 1000 permutations of the genome to match the distribution of features within the target pathway, at most 1 permutation was more significant than the real pathway. We used the 1000 permutation PARIS default, so <=0.001 is the most significant p-value possible. After identifying significant pathways, we performed redundancy control in ReCiPa to account for the fact that many genes are present in multiple pathways (Vivar et al., 2013). Any pathways with 75% overlap or greater were joined into super pathways. After the paths were merged, multiple secondary analyses were conducted.
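The merging step can be illustrated with a small sketch. It is not the ReCiPa implementation: the overlap measure (shared genes divided by the size of the smaller pathway), the greedy merging order, and all pathway and gene names are assumptions made for illustration. Only the 75% threshold comes from the text.

```python
def overlap(a: set, b: set) -> float:
    """Share of the smaller gene set that is also in the other set (assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_redundant(pathways: dict, threshold: float = 0.75) -> dict:
    """Greedily join pathways whose overlap meets the threshold into super pathways."""
    merged = {name: set(genes) for name, genes in pathways.items()}
    changed = True
    while changed:
        changed = False
        names = list(merged)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if overlap(merged[first], merged[second]) >= threshold:
                    # Name the super pathway by joining the member names with "+".
                    merged[f"{first}+{second}"] = merged.pop(first) | merged.pop(second)
                    changed = True
                    break
            if changed:
                break
    return merged


# Hypothetical pathways and genes, for illustration only.
pathways = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3"},
    "C": {"g7", "g8"},
}
super_pathways = merge_redundant(pathways)
print(sorted(super_pathways))
```

Restarting the scan after each merge lets a newly formed super pathway absorb further pathways that overlap it, at the cost of some extra comparisons.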

2.2 Interaction within endophenotypes

Neither genes nor pathways function in isolation; therefore, we sought to identify highly connected pathways by building an interaction matrix for pathways significant in multiple traits. The groupings included pathways significant in all traits of a given endophenotype as well as those significant in the two diagnostic components of SSD, articulation and phonology.

To determine the connectivity of a pathway, I searched each pathway on the KEGG online database; any pathway listed on the map of another interacts with that second pathway (adapted from Wen et al., 2016) (example in Figure 7.1). A highly connected pathway within an endophenotype provides insight into the fundamental genetic basis of a given trait because it is a central component of the system. Even though we may not fully grasp the complexities of the entire system, understanding a major functional component serves as a first step in unraveling the complex genetic basis of speech and language.
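A minimal sketch of the interaction matrix is given below. It assumes the map links have already been recorded by hand from the KEGG pathway maps, and it treats an interaction as symmetric; both are simplifying assumptions. The links for Calcium signaling follow the example in Figure 7.1, while the set of significant pathways and the exact pathway labels are hypothetical.

```python
def interaction_matrix(significant, map_links):
    """Symmetric 0/1 matrix: 1 if either pathway is listed on the other's KEGG map."""
    names = sorted(significant)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a in names:
        for b in map_links.get(a, ()):
            if b in matrix and b != a:
                matrix[a][b] = 1
                matrix[b][a] = 1
    return matrix


def connectivity(matrix):
    """Number of significant pathways each pathway is connected to."""
    return {name: sum(row.values()) for name, row in matrix.items()}


# Hypothetical set of significant pathways.
significant = {
    "Calcium signaling pathway",
    "MAPK signaling pathway",
    "Long-term potentiation",
    "Axon guidance",
}
# Links recorded manually from a pathway map (see Figure 7.1).
map_links = {
    "Calcium signaling pathway": [
        "MAPK signaling pathway",
        "Apoptosis",
        "Long-term potentiation",
        "Long-term depression",
        "Phosphatidylinositol signaling pathway",
    ],
}
degree = connectivity(interaction_matrix(significant, map_links))
print(degree)
```

Links to pathways that are not in the significant set (for example Apoptosis here) are ignored, so the counts reflect connectivity among the significant pathways only.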


Figure 7.1 Section of the KEGG Calcium signaling pathway. In the present analysis, Calcium signaling interacts with the circled pathways: MAPK signaling, Apoptosis, Long-term potentiation, Long-term depression, and the Phosphatidylinositol signaling pathway (Kanehisa et al., 2016; Kanehisa et al., 2000).

2.3 Gene based analysis

Each KEGG pathway tested using PARIS contains a variable number of genes. Identifying genes that are represented in many pathways with p<=0.001 may also provide insight into the genetic architecture of SSD and the associated endophenotypes. PARIS assigns p-values to genes using the same permutation method as for pathways (Yaspan et al., 2011). Identifying significant genes may seem reductionist considering that we are attempting to identify emergent networks associated with endophenotypes, but genes that are significant and present in many significant pathways provide complementary information.

When identifying genes over-represented in pathways with p<=0.001, it is necessary to recognize that some genes are present in many KEGG pathways while others are present in only one or two. Probabilistically, we would expect the gene in more pathways to be present in a greater number of significant pathways than the gene present in one or two; thus, the former may not be as interesting as a gene that is not represented in many KEGG pathways but whose pathways are all significant. To account for these realities and prioritize genes in top pathways, we developed a simple χ2 test which, to our knowledge, has not been defined previously.

Given Gene A in Trait 1 (Equation 7.1):

$$\chi^2 = \frac{(O - E)^2}{E}$$    Equation 7.1

where

O = the number of significant pathways in which a gene appears for Trait 1.

E = the expected number of significant pathways in which a gene appears based on the number of significant pathways and the frequency of the gene in all KEGG pathways (Equation 7.2).

$$E = \frac{f \times t}{293}$$    Equation 7.2

f = frequency of Gene A in all KEGG pathways

t = total number of top pathways for Trait 1

293 = total number of pathways tested

We calculated this statistic for each gene found in more than one top pathway, found the corresponding p-value (poverrep), and prioritized genes for further investigation based on poverrep.
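The calculation can be sketched in a few lines. The sketch assumes that the statistic has the usual form (O − E)²/E, that f is the count of tested KEGG pathways containing the gene so that E = f × t / 293, and that the p-value is taken from a χ2 distribution with one degree of freedom; these are assumptions rather than a record of the exact procedure used. The example counts for GRIN2A are those reported in the Results (16 significant pathways, 16 KEGG pathways in total, 55 top pathways), and the output is not expected to reproduce the reported poverrep exactly.

```python
import math

TOTAL_PATHWAYS = 293  # total number of pathways tested, from the text


def overrepresentation_test(observed, gene_pathway_count, top_pathway_count,
                            total=TOTAL_PATHWAYS):
    """Return (expected, chi-square, p-value) for one gene in one trait.

    Assumes E = f * t / total and a chi-square distribution with 1 degree of
    freedom, whose upper tail is erfc(sqrt(x / 2)).
    """
    expected = gene_pathway_count * top_pathway_count / total
    if expected <= 0:
        raise ValueError("expected count must be positive")
    chi2 = (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected, chi2, p_value


# GRIN2A in Listening Comprehension, using the counts reported in the Results.
expected, chi2, p_overrep = overrepresentation_test(
    observed=16, gene_pathway_count=16, top_pathway_count=55
)
print(f"E={expected:.2f}, chi2={chi2:.1f}, p={p_overrep:.1e}")
```

Because E is small for genes that appear in few pathways, the χ2 approximation is rough in exactly the cases of most interest, so the resulting p-values are best read as a ranking device.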

3. Results

There were 205 pathways that reached a permutation p-value <=0.001. Of these pathways, 123 were shared by at least two traits, not including the binary trait for speech (Appendix E, Table E6-7). Of the pathways that are shared by at least two traits, 29% are annotated as involved in human disease, 28% in organismal systems, and 23% in metabolism (Figure 7.3).

Figure 7.3 Classification of pathways significant in two or more traits


The number of significant pathways is not consistent across traits (Table 7.1), but there are 14 pathways that are shared by five or more traits (Table 7.2). Some of these pathways are the same as those discussed in Chapter 6, such as Axon Guidance and Neurotrophin Signaling Pathway (Table 7.2).

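Table 7.1 Number of significant pathways for each trait

| Endophenotype | Trait | Significant paths |
|---|---|---|
| Articulation and Motor control | Fletcher Time by Count | 29 |
| | GFTA | 27 |
| Language | EOWPVT | 58 |
| | PPVT | 81 |
| | WIATLC | 55 |
| Phonology | MSW | 38 |
| | NSW | 29 |
| Reading | WRDATK | 48 |
| | WRDID | 35 |
| Spelling | TWS | 40 |
| Binary trait | BTSpeech | 42 |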

Table 7.2 Pathways shared by four or more traits. Grand Total is the number of traits in which the pathway was significant at p<0.001; Without BT excludes BTSpeech.

| Pathway | Grand Total | Without BT |
|---|---|---|
| cGMP-PKG signaling pathway | 7 | 7 |
| Collecting duct acid secretion | 7 | 7 |
| Glycosaminoglycan biosynthesis | 6 | 6 |
| Hepatitis B | 6 | 6 |
| Progesterone-mediated oocyte maturation | 6 | 6 |
| Rheumatoid arthritis | 6 | 6 |
| Riboflavin metabolism | 6 | 6 |
| Axon guidance | 5 | 5 |
| Epstein-Barr virus infection | 5 | 5 |
| Melanogenesis | 5 | 5 |
| Osteoclast differentiation | 5 | 5 |
| Oxidative phosphorylation | 6 | 5 |
| RIG-I-like receptor signaling pathway | 5 | 5 |
| Ubiquitin mediated proteolysis | 5 | 5 |
| Adrenergic signaling in cardiomyocytes | 4 | 4 |
| Biosynthesis of unsaturated fatty acids | 5 | 4 |
| Calcium signaling pathway | 4 | 4 |
| Epithelial cell signaling in Helicobacter pylori infection | 4 | 4 |
| Glycosylphosphatidylinositol (GPI)-anchor biosynthesis | 4 | 4 |
| Herpes simplex infection | 5 | 4 |
| HTLV-I infection | 4 | 4 |
| Huntingtons disease | 4 | 4 |
| Inflammatory bowel disease (IBD) | 4 | 4 |
| Mineral absorption | 4 | 4 |
| Neurotrophin signaling pathway | 4 | 4 |
| Pertussis | 4 | 4 |
| Protein processing in endoplasmic reticulum | 4 | 4 |
| Ribosome | 4 | 4 |
| Toll-like receptor signaling pathway | 4 | 4 |

Description of individual quantitative traits will be brief, and the discussion of interaction will be the main focus.

3.1 Articulation and motor control

Fletcher Time by Count- After merging redundant pathways there were 28 significant pathways (Appendix E, Table E1). Within these top pathways, SUCLG2 was significantly overrepresented, occurring in 11% (3) of the top pathways versus 1% (4) of all pathways (p=1x10^-5).

Goldman-Fristoe Test of Articulation- There were 27 significant pathways (Appendix E, Table E1). Isolating the top gene, PPP2R2B was in 22% (6) of all top pathways (poverrep=1.2x10^-7).

Shared pathways

Carbon metabolism, Glycosphingolipid Biosynthesis-ganglio series, and Prion disease pathways were significant in both Fletcher Time by Count and Goldman-Fristoe Test of Articulation. There were no interactions.

3.2 Language

Expressive One Word Picture Vocabulary- There were 58 significant pathways for EOWPVT (Appendix E, Table E2), and MAPK10 was in 40% (23) of those (poverrep=2x10^-7).

Peabody Picture Vocabulary Test- There are 81 significant pathways for PPVT (Appendix E, Table E2); MAPK8 (poverrep=2x10^-6) and MAPK10 (poverrep=4x10^-6) were in 34% (28) of the top pathways.

Listening Comprehension- There were 55 significant pathways (Appendix E, Table E2); GRIN2A is in 30% (16) of the pathways (poverrep=1x10^-14); GRIN2A is only in 16 total KEGG pathways.

Shared pathways

There are 44 pathways that are significant in at least two language traits (Appendix F). The most interactive pathways are the Mitogen Activated Protein Kinase (MAPK) signaling pathway and Calcium Signaling. MAPK is connected to 30 other pathways (68% of the shared pathways), and Calcium has connections to 14 pathways (32% of the shared pathways) (Figure 7.4).

The pathways were significant for the receptive language traits, PPVT and WIATLC; therefore, the enrichment is confined to receptive language. Calcium signaling was significant for PPVT (p<0.001) and WIATLC (p<0.001) but not for EOWPVT (p=0.012); MAPK was significant for PPVT and WIATLC (p<0.001) but not EOWPVT (p=0.9).

Figure 7.4 Interactions between significant pathways for language traits. Dotted lines are for Calcium Signaling, solid lines are for MAPK Signaling. Not all significant pathways are shown. Squares represent KEGG classifications of pathways.

3.3 Phonology

Multisyllabic word repetition- There were 38 significant pathways (Appendix E, Table E3), and PPP3CA appears in 22% (6) of them (poverrep=2x10^-4).

Nonsense word repetition- There were 29 significant pathways (Appendix E, Table E3), and AP6V appears in 20% (6) of them (poverrep=4x10^-9).

Shared pathways

There were 11 pathways significant in both MSW and NSW. Oxidative phosphorylation + Collecting duct acid secretion is the most connected (Figure 7.5).

Figure 7.5 Pathways significant in both MSW and NSW. Arrows indicate connections between pathways.

3.4 Reading

Word Attack- There were 48 significant pathways for Word Attack (Appendix E, Table E4), and MAPK10 appears in 41% (19) of them (poverrep=1x10^-6).

Word Identification- There were 35 significant pathways (Appendix E, Table E4), and TLR5 appears in 14% (5) of them (poverrep=1x10^-6); TLR5 is in only 5 KEGG pathways in total.

Shared pathways

There were 9 pathways and one super pathway (Cancer + Melanogenesis) significant in both WRDID and WRDATK, and they converge on Toll-like receptor (TLR) signaling (Figure 7.5).

Figure 7.5 Shared pathways for reading traits.

3.5 Spelling

There were 40 significant pathways for TWS (Appendix D). The analysis for TWS includes pathway interactions, unlike the other individual traits, because TWS is the only trait for the spelling endophenotype. We identified connections between the pathways that converge on MAPK and Calcium signaling (Figure 7.6).

Figure 7.6 Interactions identified between significant spelling pathways

3.6 Cross phenotype analysis

In this analysis we focused upon the diagnostic endophenotypes of SSD, articulation and phonology. There are 8 pathways significant in both endophenotypes (Table 7.3), but there are no connections between or among the pathways (no figure shown).

Table 7.3 Pathways significant in MSW or NSW and GFTA to account for the diagnostic endophenotypes of SSD

Butirosin and neomycin biosynthesis
Insulin signaling pathway
N-Glycan biosynthesis
Porphyrin and chlorophyll metabolism
Staphylococcus aureus infection
Sulfur metabolism
T cell receptor signaling pathway
Ubiquitin mediated proteolysis**

**present in one trait for all five endophenotypes

4. Discussion

We performed the first pathway analysis on 10 quantitative traits that capture endophenotypes of speech sound disorders and characterized the genetic architecture from a global (pathway) rather than a local (gene) perspective. Our findings reveal connections to neuropathies and simultaneously highlight the multifactorial genetic underpinning of SSD.

To appropriately discuss the results, it is necessary to consider the limitations of this analysis. The findings are entirely dependent upon the results of the initial GWAS, so any biases in the original analysis will also affect this analysis. Additionally, KEGG-based results are biased by the pathway curation, which is, in turn, biased by the research that exists in the literature. We are only able to identify enrichment of signal, not the direction of effect, so any pathways discussed may harbor both risk and protective loci; the fundamental point is that there is enrichment of GWAS signal within a pathway. With a firm understanding of these constraints, the results are meaningful and provide valuable insights into SSD and the endophenotypes studied.

First, the diverse classification of significant pathways (human disease, metabolism, information processing, etc.) illustrates that the etiology of SSD is not specific to a single system or pathway. Rather, the genes enriched for variation likely have pleiotropic effects, participating in multiple pathways and behaving in a context-specific manner.

Identifying genes that are significant and overrepresented in significant pathways revealed connections between SSD endophenotypes and neurological impairments.

Additionally, the gene-based results support a multifactorial disease model in which multiple variants of small effect cause the observed phenotype; the overrepresented genes were not significant in the original GWAS at p<1x10^-5, but they become significant when less significant loci are included. Mutations within PPP2R2B, the most significantly overrepresented gene for GFTA, cause spinocerebellar ataxia resulting in poor speech and body movements, a similarity with the poor oral motor control observed in severe SSD (OMIM:604325; Hara et al., 2007). GRIN2A, the most overrepresented gene for the Wechsler Listening Comprehension subtest, is associated with epilepsy and aphasia (Turner et al., 2015). Finally, MAPK10, a MAP kinase, has also been associated with epilepsy and intellectual delay, and was overrepresented in the language traits as well as in the Test of Written Spelling (Kunde et al., 2013). MAPK10 is a neuron-specific MAP kinase that participates in signaling pathways during apoptosis (GeneCards). Programmed cell death is an important part of development (Haaneen et al., 1996), and Gilman and Mattson (2002) describe how regulators of apoptosis may also regulate plasticity of neural circuits. Neural plasticity is a key component of language acquisition (White et al., 2013); thus the MAPK10 cascade may be part of general language development.

Next, narrowing our focus to pathways shared among traits within an endophenotype and identifying the most interactive pathways draws connections between

SSD and autism spectrum disorder (ASD), Alzheimer’s disease (AD), and multiple sclerosis (MS).

First, convergence of the language and spelling pathways on MAPK and Calcium Signaling mirrors recent findings in ASD. Using genes that have been associated with autism, Wen et al. performed a pathway analysis revealing that the KEGG-defined Calcium Signaling pathway was the most enriched for genes associated with ASD, and the MAPK Signaling pathway was the most interactive (Wen et al., 2016). Though we cannot rank pathways using PARIS, the similarities between our language findings and those of Wen et al. are striking, especially because difficulties with language are a component of autism (Bishop et al., 2016; ASHA, 2016). Networks that affect language in autism may be the same as those that affect language abilities in individuals with SSD. Ultimately, these pathways are likely important in typical language acquisition.

Calcium signaling and MAPK pathways have myriad functions, so it is difficult to propose a single role in SSD etiology. Calcium signaling is necessary for neurotransmitter release at synapses, muscle contraction, and short-term memory (Dash et al., 2007). The MAPK pathway is similarly ubiquitous and is involved in cell proliferation, differentiation, and apoptosis (reviewed in Zhang and Liu, 2002). As previously described, one member of the MAPK pathway that is involved in apoptotic signaling, MAPK10, is significantly overrepresented in the language- and spelling-associated pathways. Moreover, drawing a connection between SSD and Alzheimer’s, the protein encoded by MAPK10, JNK3, has been studied as a target for AD drug therapy development due to its role in apoptosis (Antoniou et al., 2011). For cognition, neural cell death may be just as important as cell differentiation and growth.

Next, the reading pathways converged on TLR signaling, a pathway which is well known for its role in the immune response. However, there is growing evidence that this signaling pathway is pleiotropic, involved in typical neural development and neuropathology via complex mechanisms. Mouse models indicate that TLR activity affects typical development. TLR3 knockout mice exhibit increases in proliferation of subventricular zone neuroprogenitor cells (NPC), while activation of TLR2 inhibits neural cell proliferation (Lathia et al., 2008; Okun et al., 2010). Neuroprogenitor cells are the stem cells that differentiate into neurons; the subventricular zone is especially interesting because one component, the dentate gyrus, is thought to be involved in learning (Alvarez-Buylla and Garcia-Verdugo, 2002). In fact, in the dentate gyrus, deficiencies of TLR4 increase proliferation of NPC while activation of TLR4 and TLR2 inhibits proliferation (Okun et al., 2010; Rolls et al., 2007). Our GWAS results suggest that IQCE, the shared reading locus, could impact neural proliferation in the dentate gyrus.

There is also evidence to suggest that TLRs play a role in working memory, an executive function that is impaired in individuals with SSD (Adams and Gathercole, 1995). TLR3 knockout mice exhibited evidence of enhanced working memory when they outperformed TLR3 +/+ mice on the Morris water maze, a test of learning and memory; the mechanism of this effect is unknown (Okun et al., 2010).

In mouse models of Alzheimer’s disease, deficiencies in TLR2 exacerbated memory loss, but the effects were reversed by bone marrow transplant with cells expressing TLR2 (Richard et al., 2008). In MS, it is thought that TLR2 activated by hyaluronan prevents re-myelination of neurons (Sloane et al., 2010; Soulika et al., 2009).

Interestingly, SPAM1, associated with spelling in Aim I, is a hyaluronidase and its enzymatic products inhibit remyelination (Back et al., 2015). The pathway results parallel our GWAS findings.

The single endophenotypes did provide insight into SSD etiology through highly connected pathways, but our combined analysis of articulation and phonology did not yield any connected pathways to discuss. One pathway of interest, though, is Ubiquitin Mediated Proteolysis, which further demonstrates the relationship between the endophenotypes in our study and other disorders. Ubiquitination is a process with numerous outcomes, but this pathway is specific to targeting proteins for degradation (Strieter and Korasick, 2012; Hatakeyama et al., 2003). Deletion of UBE3A is associated with Angelman syndrome, motor delay, and severe intellectual deficits (Penzes et al., 2011; OMIM; GeneCards). Triplication of UBE3A and mutations at the phosphorylation site are associated with autism. In mouse models of the mutated phosphorylation site, the mutation leads to hyperactive UBE3A and overdevelopment of dendritic spines (Yi et al., 2015). Finally, because ubiquitination is necessary for protein degradation, the process has been implicated in plaque formation in Alzheimer’s (reviewed in Hegde and Upadhya, 2011).

Given the connections with AD and MS through these pathways, one question that may arise is, “Are SSD and its comorbidities risk factors for neurodegenerative disorders?” The question has been investigated in primary progressive aphasia, a syndrome in which language is progressively lost but other cognitive functions are spared. Individuals with the aphasia reported childhood learning disabilities (dyslexia, spelling, and language impairment) more frequently than individuals with standard Alzheimer’s disease (Rogalski et al., 2008). In a separate retrospective study of 85 adults with dementia, those with childhood learning disabilities (not well-defined in the paper) had 13 times the odds (1.3-128.4) of developing dementia characterized by localized degeneration instead of generalized degeneration as in typical AD. The authors posit this may be due to hypo-connectivity of specific brain regions, which can also explain LD in childhood (Seifan et al., 2015). Both of these studies are limited by poor phenotyping and potentially biased reports of childhood LD, but if the results are real, they provide an explanation of the similarities we have identified. The genetic causes of SSD, LI, or RD in childhood may be related to those that cause AD in adulthood. This line of questioning will likely become important as the population continues to age, general understanding of the molecular basis of dementia improves, and the quest for treatment grows stronger.

Finally, the similarities between SSD endophenotypes, Alzheimer’s, and MS lead to questions of how the same pathways participate in both relatively benign and quite severe disorders.

These findings also highlight the importance of variation that is not highly significant; pathways were significant because they had a high burden of variants that were at least marginally associated with the quantitative traits. To fully understand the genetic etiology of SSD, we will need to understand the effects of genetic background, that is, common variants of marginal effect and significance.

We describe the results of a multistep pathway analysis and illustrate that pathways associated with endophenotypes of SSD are also associated with autism spectrum disorder, Alzheimer’s disease, and multiple sclerosis. We provide support for the notion that diseases with similar phenotypes have shared genetic etiologies, at least on the pathway level. Additionally, Calcium, TLR, and MAPK signaling may be important components of normal speech and language development.

5. Future Directions

It would be informative to characterize the variants present in GRIN2A, MAPK10, and PPP2R2B. Additionally, I did not focus upon pathways that are shared by many traits. Those pathways could also be informative and should be examined. In general, the challenge that remains to be addressed is attributing more biological significance to these pathways as they specifically relate to SSD. Perhaps taking a narrower approach by characterizing significant pathways in detail (i.e., determining whether significance is driven by one gene or many, how many significant features each gene has, and whether significant genes are significant in many traits) would be helpful to this end.

Specific to this cohort, there is one family, Family 48, in which the proband has childhood apraxia of speech, one brother has Williams syndrome, and another has autism.

In this analysis, we have illustrated that autism and language endophenotypes converge on calcium and MAPK signaling. We could leverage population level information to help elucidate the unique phenotypes of this family by performing a burden test within the calcium and MAPK pathways on the autistic and apraxic children.

CHAPTER 8: General Conclusions and Future Directions

This work is the first GWAS performed in a sample ascertained based on speech sound disorder diagnosis. Examining ten quantitative traits that captured key communication endophenotypes—articulation, phonology, language (expressive and receptive), reading, and spelling—enabled exploration of the genetic basis of human communication and we have identified novel loci, replicated previous findings, and developed future research questions.

First, the novel loci we identified are in ANXA11, PAPP-A, SETD3, SLC39AI, and SPAM1; these will need to be replicated, an analysis that has begun. Additionally, we identified intergenic loci, and the characteristics of those regions need to be better interrogated using, for example, the ENCODE databases.

In each aim there is evidence of genetic similarities between SSD and other communication disorders, namely language impairment and reading disability. We are not the first to report these similarities, but as evidence of genetic similarities between SSD, LI, and RD grows, categorical definitions become questionable. From a clinical standpoint, such groupings are necessary for treatment planning, but it seems that, to truly understand the complex genetic etiology of the emergent phenotype, it may be necessary to disregard dichotomous categorizations and use endophenotypes as we did in this project.

Strikingly, many of our results have connections to more severe neuropathies such as autism spectrum disorder and Alzheimer’s disease. If these similarities are real, then they force consideration of what factors result in one phenotype instead of another. There are most likely environmental and epigenetic factors as well as actual deleterious variants at the root of the differences.

Future directions

First, as with any GWAS, a replication analysis is necessary. We are in the process of completing this analysis and have received results from groups in the United States and the UK. Our analyses indicated that SSD are heterogeneous, so we may not replicate any of our initial findings. Additionally, Chapter 5 indicates that accounting for language impairment and reading disability alters some of the genetic effect estimates, so ascertainment will be important to consider for our replication. Some of the samples are population-based samples and others were originally ascertained based on LI.

Using the current sample, there are many possible additional analyses. This dissertation did not delve into the varying levels of speech sound disorder severity, and future analyses should do so. One approach could be to conduct a polygenic risk score/burden type test using the loci we have identified: do severely affected individuals have a higher burden of variants at the genome-wide suggestive loci? Similarly, what is the relationship between SSD severity and dosage at the various loci we identified? Progressing to the pathway analysis results, future research could examine if and how the burden of variants in significant pathways varies by SSD severity.
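One way the proposed burden comparison might be set up is sketched below, assuming each individual has allele dosages (0 to 2) at the suggestive loci and a severity label. The locus names, weights, severity categories, and function names are all hypothetical; a real analysis would also need to account for relatedness within families, which this sketch ignores.

```python
def burden_score(dosages, weights):
    """Weighted sum of allele dosages over the loci in `weights`.
    Loci missing for an individual are skipped."""
    return sum(w * dosages[locus] for locus, w in weights.items() if locus in dosages)


def mean_score_by_group(samples, weights):
    """Average burden score within each severity group."""
    groups = {}
    for sample in samples:
        score = burden_score(sample["dosages"], weights)
        groups.setdefault(sample["severity"], []).append(score)
    return {group: sum(scores) / len(scores) for group, scores in groups.items()}


# Hypothetical weights (e.g. effect estimates) and individuals, illustration only.
weights = {"locus_a": 0.15, "locus_b": -0.22}
samples = [
    {"severity": "mild", "dosages": {"locus_a": 0, "locus_b": 1}},
    {"severity": "severe", "dosages": {"locus_a": 2, "locus_b": 0}},
    {"severity": "severe", "dosages": {"locus_a": 1}},
]
means = mean_score_by_group(samples, weights)
print(means)
```

Restricting `weights` to the loci that fall within one significant pathway would turn the same calculation into a pathway-level burden score.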

It is also important to account for the developmental trajectory of SSD, so future work could use test scores from multiple time points or a repeated measures approach.

Recent work has found that for spelling, the effect of genetics is more pronounced in adolescence than in childhood (Unpublished results, Lewis, Stein, and Benchek). The effects of the variants may change if we used test scores from a later test date.

We did not account for any environmental factors such as socioeconomic status because including it would further reduce our power to detect genetic effects. Also, the shared environmental factors due to SES are accounted for by the linear mixed model, so we felt it would be acceptable to leave SES out. However, this does not indicate that environmental factors are not important to SSD and cognitive phenotypes in general.

Future work should account for these factors. Also, we did not account for overall IQ because of difficulties harmonizing measures as well as missing data leading to a decreased sample size. Some results may therefore be partially confounded by overall

IQ.

Our results also provide insights for future studies. In ascertaining a new sample, it is important to be mindful of comorbid conditions and attempt to ascertain balanced comorbidity groups (SSD only, SSD+LI, SSD+RD). Similarly, ascertaining a larger sample will be important for identifying rare variants. This analysis focused on English, but future work should focus on languages with phonemic orthographies (e.g. Castilian Spanish, Croatian, Finnish, Serbian), in which sounds correspond directly to letters (graphemes). In these languages, the genetics of reading and spelling may be the same as those for phonology.

Appendix A Additional Material for Chapter 3- Methods

Distributions of z-scores pre and post transformation, if applicable

Figure A1 Fletcher Time by Count and Goldman-Fristoe Test of Articulation

Figure A2 PPVT and WIATLC

Figure A3- MSW and NSW

Figure A4 Reading- Word Attack and Word Identification

Figure A5 Test of Written Spelling

Sample Ancestry

All non-Caucasian individuals were removed from the analysis because the sample (n=85) was not large enough to perform a stratified analysis, and adjusting for the first three principal components would lead to a potentially avoidable loss of power in our analyses. Table A1 and Figure A6 below describe the ancestry of the sample.

Table A1- Ancestry of the individuals who passed quality control.

African Middle East Pacific Caucasian Hispanic Mixed NA American Eastern Asian Islander

n 606 44 3 7 5 2 20 3

Proportion of total 83% 6% 0% 1% 1% 0% 3% 0%

Proportion of non-Caucasians 0% 52% 4% 8% 6% 2% 24% 4%

BT Speech 285 (48.3) 20 (45) 0 1 (14%) 2 (40) 0 10 (50) 1 (33)

Sex 283 (46.7) 23 (52.3) 2 (66.7) 3 (42.9) 2 (100.0) 2 (40.0) 10 (50.0)

Figure A6 Principal component plots. Left: PC1 vs PC2, with Caucasians clustering together, African Americans, and others in the middle. Right: PC2 vs PC3, exhibiting looser clustering than PC1 vs PC2.

Power Calculations

An R wrapper (Minikel, cureffi.org) for the Genetic Power Calculator was used for these calculations (Purcell et al., 2003). The values below are rough estimates. Because we have heterogeneous family structures (singletons, sibships of unequal size, and parent-offspring trios), and the variance explained by a given marker is unknown, it is difficult to precisely specify the parameters. Nonetheless, the parameters used are described below. Figure A7 is a heat map that depicts power at various minor allele frequencies and effect sizes. It should be noted that this analysis does not account for additional covariates or the use of a linear mixed model.

Parameters
Sample size (number of families) = 156
Alpha = 1x10^-5 (cutoff for suggestive results)
Variance explained by marker = 0.01
D’ = 0.8 (LD between marker and causal variant)
Sibling correlation = 0.5
Size of sibships = 2 (this is an underestimate)
MAF = 0.05-0.25 (all markers with MAF<0.05 were removed from analysis)
Effect estimate = range, reported in standard deviations
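To give a feel for how parameters of this kind translate into power, the sketch below uses a much simpler calculation than the Genetic Power Calculator. It treats the sample as unrelated individuals and takes the non-centrality parameter of a 1-df association test as n·q²/(1−q²), where q² is the variance explained by the tested marker. The sample size of 312 (156 families × 2 siblings, treated as independent) and the function name are assumptions made for illustration. Family structure, sibling correlation, and LD are ignored unless an r² value is supplied.

```python
from statistics import NormalDist


def qtl_power(n, var_explained, alpha, r2=1.0):
    """Approximate power of a 1-df test for a quantitative trait in n
    unrelated individuals. `r2` scales the variance explained when the
    tested marker is only in LD with the causal variant."""
    q2 = var_explained * r2
    ncp = n * q2 / (1 - q2)           # non-centrality parameter
    nd = NormalDist()
    crit = nd.inv_cdf(1 - alpha / 2)  # two-sided critical value
    root = ncp ** 0.5
    # A 1-df non-central chi-square is the square of a N(sqrt(ncp), 1) variable.
    return nd.cdf(root - crit) + nd.cdf(-root - crit)


# n=312 is an assumed number of individuals, not a value from the analysis.
power_grid = {n: qtl_power(n, var_explained=0.01, alpha=1e-5) for n in (312, 1000, 3000)}
for n, power in power_grid.items():
    print(f"n={n}: power={power:.3f}")
```

Because siblings are correlated, treating them as independent overstates the effective sample size, so values from this sketch should be read as optimistic.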

Figure A7 Power at various minor allele frequencies and effect estimates. Blue indicates the commonly accepted 80% power.

Binary trait

We used the GAS Power Calculator from the Abecasis group at the University of Michigan (Skol et al., 2004) to perform power calculations for the binary trait of speech sound disorder affection status. The calculation does not account for related individuals or for the use of a mixed model.

The following parameters were used based on our data:
Cases: 296
Controls: 316
Significance: 1x10^-5
Allele frequency: 0.25
Relative risk: 1.2 (largest beta in suggestive markers)
Prevalence: 16%

Power was low, 0.003 (0.3%); making the significance cutoff less stringent (0.001) increases power to 0.052 (5%) (Figure A8). Given these calculations, any analysis of the binary trait should not be restricted to highly significant markers.
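The calculation can be approximated by hand under explicit assumptions. The sketch below assumes a multiplicative genetic model, unrelated cases and controls, and a two-sided allelic test with a normal approximation. It is not the GAS Power Calculator's implementation, and the function name is hypothetical. With the parameters listed above it should give power of the same order as the values reported.

```python
from statistics import NormalDist


def case_control_power(n_cases, n_controls, alpha, raf, rel_risk, prevalence):
    """Approximate power of a two-sided allelic test, multiplicative model."""
    # Baseline penetrance chosen so the population prevalence is matched.
    scale = (1 - raf) + raf * rel_risk
    f0 = prevalence / scale ** 2
    f1, f2 = f0 * rel_risk, f0 * rel_risk ** 2
    # Risk-allele frequency among cases and among controls.
    p_case = raf * rel_risk / scale
    p_ctrl = (raf ** 2 * (1 - f2) + raf * (1 - raf) * (1 - f1)) / (1 - prevalence)
    # Standard error of the allele-frequency difference (2 alleles per person).
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_ctrl * (1 - p_ctrl) / (2 * n_controls)) ** 0.5
    z = abs(p_case - p_ctrl) / se
    nd = NormalDist()
    crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z - crit) + nd.cdf(-z - crit)


power_strict = case_control_power(296, 316, 1e-5, 0.25, 1.2, 0.16)
power_relaxed = case_control_power(296, 316, 1e-3, 0.25, 1.2, 0.16)
print(f"alpha=1e-5: {power_strict:.4f}; alpha=1e-3: {power_relaxed:.4f}")
```

Varying one argument at a time (significance level, allele frequency, or relative risk) mirrors the sensitivity panels described for the binary outcome.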

Figure A8 Effects of altering various parameters on power for the binary outcome. A: Significance level. B: Risk allele frequency. C: Relative risk. All figures obtained from the GAS Power Calculator, available from the University of Michigan (Skol et al., 2004).

Appendix B- Additional Materials for Chapter 4

1. Model Selection

Selecting an appropriate model for the analysis was an iterative process resulting in selection of a linear mixed model without parents for all traits except MSW and NSW.

First, a linear mixed model is most appropriate for our study because it accounts for the correlation between family members through a relationship matrix, which is used as the covariance matrix for the genetic effect. We selected LMM instead of generalized estimating equations as implemented in GWAF because GEE clusters families but does not account for the correlation between family members. Moreover, the GEE results exhibited evidence of inflation of p-values, which could have been due to inadequate correction for family structure (figures not shown).

After settling on a LMM using GCTA, there was still evidence of slight overcorrection. This could, in part, be due to inclusion of parents in the sample. Figures B1-B4 are the QQ plots comparing the models with parents to those without. Even though LMM precisely accounts for familial relationships, we did not have data for all parents, which could have caused overadjustment. Moreover, for most traits, parents’ z-scores were close to the mean, which increased weight in the middle of the distribution, whereas our power to detect genetic components is in the extreme phenotypes (Figures B5 and B6).
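Over- and under-correction of this kind is commonly summarized with the genomic inflation factor (lambda), the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The sketch below shows that standard calculation from a list of p-values. It is a generic illustration with a hypothetical function name and synthetic p-values, not the code used to produce the lambda values in this appendix.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Lambda: median observed 1-df chi-square divided by its null median.
    P-values must lie strictly between 0 and 1 (1 itself is allowed)."""
    nd = NormalDist()
    # Convert each two-sided p-value back to a 1-df chi-square statistic.
    chisq = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.25) ** 2  # about 0.455
    return median(chisq) / expected_median


# Synthetic, evenly spaced p-values stand in for a well-calibrated null scan.
null_pvalues = [(i + 0.5) / 1001 for i in range(1001)]
lam_null = genomic_inflation(null_pvalues)

# Squaring the p-values makes them too small, which mimics inflation.
lam_inflated = genomic_inflation([p ** 2 for p in null_pvalues])
print(f"lambda (null) = {lam_null:.3f}, lambda (inflated) = {lam_inflated:.3f}")
```

Values above 1 suggest inflated test statistics, and values below 1 suggest overcorrection.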

Figure B1. QQ plots for Articulation and Oral Motor Control

Figure B2. QQ plots for language endophenotypes

Figure B3. QQ plots for reading endophenotypes

Figure B4. QQ plots for spelling

Figure B5. Histograms of articulation and language traits

Figure B6. Histograms of phonology, reading, and spelling traits

Table B1. Lambda values for four models. GEE=Generalized estimating equations as implemented in GWAF; LMM=Linear mixed model; *indicates the analysis was not run

Model LMM- LMM- LMM- GEE with PCs no PCs no PCs, no parents
Fletcher 1.05 1.05 0.98 0.98
GFTA 1.04 1.04 0.98 0.99
MSW 1.06 * 0.99 0.99
NSW 1.04 1.02 0.98
PPVT 1.08 * 1.03 0.99
TWS 1.09 * 1.05 0.99
WIATLC 1.09 1.09 1.00 0.97
WRDATK 1.07 * 1.00 0.97
WRDID 1.07 1.05 1.00 0.98

Table B2 Sample sizes with and without parents

Trait  With Parents  Without Parents
EOWPVT  397  347
Fletcher  393  315
GFTA  334  334
MSW  438
NSW  438
PPVT  404  353
TWS  300  300
WIATLC  313  266
WRDID  400  318
WRDATK  402  320

2. Full GWAS Results

All markers with p<1x10^-5
SE= Standard error
MAF= Minor allele frequency in this cohort
KGP= 1000 Genomes Project Minor Allele Frequency
KGPeur= 1000 Genomes Project Minor allele

1. Articulation and Motor Control

1.1 Fletcher Time by Count

Gene Start cytoBand Function rsID P Beta SE MAF KGP KGPeur
AKAP3 4726297 12p13.32 intronic rs12829711 6.80E-06 -0.135 0.0300 0.323 0.156 0.294
ATP12A 25285768 13q12.12 UTR3 rs2722 2.59E-06 0.2 0.0426 0.124 0.204 0.145
CBX7,PDGFB 39616878 22q13.1 intergenic rs116540704 9.46E-07 -0.221 0.0450 0.114 0.039 0.108
CBX7,PDGFB 39605755 22q13.1 intergenic rs56310790 7.35E-06 -0.207 0.0463 0.114 0.039 0.109
HBS1L,MYB 1.35E+08 6q23.3 intergenic rs9321489 3.63E-06 -0.149 0.0323 0.264 0.433 0.272
HS3ST3B1,CDRT7 14840665 17p12 intergenic rs57680355 3.24E-06 0.146 0.0314 0.283 0.219 0.293
HS3ST3B1,CDRT7 14833481 17p12 intergenic rs16950649 5.35E-06 0.142 0.0312 0.286 0.276 0.298
HS3ST3B1,CDRT7 14841227 17p12 intergenic rs1981651 5.40E-06 0.142 0.0312 0.286 0.239 0.293
HS3ST3B1,CDRT7 14840171 17p12 intergenic rs2215275 5.41E-06 0.142 0.0312 0.286 0.234 0.293
HS3ST3B1,CDRT7 14843341 17p12 intergenic rs2215274 5.41E-06 0.142 0.0312 0.286 0.234 0.294
HS3ST3B1,CDRT7 14835085 17p12 intergenic rs8073675 5.41E-06 0.142 0.0312 0.286 0.237 0.293
HS3ST3B1,CDRT7 14832800 17p12 intergenic rs11869727 6.02E-06 0.156 0.0344 0.226 0.146 0.238
HS3ST3B1,CDRT7 14826838 17p12 intergenic rs59932116 7.53E-06 0.154 0.0343 0.225 0.189 0.238
HS3ST3B1,CDRT7 14826457 17p12 intergenic . 7.90E-06 0.153 0.0343 0.224 0.189 0.238
HS3ST3B1,CDRT7 14834781 17p12 intergenic . 9.37E-06 0.14 0.0315 0.295 0.248 0.295
NDUFA9 4788366 12p13.32 intronic rs7302268 8.94E-06 -0.138 0.0310 0.306 0.189 0.285
PLD5,LINC01347 2.43E+08 1q43 intergenic rs10926785 1.73E-06 0.158 0.0330 0.241 0.281 0.27
PLD5,LINC01347 2.43E+08 1q43 intergenic rs4658834 1.78E-06 0.158 0.0330 0.242 0.303 0.278
PLD5,LINC01347 2.43E+08 1q43 intergenic rs10926781 2.66E-06 0.155 0.0331 0.241 0.277 0.278
PLD5,LINC01347 2.43E+08 1q43 intergenic rs10926778 4.07E-06 0.153 0.0331 0.242 0.279 0.278
PLD5,LINC01347 2.43E+08 1q43 intergenic rs12046498 4.79E-06 0.151 0.0331 0.241 0.278 0.278
PLD5,LINC01347 2.43E+08 1q43 intergenic rs10926783 4.87E-06 0.164 0.0358 0.213 0.182 0.22
PLD5,LINC01347 2.43E+08 1q43 intergenic rs7527225 4.91E-06 0.154 0.0338 0.23 0.276 0.264
PLD5,LINC01347 2.43E+08 1q43 intergenic rs10926782 5.25E-06 0.153 0.0337 0.234 0.273 0.265
PRTFDC1 25161334 10p12.1 intronic rs2478094 9.80E-06 -0.17 0.0386 0.846 0.834 0.845
PRTFDC1 25164979 10p12.1 intronic rs1932422 9.80E-06 -0.17 0.0386 0.846 0.82 0.844
PRTFDC1 25157455 10p12.1 intronic rs1033960 9.89E-06 -0.17 0.0386 0.846 0.819 0.846
SOS2,L2HGDH 50699868 14q21.3 intergenic rs1955926 7.19E-06 0.13 0.0290 0.589 0.59 0.617

1.2 Goldman Fristoe Test of Articulation

Gene Start cytoBand Function rsID PVAL Beta SE MAF KGP KGPeur
. 4:57114015 . . . 9.96E-06 -0.219 0.05 0.611
ADAM12 10:1.28E+08 10q26.2 intronic rs17684713 5.86E-06 -0.297 0.066 0.167 0.085 0.198
ICA1 7:8169008 7p21.3 intronic rs59728847 3.85E-06 0.245 0.053 0.296 0.518 0.317
KIAA1211 4:57114083 4q12 intronic rs13140685 9.67E-06 -0.219 0.05 0.615 0.591 0.624
PAPPA 9:1.19E+08 9q33.1 intronic rs2273977 4.38E-06 0.242 0.053 0.294 0.214 0.315
PAPPA 9:1.19E+08 9q33.1 intronic rs3761843 5.11E-06 -0.236 0.052 0.692 0.66 0.677
PAPPA 9:1.19E+08 9q33.1 intronic rs10513273 8.26E-06 -0.231 0.052 0.689 0.664 0.679
PAPPA 1.19E+08 9q33.1 intronic rs13294988 8.76E-06 -0.225 0.051 0.669 0.564 0.667

2. Language

2.1 Expressive One Word Picture Vocabulary Test

Gene Start cytoBand Function rsID pval Beta SE MAF KGP KGPeur
CACUL1,NANOS1 1.21E+08 10q26.11 intergenic rs12255925 3.73E-06 2.148 0.464 0.086 0.169 0.091
CACUL1,NANOS1 1.21E+08 10q26.11 intergenic rs12255920 3.73E-06 2.148 0.464 0.086 0.169 0.091
CACUL1,NANOS1 1.21E+08 10q26.11 intergenic rs12262895 4.01E-06 2.141 0.464 0.086 0.202 0.092
DMRT2,SMARCA2 1528456 9p24.3 intergenic rs882793 6.81E-06 1.098 0.244 0.553 0.378 0.541
DTNBP1 15636388 6p22.3 intronic . 7.40E-06 1.292 0.288 0.239 0.117 0.228
EPHA4,PAX3 2.23E+08 2q36.1 intergenic . 8.01E-07 -1.35 0.274 0.322 0.367 0.319
KIAA0391 35635320 14q13.2 intronic rs8016860 6.36E-06 1.099 0.243 0.522 0.565 0.527
KIAA0391 35635209 14q13.2 intronic rs8015470 6.37E-06 1.098 0.243 0.522 0.564 0.527
KIAA0391 35634851 14q13.2 intronic . 6.38E-06 1.099 0.243 0.522 0.568 0.527
LINC01340,RGMB 97929333 5q15 intergenic rs191730 4.44E-07 -1.458 0.289 0.757 0.716 0.769
LINC01340,RGMB 97831479 5q15 intergenic rs13154568 7.85E-07 1.5 0.304 0.221 0.26 0.216
LINC01340,RGMB 97827290 5q15 intergenic rs9327251 8.85E-07 1.492 0.304 0.221 0.265 0.216
LINC01340,RGMB 97826575 5q15 intergenic rs11952745 1.06E-06 1.483 0.304 0.222 0.265 0.216
LINC01340,RGMB 97921377 5q15 intergenic rs193500 3.31E-06 -1.364 0.293 0.764 0.715 0.769
LINC01340,RGMB 97901867 5q15 intergenic rs36760 3.45E-06 -1.365 0.294 0.763 0.746 0.769
LINC01340,RGMB 97918829 5q15 intergenic rs29661 3.49E-06 -1.36 0.293 0.764 0.74 0.769
LINC01340,RGMB 97914318 5q15 intergenic rs27971 3.50E-06 -1.36 0.293 0.764 0.747 0.77
LINC01340,RGMB 97908772 5q15 intergenic rs468674 3.51E-06 -1.36 0.293 0.764 0.745 0.769
LINC01340,RGMB 97906631 5q15 intergenic . 3.51E-06 -1.36 0.293 0.764 0.746 0.769

LINC01340,RGMB 97910984 5q15 intergenic rs27030 3.51E-06 -1.36 0.293 0.764 0.738 0.769
LINC01340,RGMB 97908557 5q15 intergenic rs152967 3.51E-06 -1.36 0.293 0.764 0.745 0.769
LINC01340,RGMB 97910223 5q15 intergenic rs27031 3.52E-06 -1.36 0.293 0.764 0.744 0.769
LINC01340,RGMB 97901167 5q15 intergenic rs36762 3.55E-06 -1.364 0.294 0.763 0.746 0.768
LINC01340,RGMB 97884862 5q15 intergenic rs78952764 3.80E-06 -1.354 0.293 0.742 0.705 0.762
LINC01340,RGMB 97928295 5q15 intergenic rs171528 4.29E-06 -1.349 0.294 0.765 0.751 0.773
LINC01340,RGMB 97844976 5q15 intergenic rs2042966 4.63E-06 -1.362 0.297 0.777 0.692 0.778
LINC01340,RGMB 97901942 5q15 intergenic rs36759 4.75E-06 -1.346 0.294 0.765 0.748 0.772
LINC01340,RGMB 97901661 5q15 intergenic rs36761 4.81E-06 -1.346 0.294 0.764 0.748 0.772
LINC01340,RGMB 97911704 5q15 intergenic rs27029 4.85E-06 -1.342 0.293 0.766 0.749 0.772
LINC01340,RGMB 97904330 5q15 intergenic rs29670 4.85E-06 -1.342 0.293 0.766 0.748 0.771
LINC01340,RGMB 97915737 5q15 intergenic rs469191 4.86E-06 -1.341 0.293 0.766 0.703 0.773
LINC01340,RGMB 97904587 5q15 intergenic . 5.11E-06 -1.344 0.295 0.763 0.743 0.765
LINC01340,RGMB 97903751 5q15 intergenic rs27786 5.41E-06 -1.336 0.294 0.765 0.748 0.771
LINC01340,RGMB 97925647 5q15 intergenic rs152956 5.84E-06 -1.369 0.302 0.774 0.712 0.782
LINC01340,RGMB 97925585 5q15 intergenic rs152957 6.23E-06 -1.364 0.302 0.774 0.745 0.782
LINC01340,RGMB 97915265 5q15 intergenic rs27870 8.32E-06 -1.307 0.293 0.764 0.67 0.773
LINC01340,RGMB 97898434 5q15 intergenic rs36763 9.71E-06 -1.304 0.295 0.754 0.744 0.766
LOC102724913 69441249 18q22.3 ncRNA_intronic rs62100637 3.58E-06 2.343 0.506 0.083 0.085 0.099
MGC34796,TM2D1 62144561 1p31.3 intergenic rs111751274 5.61E-06 -1.263 0.278 0.269 0.107 0.226
MGC34796,TM2D1 62145065 1p31.3 intergenic rs17378384 5.64E-06 -1.262 0.278 0.269 0.107 0.226
MGC34796,TM2D1 62142236 1p31.3 intergenic rs11207807 5.69E-06 -1.261 0.278 0.27 0.106 0.226
MGC34796,TM2D1 62142097 1p31.3 intergenic rs12117475 5.69E-06 -1.261 0.278 0.27 0.106 0.226
MGC34796,TM2D1 62136871 1p31.3 intergenic rs72672375 8.58E-06 -1.266 0.284 0.272 0.133 0.229
MICB,MCCD1 31487222 6p21.33 intergenic rs2516490 9.28E-06 -1.5 0.338 0.178 0.163 0.187
MICB,MCCD1 31488256 6p21.33 intergenic rs2516488 1.00E-05 -1.493 0.338 0.178 0.163 0.187

NTRK3 88464110 15q25.3 intronic rs1465747 4.53E-06 1.137 0.248 0.443 0.432 0.372
NTRK3 88463831 15q25.3 intronic rs11855377 4.53E-06 1.137 0.248 0.443 0.499 0.373
NTRK3 88465787 15q25.3 intronic rs6496456 4.53E-06 1.137 0.248 0.443 0.442 0.372
PDZRN3-AS1,CNTN3 73911317 3p13 intergenic rs17708394 7.08E-06 -1.57 0.349 0.136 0.137 0.189
TM2D1 62188432 1p31.3 intronic rs1286625 1.13E-06 -1.344 0.276 0.273 0.181 0.241
TM2D1 62182133 1p31.3 intronic rs1692132 1.69E-06 -1.318 0.275 0.272 0.162 0.226
TM2D1 62186724 1p31.3 intronic rs1151773 1.70E-06 -1.321 0.276 0.271 0.161 0.226
TM2D1 62165018 1p31.3 intronic rs6697770 2.57E-06 -1.295 0.275 0.271 0.107 0.224
TM2D1 62180322 1p31.3 intronic rs7539026 2.69E-06 -1.302 0.277 0.269 0.107 0.223
TM2D1 62187067 1p31.3 intronic rs12144284 2.75E-06 -1.303 0.278 0.267 0.107 0.223
TM2D1 62191459 1p31.3 upstream rs12141280 2.86E-06 -1.298 0.277 0.268 0.106 0.224
TM2D1 62180070 1p31.3 intronic rs7536565 2.86E-06 -1.29 0.276 0.27 0.109 0.224
TM2D1 62180373 1p31.3 intronic rs7550631 2.88E-06 -1.29 0.276 0.27 0.107 0.224
TM2D1 62182686 1p31.3 intronic rs12137410 2.88E-06 -1.292 0.276 0.27 0.107 0.224
TM2D1 62170615 1p31.3 intronic rs12130439 2.89E-06 -1.297 0.277 0.269 0.106 0.222
TM2D1 62188362 1p31.3 intronic rs72674608 2.92E-06 -1.294 0.277 0.269 0.107 0.224
TM2D1 62173950 1p31.3 intronic rs12134428 2.95E-06 -1.287 0.275 0.27 0.107 0.224
TM2D1 62171474 1p31.3 intronic rs2051136 3.03E-06 -1.286 0.276 0.27 0.107 0.224
TM2D1 62171015 1p31.3 intronic rs2051135 3.07E-06 -1.286 0.276 0.27 0.106 0.221
TM2D1 62169178 1p31.3 intronic rs4443919 3.10E-06 -1.286 0.276 0.27 0.107 0.224
TM2D1 62169589 1p31.3 intronic rs6666057 3.10E-06 -1.285 0.276 0.27 0.108 0.224
TM2D1 62183235 1p31.3 intronic . 3.30E-06 -1.286 0.277 0.27 0.112 0.222
TM2D1 62155806 1p31.3 intronic rs72672390 3.61E-06 -1.287 0.278 0.268 0.107 0.224
TM2D1 62150826 1p31.3 intronic rs12124430 4.17E-06 -1.272 0.276 0.27 0.107 0.226
TM2D1 62180170 1p31.3 intronic rs7538825 4.29E-06 -1.24 0.27 0.298 0.242 0.25
TM2D1 62149237 1p31.3 intronic rs12134205 4.33E-06 -1.27 0.276 0.27 0.107 0.226
TM2D1 62148968 1p31.3 intronic rs12134102 4.36E-06 -1.27 0.276 0.27 0.108 0.226
TM2D1 62179196 1p31.3 intronic rs2799626 4.59E-06 -1.233 0.269 0.298 0.242 0.251
TM2D1 62175941 1p31.3 intronic rs1286624 4.63E-06 -1.233 0.269 0.298 0.241 0.249
TM2D1 62175540 1p31.3 intronic rs1151771 4.63E-06 -1.233 0.269 0.298 0.241 0.249
TM2D1 62182011 1p31.3 intronic rs1620169 4.69E-06 -1.235 0.27 0.298 0.242 0.249
TM2D1 62164229 1p31.3 intronic . 4.75E-06 -1.287 0.281 0.274 0.22 0.24
TM2D1 62161593 1p31.3 intronic rs12141298 4.86E-06 -1.256 0.275 0.277 0.107 0.227
TM2D1 62156738 1p31.3 intronic rs1151762 7.83E-06 -1.207 0.27 0.297 0.289 0.251
TM2D1 62177027 1p31.3 intronic rs1779267 7.84E-06 -1.204 0.269 0.3 0.289 0.251
TM2D1 62175340 1p31.3 intronic rs1151770 7.91E-06 -1.204 0.269 0.299 0.283 0.249
TM2D1 62155080 1p31.3 intronic rs1151759 8.04E-06 -1.206 0.27 0.297 0.289 0.252
TM2D1 62182024 1p31.3 intronic rs1692131 8.05E-06 -1.206 0.27 0.299 0.288 0.25
TM2D1 62182802 1p31.3 intronic rs1611797 8.12E-06 -1.206 0.27 0.299 0.288 0.25
TM2D1 62188618 1p31.3 intronic rs1286626 8.32E-06 -1.209 0.271 0.298 0.283 0.249
TM2D1 62191593 1p31.3 upstream rs1151775 8.34E-06 -1.211 0.272 0.297 0.288 0.25
TM2D1 62152315 1p31.3 intronic rs1151757 8.56E-06 -1.203 0.27 0.297 0.289 0.251
TM2D1 62151773 1p31.3 intronic rs1151756 8.64E-06 -1.203 0.27 0.297 0.283 0.251
TM2D1,INADL 62193432 1p31.3 intergenic rs1286628 1.12E-06 -1.348 0.277 0.272 0.185 0.242
TM2D1,INADL 62193863 1p31.3 intergenic rs1286629 1.20E-06 -1.348 0.278 0.271 0.186 0.243
TM2D1,INADL 62193871 1p31.3 intergenic rs1286630 1.21E-06 -1.348 0.278 0.271 0.184 0.243
TM2D1,INADL 62198755 1p31.3 intergenic rs112143036 3.13E-06 -1.327 0.285 0.264 0.185 0.243
TM2D1,INADL 62194515 1p31.3 intergenic . 6.02E-06 -1.287 0.284 0.266 0.167 0.225
TM2D1,INADL 62200846 1p31.3 intergenic . 6.14E-06 -1.291 0.285 0.262 0.186 0.243
TM2D1,INADL 62201203 1p31.3 intergenic rs1151776 6.21E-06 -1.3 0.288 0.26 0.185 0.241
TM2D1,INADL 62193338 1p31.3 intergenic . 8.19E-06 -1.214 0.272 0.297 0.288 0.251
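
As a reading aid for the Beta, SE, and pval columns of the table above, the following minimal Python sketch recomputes an approximate two-sided p-value from an effect estimate and its standard error. It assumes a normal-approximation Wald test, which this appendix does not state was the test used, and the function name wald_p_value is hypothetical. Small differences from the tabulated p-values are to be expected, because the tabulated Beta and SE are rounded and the original analysis may have used a different reference distribution.

```python
import math


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for z = beta / se under a standard normal.

    The normal reference distribution is an assumption made for this
    sketch; it is not taken from the dissertation.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = beta / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# First row of the table above: rs12255925, Beta = 2.148, SE = 0.464,
# tabulated pval = 3.73E-06.
p = wald_p_value(2.148, 0.464)
print(f"z = {2.148 / 0.464:.3f}, approximate p = {p:.2e}")
```

The same call can be applied to any row to check that its Beta, SE, and pval entries are of a consistent order of magnitude.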

2.2 Peabody Picture Vocabulary Test
Gene Start cytoBand Function rsID pval beta se maf KGP KGP_eur
ATP8B1,NEDD4L 55559953 18q21.31 intergenic rs3848518 7.31E-06 0.435 0.097 0.751 0.692 0.715
CD101 1.18E+08 1p13.1 exonic rs3754112 5.21E-06 0.42 0.092 0.369 0.15 0.337
CD101 1.18E+08 1p13.1 intronic rs2491122 9.88E-06 -0.422 0.095 0.639 0.843 0.659
GAS2 22786413 11p14.3 intronic rs10734321 4.93E-06 0.498 0.109 0.824 0.722 0.823
GAS2 22786036 11p14.3 intronic rs10741951 4.97E-06 0.498 0.109 0.824 0.747 0.824
GAS2 22785969 11p14.3 intronic rs10741950 4.98E-06 0.498 0.109 0.824 0.747 0.824
KLKP1 51394776 19q13.33 ncRNA_intronic . 7.56E-06 -0.539 0.12 0.157 0.174 0.168
KLKP1,KLK4 51408243 19q13.41 intergenic rs145819982 4.89E-06 -0.648 0.142 0.105 0.05 0.085
LINC01346,LOC284661 4405529 1p36.32 intergenic rs478144 6.35E-06 0.522 0.116 0.149 0.114 0.14
LOC1720,FUNDC2P2 84128740 2p11.2 intergenic rs6745967 6.13E-06 -0.779 0.172 0.063 0.37 0.072
LOC1720,FUNDC2P2 84121008 2p11.2 intergenic rs13420155 6.22E-06 -0.778 0.172 0.062 0.37 0.072
LOC1720,FUNDC2P2 84141340 2p11.2 intergenic rs7570093 6.37E-06 -0.777 0.172 0.062 0.369 0.072
LOC1720,FUNDC2P2 84103158 2p11.2 intergenic rs4572607 6.41E-06 -0.776 0.172 0.063 0.369 0.072
LOC1720,FUNDC2P2 84102282 2p11.2 intergenic rs6719756 6.44E-06 -0.777 0.172 0.062 0.328 0.072
LOC1720,FUNDC2P2 84099459 2p11.2 intergenic rs35525406 6.45E-06 -0.777 0.172 0.062 0.391 0.072
LOC1720,FUNDC2P2 84102306 2p11.2 intergenic . 6.46E-06 -0.777 0.172 0.062 0.369 0.072
LOC1720,FUNDC2P2 84102186 2p11.2 intergenic rs6761784 6.47E-06 -0.777 0.172 0.062 0.369 0.072
LOC1720,FUNDC2P2 84101229 2p11.2 intergenic rs7578925 6.48E-06 -0.777 0.172 0.062 0.391 0.072
LOC1720,FUNDC2P2 84123481 2p11.2 intergenic rs11892958 8.98E-06 -0.77 0.173 0.061 0.324 0.072
LOC1720,FUNDC2P2 84124058 2p11.2 intergenic rs13393339 8.99E-06 -0.77 0.173 0.061 0.324 0.072
LOC1720,FUNDC2P2 84119068 2p11.2 intergenic rs11897649 9.31E-06 -0.769 0.173 0.061 0.324 0.072
LOC1720,FUNDC2P2 84114996 2p11.2 intergenic rs13411875 9.44E-06 -0.768 0.173 0.061 0.324 0.072
LOC1720,FUNDC2P2 84116856 2p11.2 intergenic rs10186197 9.44E-06 -0.768 0.173 0.061 0.324 0.072
LOC1720,FUNDC2P2 84100004 2p11.2 intergenic rs6721721 9.46E-06 -0.769 0.174 0.061 0.322 0.072
LOC1720,FUNDC2P2 84099675 2p11.2 intergenic rs10177040 9.47E-06 -0.769 0.174 0.061 0.322 0.072
LOC1720,FUNDC2P2 84115985 2p11.2 intergenic rs7566318 9.55E-06 -0.768 0.173 0.061 0.314 0.072
MGC34796,TM2D1 62136871 1p31.3 intergenic rs72672375 9.14E-06 -0.432 0.097 0.271 0.133 0.229
PTGFRN 1.18E+08 1p13.1 intronic rs72699131 4.38E-06 0.454 0.099 0.318 0.164 0.312
TM2D1 62164229 1p31.3 intronic . 5.71E-06 -0.438 0.096 0.273 0.22 0.24
TM2D1 62165018 1p31.3 intronic rs6697770 6.67E-06 -0.424 0.094 0.270 0.107 0.224
TM2D1 62191459 1p31.3 upstream rs12141280 6.83E-06 -0.426 0.095 0.268 0.106 0.224
TM2D1 62188432 1p31.3 intronic rs1286625 6.93E-06 -0.424 0.094 0.272 0.181 0.241
TM2D1 62180322 1p31.3 intronic rs7539026 7.27E-06 -0.425 0.095 0.268 0.107 0.223
TM2D1 62186724 1p31.3 intronic rs1151773 7.45E-06 -0.423 0.094 0.271 0.161 0.226
TM2D1 62188362 1p31.3 intronic rs72674608 7.52E-06 -0.423 0.094 0.268 0.107 0.224
TM2D1 62187067 1p31.3 intronic rs12144284 7.59E-06 -0.425 0.095 0.267 0.107 0.223
TM2D1 62170615 1p31.3 intronic rs12130439 8.03E-06 -0.423 0.095 0.268 0.106 0.222
TM2D1 62182133 1p31.3 intronic rs1692132 8.28E-06 -0.42 0.094 0.271 0.162 0.226
TM2D1 62182686 1p31.3 intronic rs12137410 8.47E-06 -0.42 0.094 0.269 0.107 0.224
TM2D1 62155806 1p31.3 intronic rs72672390 8.85E-06 -0.422 0.095 0.268 0.107 0.224
TM2D1 62180373 1p31.3 intronic rs7550631 9.00E-06 -0.418 0.094 0.270 0.107 0.224
TM2D1 62183235 1p31.3 intronic . 9.04E-06 -0.419 0.094 0.269 0.112 0.222
TM2D1 62180070 1p31.3 intronic rs7536565 9.15E-06 -0.418 0.094 0.270 0.109 0.224
TM2D1 62173950 1p31.3 intronic rs12134428 9.63E-06 -0.416 0.094 0.270 0.107 0.224
TM2D1 62171474 1p31.3 intronic rs2051136 9.78E-06 -0.416 0.094 0.270 0.107 0.224
TM2D1 62171015 1p31.3 intronic rs2051135 9.86E-06 -0.416 0.094 0.270 0.106 0.221
TM2D1 62169178 1p31.3 intronic rs4443919 9.88E-06 -0.416 0.094 0.270 0.107 0.224
TM2D1,INADL 62193871 1p31.3 intergenic rs1286630 6.24E-06 -0.428 0.095 0.271 0.184 0.243
TM2D1,INADL 62193432 1p31.3 intergenic rs1286628 6.24E-06 -0.427 0.095 0.272 0.185 0.242
TM2D1,INADL 62193863 1p31.3 intergenic rs1286629 6.26E-06 -0.428 0.095 0.271 0.186 0.243
TM2D1,INADL 62198755 1p31.3 intergenic rs112143036 7.01E-06 -0.435 0.097 0.263 0.185 0.243
TM2D1,INADL 62200289 1p31.3 intergenic rs12130241 8.53E-06 -0.432 0.097 0.256 0.107 0.224
TM2D1,INADL 62201203 1p31.3 intergenic rs1151776 8.68E-06 -0.434 0.098 0.259 0.185 0.241
TM2D1,INADL 62200846 1p31.3 intergenic . 9.93E-06 -0.428 0.097 0.261 0.186 0.243
VWDE,SCIN 12527486 7p21.3 intergenic rs847926 5.52E-06 0.381 0.084 0.475 0.662 0.428
VWDE,SCIN 12513221 7p21.3 intergenic rs847949 5.80E-06 0.452 0.1 0.222 0.255 0.215
VWDE,SCIN 12513185 7p21.3 intergenic rs847950 5.85E-06 0.452 0.1 0.222 0.255 0.215
VWDE,SCIN 12512939 7p21.3 intergenic rs1659947 6.34E-06 0.45 0.1 0.222 0.253 0.214
VWDE,SCIN 12512936 7p21.3 intergenic rs1659946 6.39E-06 0.449 0.1 0.222 0.253 0.214
VWDE,SCIN 12512889 7p21.3 intergenic rs847951 6.44E-06 0.449 0.1 0.222 0.253 0.215
VWDE,SCIN 12512654 7p21.3 intergenic rs847954 6.46E-06 0.449 0.1 0.222 0.253 0.214
VWDE,SCIN 12512555 7p21.3 intergenic . 6.48E-06 0.449 0.1 0.222 0.252 0.214
VWDE,SCIN 12511924 7p21.3 intergenic rs847956 6.52E-06 0.449 0.1 0.222 0.254 0.214
VWDE,SCIN 12511312 7p21.3 intergenic rs11275 6.57E-06 0.449 0.1 0.222 0.253 0.213
VWDE,SCIN 12512831 7p21.3 intergenic rs847952 6.75E-06 0.45 0.1 0.221 0.252 0.214
VWDE,SCIN 12512802 7p21.3 intergenic rs847953 6.76E-06 0.45 0.1 0.221 0.252 0.214

2.3 Wechsler Listening Comprehension
Gene Start cytoBand Func rsID pval B SE maf KGP KGP_eur
CFLAR-AS1 2.02E+08 2q33.1 ncRNA_intronic rs56009967 6.82E-06 0.521 0.263 0.263 0.211 0.216
. 1.94E+08 . . . 7.83E-06 0.489 0.267 0.267
SLC39A8 1.03E+08 4q24 intronic rs11725311 3.76E-07 -0.514 0.341 0.341 0.264 0.358
SLC39A8 1.03E+08 4q24 intronic rs144401529 5.84E-07 -0.502 0.344 0.344 0.265 0.359
SLC39A8 1.03E+08 4q24 intronic rs11733504 5.95E-07 -0.501 0.344 0.344 0.268 0.359
SLC39A8 1.03E+08 4q24 intronic rs2165265 5.97E-07 -0.501 0.344 0.344 0.268 0.359
SLC39A8 1.03E+08 4q24 intronic rs10489122 1.37E-06 -0.48 0.340 0.34 0.272 0.351
SLC39A8 1.03E+08 4q24 intronic rs72692219 1.40E-06 -0.479 0.341 0.341 0.272 0.351
SLC39A8 1.03E+08 4q24 intronic rs2119213 1.49E-06 -0.49 0.339 0.339 0.27 0.359
SLC39A8 1.03E+08 4q24 intronic rs2165266 1.69E-06 -0.476 0.341 0.341 0.274 0.354
SLC39A8 1.03E+08 4q24 intronic rs139036380 1.92E-06 -0.498 0.295 0.295 0.225 0.304
SLC39A8 1.03E+08 4q24 intronic rs62327916 3.96E-06 -0.487 0.291 0.291 0.276 0.305
SLC39A8 1.03E+08 4q24 intronic rs11097772 4.03E-06 -0.457 0.344 0.344 0.281 0.355
SLC39A8 1.03E+08 4q24 intronic rs4698845 4.04E-06 -0.457 0.343 0.343 0.275 0.358
SLC39A8 1.03E+08 4q24 intronic rs11734114 4.05E-06 -0.459 0.345 0.345 0.284 0.359
SLC39A8 1.03E+08 4q24 intronic rs11737763 4.08E-06 -0.457 0.344 0.344 0.28 0.355
SLC39A8 1.03E+08 4q24 intronic rs62327949 4.12E-06 -0.457 0.344 0.344 0.282 0.356
SLC39A8 1.03E+08 4q24 intronic rs4698844 4.14E-06 -0.456 0.344 0.344 0.289 0.359
SLC39A8 1.03E+08 4q24 intronic rs7699390 4.14E-06 -0.456 0.344 0.344 0.282 0.359
SLC39A8 1.03E+08 4q24 intronic rs7694296 4.14E-06 -0.456 0.344 0.344 0.289 0.359
SLC39A8 1.03E+08 4q24 intronic rs4476601 9.05E-06 -0.46 0.325 0.325 0.467 0.354
TRAM2-AS1,LOC730101 52510891 6p12.2 intergenic rs34956539 7.24E-06 -0.839 0.082 0.082 0.045 0.081
LOC101927450,TLE4 81764216 9q21.31 intergenic rs11138098 4.34E-07 -1.051 0.066 0.066 0.067 0.045
LOC101927450,TLE4 81768362 9q21.31 intergenic rs77250496 7.93E-07 -1.062 0.063 0.063 0.071 0.045
. 43930381 . . . 2.56E-07 0.549 0.548 0.548 #VALUE! #VALUE!
SALL2 22001390 14q11.2 intronic rs11157018 4.33E-06 0.445 0.473 0.473 0.318 0.445
SALL2 22001177 14q11.2 intronic rs7147978 5.31E-06 0.441 0.479 0.479 0.389 0.444
SALL2 22001947 14q11.2 intronic rs2293702 5.32E-06 0.459 0.476 0.476 0.382 0.443
SALL2 22001733 14q11.2 intronic . 5.49E-06 0.456 0.484 0.484 0.425 0.444
SALL2 22001514 14q11.2 intronic rs11157020 7.98E-06 0.437 0.481 0.481 0.394 0.449
SALL2 22001462 14q11.2 intronic rs11157019 8.11E-06 0.434 0.481 0.481 0.394 0.449

3. Phonology
3.1 Multisyllabic Word Repetition
Gene Start cytoBand Function rsID PVAL Beta SE MAF KGP KGP_eur
. 14585686 . . . 8.57E-06 -0.629 0.141 0.816
LINC01108,JARID2 14593748 6p23 intergenic rs2327825 2.85E-06 -0.685 0.146 0.835 0.821 0.83
LINC01108,JARID2 14593752 6p23 intergenic rs2327826 2.85E-06 -0.685 0.146 0.835 0.821 0.83
LINC01108,JARID2 14593914 6p23 intergenic rs6901145 3.40E-06 -0.678 0.146 0.834 0.82 0.828
LINC01108,JARID2 14594143 6p23 intergenic rs6922247 3.42E-06 -0.678 0.146 0.834 0.82 0.828
LINC01108,JARID2 14594799 6p23 intergenic rs9370767 3.49E-06 -0.678 0.146 0.834 0.82 0.828
LINC01108,JARID2 14596365 6p23 intergenic rs6914079 3.75E-06 -0.675 0.146 0.833 0.819 0.828
LINC01108,JARID2 14596624 6p23 intergenic . 3.76E-06 -0.675 0.146 0.833 0.82 0.828
LINC01108,JARID2 14597215 6p23 intergenic rs1891279 3.81E-06 -0.675 0.146 0.833 0.821 0.828
LINC01108,JARID2 14597170 6p23 intergenic rs1891280 3.81E-06 -0.675 0.146 0.833 0.821 0.828
LINC01108,JARID2 14597288 6p23 intergenic rs1891278 3.82E-06 -0.675 0.146 0.833 0.821 0.828
LINC01108,JARID2 14588105 6p23 intergenic rs10949251 6.24E-06 -0.641 0.142 0.818 0.818 0.823
LINC01108,JARID2 14593327 6p23 intergenic rs7752432 7.24E-06 -0.653 0.146 0.833 0.782 0.828
LINC01108,JARID2 14584396 6p23 intergenic rs6459366 7.60E-06 -0.633 0.141 0.816 0.811 0.824
LINC01108,JARID2 14584328 6p23 intergenic rs4626426 7.60E-06 -0.633 0.141 0.816 0.812 0.824
LINC01108,JARID2 14585441 6p23 intergenic rs8180624 8.55E-06 -0.629 0.141 0.816 0.811 0.823
LINC01108,JARID2 14586821 6p23 intergenic rs7768617 8.57E-06 -0.629 0.141 0.816 0.809 0.823
LINC01108,JARID2 14581298 6p23 intergenic rs9370762 8.58E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14580347 6p23 intergenic rs6459364 8.58E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14581116 6p23 intergenic rs7765198 8.58E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14579904 6p23 intergenic rs6927825 8.58E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14581709 6p23 intergenic rs9464700 8.58E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14582693 6p23 intergenic rs9296942 8.59E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14582504 6p23 intergenic rs9296941 8.59E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14582770 6p23 intergenic rs9296943 8.59E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14582828 6p23 intergenic rs9296944 8.59E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14583679 6p23 intergenic rs1418179 8.60E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14584110 6p23 intergenic rs4317412 8.60E-06 -0.629 0.141 0.816 0.811 0.824
LINC01108,JARID2 14583764 6p23 intergenic rs1418178 8.61E-06 -0.629 0.141 0.816 0.81 0.824
LINC01108,JARID2 14579335 6p23 intergenic rs9476590 8.65E-06 -0.628 0.141 0.816 0.811 0.824
LINC01108,JARID2 14591024 6p23 intergenic rs12200652 8.65E-06 -0.629 0.141 0.816 0.813 0.822
LINC01108,JARID2 14583876 6p23 intergenic rs1418177 8.68E-06 -0.629 0.141 0.816 0.812 0.824
LINC01108,JARID2 14584041 6p23 intergenic rs1418176 8.76E-06 -0.629 0.141 0.816 0.813 0.824
LINC01108,JARID2 14582870 6p23 intergenic rs9296945 8.78E-06 -0.629 0.141 0.816 0.812 0.824
LINC01108,JARID2 14585453 6p23 intergenic rs8180625 8.81E-06 -0.629 0.141 0.816 0.811 0.823
LINC01108,JARID2 14593843 6p23 intergenic rs2876367 9.09E-06 -0.645 0.145 0.828 0.82 0.828
LINC01108,JARID2 14584351 6p23 intergenic rs6459365 9.32E-06 -0.627 0.142 0.817 0.811 0.824
LRRC30,PTPRM 7375512 18p11.23 intergenic rs57366185 2.55E-06 -0.779 0.166 0.148 0.321 0.165
POLR1D,GSX1 28333624 13q12.2 intergenic rs1231023 3.50E-06 0.498 0.107 0.391 0.418 0.42
POLR1D,GSX1 28333369 13q12.2 intergenic rs1231021 3.55E-06 0.497 0.107 0.391 0.415 0.42
POLR1D,GSX1 28334138 13q12.2 intergenic rs1231027 3.56E-06 0.501 0.108 0.39 0.417 0.42
POLR1D,GSX1 28338865 13q12.2 intergenic rs13378945 3.67E-06 0.507 0.11 0.386 0.363 0.418

3.2 Nonsense Word Repetition
Gene Start cytoBand Function rsID PVAL Beta SE MAF KGP KGP_eur
ANXA11 81924830 10q22.3 intronic rs12763624 4.44E-06 0.436 0.095 0.128 0.238 0.238
ANXA11 81936547 10q22.3 intronic rs11591611 4.97E-06 0.436 0.095 0.149 0.238 0.238
ANXA11 81949685 10q22.3 intronic rs9645553 5.10E-06 0.434 0.095 0.149 0.239 0.239
ANXA11 81926339 10q22.3 intronic rs11201950 5.63E-06 0.432 0.095 0.149 0.238 0.238
ANXA11 81914787 10q22.3 downstream rs3748242 7.40E-06 0.426 0.095 0.136 0.247 0.247
DYDC1 82095276 10q23.1 downstream rs7895042 5.66E-07 -0.457 0.091 0.869 0.697 0.697
DYDC1 82101728 10q23.1 intronic rs7914156 1.13E-06 -0.423 0.087 0.849 0.667 0.667
DYDC1 82100428 10q23.1 intronic rs1417221 1.14E-06 -0.423 0.087 0.849 0.667 0.667
DYDC1 82103863 10q23.1 intronic rs10788576 1.15E-06 -0.422 0.087 0.849 0.667 0.667
DYDC1 82096997 10q23.1 intronic rs1340382 1.15E-06 -0.422 0.087 0.849 0.667 0.667
DYDC1 82095968 10q23.1 intronic rs1340383 1.16E-06 -0.422 0.087 0.85 0.667 0.667
DYDC1 82116501 10q23.1 UTR5 rs4934083 2.33E-06 -0.438 0.093 0.871 0.706 0.706
DYDC1,DYDC2 82104890 10q23.1 intronic rs7082102 5.65E-07 -0.458 0.091 0.868 0.697 0.697
DYDC1,DYDC2 82107029 10q23.1 intronic rs7087636 5.79E-07 -0.457 0.091 0.872 0.697 0.697
DYDC1,DYDC2 82108253 10q23.1 intronic rs10749562 5.83E-07 -0.457 0.091 0.868 0.697 0.697
DYDC1,DYDC2 82104818 10q23.1 intronic . 1.16E-06 -0.422 0.087 0.849 0.667 0.667
DYDC1,DYDC2 82112488 10q23.1 intronic rs1572819 1.34E-06 -0.42 0.087 0.85 0.667 0.667
DYDC1,DYDC2 82114098 10q23.1 intronic rs6585944 1.53E-06 -0.422 0.088 0.851 0.678 0.678
DYDC1,DYDC2 82115619 10q23.1 intronic rs1340380 1.59E-06 -0.422 0.088 0.851 0.678 0.678
DYDC2 82116960 10q23.1 intronic rs4934084 2.39E-06 -0.438 0.093 0.874 0.707 0.707
DYDC2 82127180 10q23.1 UTR3 rs1972370 2.72E-06 -0.434 0.093 0.874 0.707 0.707
DYDC2 82125230 10q23.1 intronic rs1934695 2.74E-06 -0.434 0.093 0.871 0.707 0.707
DYDC2 82120887 10q23.1 intronic rs2185426 2.77E-06 -0.434 0.093 0.875 0.707 0.707
DYDC2 82119062 10q23.1 intronic rs6585947 2.80E-06 -0.434 0.093 0.871 0.707 0.707
DYDC2 82117435 10q23.1 intronic rs10788586 5.75E-06 -0.395 0.087 0.846 0.677 0.677
DYDC2 82118292 10q23.1 intronic rs6585946 6.12E-06 -0.393 0.087 0.845 0.677 0.677
DYDC2 82127956 10q23.1 downstream rs2095994 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82127551 10q23.1 UTR3 rs946892 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82127111 10q23.1 UTR3 rs1972371 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82126808 10q23.1 UTR3 rs1047952 6.34E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82128395 10q23.1 downstream . 6.35E-06 -0.393 0.087 0.845 0.677 0.677
DYDC2 82125734 10q23.1 intronic rs1934692 6.36E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82125650 10q23.1 intronic rs1934693 6.36E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82125325 10q23.1 intronic rs1934694 6.37E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82124966 10q23.1 intronic rs1934696 6.37E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82124155 10q23.1 intronic rs7089003 6.38E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82124070 10q23.1 intronic rs7077854 6.39E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82123856 10q23.1 intronic rs7098045 6.39E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82123563 10q23.1 intronic rs2050824 6.39E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82123257 10q23.1 intronic rs1970434 6.39E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82123340 10q23.1 intronic rs1970433 6.39E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82122523 10q23.1 intronic rs1538819 6.40E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82121455 10q23.1 intronic rs1340379 6.41E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82119583 10q23.1 intronic rs4601708 6.43E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2 82119242 10q23.1 intronic rs4536152 6.46E-06 -0.392 0.087 0.846 0.679 0.679
DYDC2 82119271 10q23.1 intronic rs4265545 6.47E-06 -0.392 0.087 0.846 0.679 0.679
DYDC2 82119276 10q23.1 intronic rs4468295 6.47E-06 -0.392 0.087 0.846 0.679 0.679
DYDC2,FAM213A 82139941 10q23.1 intergenic rs2185425 2.67E-06 -0.436 0.093 0.872 0.706 0.706
DYDC2,FAM213A 82136856 10q23.1 intergenic rs10749574 2.68E-06 -0.435 0.093 0.862 0.707 0.707
DYDC2,FAM213A 82136333 10q23.1 intergenic rs17678947 2.68E-06 -0.435 0.093 0.862 0.707 0.707
DYDC2,FAM213A 82135831 10q23.1 intergenic rs7906736 2.68E-06 -0.435 0.093 0.862 0.707 0.707
DYDC2,FAM213A 82139115 10q23.1 intergenic rs10749578 2.68E-06 -0.435 0.093 0.872 0.706 0.706
DYDC2,FAM213A 82139732 10q23.1 intergenic rs2153455 2.69E-06 -0.436 0.093 0.872 0.706 0.706
DYDC2,FAM213A 82137600 10q23.1 intergenic rs1572816 2.69E-06 -0.435 0.093 0.874 0.707 0.707
DYDC2,FAM213A 82131453 10q23.1 intergenic rs11202699 2.71E-06 -0.435 0.093 0.871 0.707 0.707
DYDC2,FAM213A 82130005 10q23.1 intergenic rs10736342 2.71E-06 -0.435 0.093 0.874 0.707 0.707
DYDC2,FAM213A 82147466 10q23.1 intergenic rs4934125 2.81E-06 -0.436 0.093 0.871 0.706 0.706
DYDC2,FAM213A 82144758 10q23.1 intergenic rs376395105 3.36E-06 -0.435 0.094 0.851 0.692 0.692
DYDC2,FAM213A 82144780 10q23.1 intergenic . 3.36E-06 -0.435 0.094 0.851 0.692 0.692
DYDC2,FAM213A 82144484 10q23.1 intergenic rs7079660 4.48E-06 -0.445 0.097 0.874 0.725 0.725
DYDC2,FAM213A 82144203 10q23.1 intergenic rs7913061 6.17E-06 -0.395 0.087 0.837 0.674 0.674
DYDC2,FAM213A 82135737 10q23.1 intergenic rs35353819 6.27E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82132270 10q23.1 intergenic rs1538817 6.28E-06 -0.392 0.087 0.84 0.677 0.677
DYDC2,FAM213A 82129114 10q23.1 intergenic rs1340374 6.30E-06 -0.392 0.087 0.845 0.678 0.678
DYDC2,FAM213A 82137670 10q23.1 intergenic rs1538815 6.31E-06 -0.392 0.087 0.843 0.677 0.677
DYDC2,FAM213A 82136839 10q23.1 intergenic rs10749573 6.31E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82135667 10q23.1 intergenic rs11202720 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82135762 10q23.1 intergenic rs11202721 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82136190 10q23.1 intergenic rs11202724 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82136276 10q23.1 intergenic rs11202725 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82135457 10q23.1 intergenic rs55731004 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82131783 10q23.1 intergenic rs4357632 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82132585 10q23.1 intergenic . 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82131612 10q23.1 intergenic rs10887797 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82135220 10q23.1 intergenic rs9804223 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82134465 10q23.1 intergenic rs10736347 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82134667 10q23.1 intergenic rs9804233 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82134894 10q23.1 intergenic rs7077712 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82134140 10q23.1 intergenic rs10736345 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82134281 10q23.1 intergenic rs10736346 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82135100 10q23.1 intergenic rs9804236 6.32E-06 -0.392 0.087 0.844 0.677 0.677
DYDC2,FAM213A 82133955 10q23.1 intergenic rs10736344 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82129892 10q23.1 intergenic rs10749569 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82133380 10q23.1 intergenic rs7919380 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82133304 10q23.1 intergenic rs7897994 6.32E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82128963 10q23.1 intergenic rs1340376 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82129039 10q23.1 intergenic rs1340375 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82129250 10q23.1 intergenic rs1340373 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82129268 10q23.1 intergenic rs7903701 6.33E-06 -0.392 0.087 0.845 0.677 0.677
DYDC2,FAM213A 82139244 10q23.1 intergenic rs10736348 6.33E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82137367 10q23.1 intergenic rs10887808 6.33E-06 -0.392 0.087 0.845 0.679 0.679
DYDC2,FAM213A 82139791 10q23.1 intergenic rs6585967 6.35E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82139807 10q23.1 intergenic rs6585968 6.35E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82130436 10q23.1 intergenic . 6.36E-06 -0.392 0.087 0.842 0.676 0.676
DYDC2,FAM213A 82140959 10q23.1 intergenic rs10749579 6.39E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82141336 10q23.1 intergenic rs10736349 6.41E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82141651 10q23.1 intergenic rs10749581 6.43E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82142047 10q23.1 intergenic rs7090351 6.44E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82148325 10q23.1 intergenic rs10736356 6.56E-06 -0.393 0.087 0.837 0.672 0.672
DYDC2,FAM213A 82145057 10q23.1 intergenic rs6585971 6.63E-06 -0.393 0.087 0.838 0.676 0.676
DYDC2,FAM213A 82145647 10q23.1 intergenic rs1953971 6.65E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82145666 10q23.1 intergenic rs1953972 6.65E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82147057 10q23.1 intergenic rs4933378 6.72E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82147877 10q23.1 intergenic rs10887825 6.75E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82147802 10q23.1 intergenic rs11202751 6.75E-06 -0.393 0.087 0.839 0.676 0.676
DYDC2,FAM213A 82145042 10q23.1 intergenic rs61859213 6.79E-06 -0.393 0.087 0.837 0.674 0.674
DYDC2,FAM213A 82145044 10q23.1 intergenic rs370435202 6.79E-06 -0.393 0.087 0.837 0.674 0.674
DYDC2,FAM213A 82145046 10q23.1 intergenic rs374683514 6.79E-06 -0.393 0.087 0.837 0.674 0.674
DYDC2,FAM213A 82145047 10q23.1 intergenic . 6.80E-06 -0.393 0.087 0.837 0.674 0.674
DYDC2,FAM213A 82145048 10q23.1 intergenic . 6.80E-06 -0.393 0.087 0.837 0.674 0.674
DYDC2,FAM213A 82131613 10q23.1 intergenic rs10887798 6.92E-06 -0.392 0.087 0.837 0.675 0.675
DYDC2,FAM213A 82162768 10q23.1 intergenic rs7078075 7.85E-06 -0.419 0.094 0.869 0.705 0.705
DYDC2,FAM213A 82162509 10q23.1 intergenic rs7081903 7.92E-06 -0.419 0.094 0.87 0.705 0.705
DYDC2,FAM213A 82161380 10q23.1 intergenic rs10887838 8.18E-06 -0.418 0.094 0.869 0.705 0.705
DYDC2,FAM213A 82146785 10q23.1 intergenic rs4933377 8.51E-06 -0.389 0.087 0.834 0.676 0.676
DYDC2,FAM213A 82157446 10q23.1 intergenic rs1815651 9.22E-06 -0.417 0.094 0.868 0.705 0.705
FAM213A 82177023 10q23.1 intronic rs4934139 7.47E-06 -0.42 0.094 0.872 0.706 0.706
FAM213A 82172975 10q23.1 intronic rs10749595 9.36E-06 -0.415 0.094 0.872 0.705 0.705
LINC00857,MAT1A 82016530 10q23.1 intergenic rs10788541 3.30E-06 -0.403 0.087 0.795 0.665 0.665
LINC00857,MAT1A 82027088 10q23.1 intergenic rs7923366 3.55E-06 -0.4 0.086 0.837 0.667 0.667
LINC00857,MAT1A 82015725 10q23.1 intergenic rs3862532 3.65E-06 -0.4 0.086 0.799 0.663 0.663
LINC00857,MAT1A 82025682 10q23.1 intergenic rs3120976 5.55E-06 -0.392 0.086 0.796 0.667 0.667
LINC00857,MAT1A 82023524 10q23.1 intergenic rs946893 5.57E-06 -0.392 0.086 0.793 0.666 0.666
MAT1A 82034842 10q23.1 exonic rs10887711 8.19E-07 -0.422 0.086 0.84 0.641 0.641
MAT1A 82034854 10q23.1 exonic rs10788546 8.21E-07 -0.422 0.086 0.841 0.641 0.641
MAT1A 82045719 10q23.1 intronic rs1890578 1.78E-06 -0.417 0.087 0.849 0.667 0.667
MAT1A 82046302 10q23.1 intronic rs2342813 2.17E-06 -0.415 0.088 0.815 0.666 0.666
MAT1A 82035150 10q23.1 intronic rs9285726 2.25E-06 -0.408 0.086 0.864 0.669 0.669
MAT1A 82040052 10q23.1 exonic rs1143694 2.25E-06 -0.408 0.086 0.863 0.669 0.669
MAT1A 82040346 10q23.1 intronic rs2282367 2.25E-06 -0.408 0.086 0.863 0.669 0.669
MAT1A 82041409 10q23.1 intronic rs1417219 2.25E-06 -0.408 0.086 0.863 0.669 0.669
MAT1A 82041751 10q23.1 intronic rs756208 2.25E-06 -0.408 0.086 0.851 0.669 0.669
MAT1A 82043576 10q23.1 intronic rs2236569 2.25E-06 -0.408 0.086 0.857 0.669 0.669
MAT1A 82034675 10q23.1 intronic rs10788545 2.25E-06 -0.408 0.086 0.864 0.669 0.669
MAT1A 82031197 10q23.1 downstream rs10749550 2.25E-06 -0.408 0.086 0.864 0.669 0.669
MAT1A 82037215 10q23.1 intronic rs873395 2.30E-06 -0.408 0.086 0.867 0.669 0.669
MAT1A 82043426 10q23.1 intronic rs9421464 2.39E-06 -0.407 0.086 0.817 0.668 0.668
MAT1A 82049251 10q23.1 UTR5 . 3.44E-06 -0.397 0.086 0.77 0.584 0.584
MAT1A 82039513 10q23.1 intronic . 3.47E-06 -0.4 0.086 0.862 0.669 0.669
MAT1A 82031160 10q23.1 downstream rs9420349 5.54E-06 -0.392 0.086 0.77 0.668 0.668
MAT1A,DYDC1 82075324 10q23.1 intergenic rs3120977 4.06E-07 -0.461 0.091 0.872 0.696 0.696
MAT1A,DYDC1 82068404 10q23.1 intergenic rs9663163 4.06E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82070014 10q23.1 intergenic rs7072845 4.06E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82064449 10q23.1 intergenic rs2486303 4.07E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82066265 10q23.1 intergenic rs140867380 4.07E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82071263 10q23.1 intergenic rs7089058 4.07E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82064015 10q23.1 intergenic rs2486304 4.07E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82061206 10q23.1 intergenic rs10736341 4.18E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82061071 10q23.1 intergenic rs10736340 4.25E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82060484 10q23.1 intergenic rs1856773 4.39E-07 -0.461 0.091 0.868 0.696 0.696
MAT1A,DYDC1 82059970 10q23.1 intergenic rs10736339 4.76E-07 -0.459 0.091 0.871 0.695 0.695
MAT1A,DYDC1 82064104 10q23.1 intergenic rs2487074 5.66E-07 -0.46 0.092 0.868 0.696 0.696
MAT1A,DYDC1 82090339 10q23.1 intergenic rs1361458 5.70E-07 -0.457 0.091 0.872 0.696 0.696
MAT1A,DYDC1 82077032 10q23.1 intergenic rs10400045 8.31E-07 -0.427 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82076600 10q23.1 intergenic rs2486302 8.32E-07 -0.427 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82072310 10q23.1 intergenic rs10788562 8.36E-07 -0.427 0.087 0.847 0.667 0.667
MAT1A,DYDC1 82073248 10q23.1 intergenic rs9420370 8.43E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82069763 10q23.1 intergenic rs10887739 8.47E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82067439 10q23.1 intergenic rs7070027 8.53E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82074395 10q23.1 intergenic rs2994390 8.54E-07 -0.428 0.087 0.805 0.665 0.665
MAT1A,DYDC1 82065217 10q23.1 intergenic rs9703804 8.59E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82078868 10q23.1 intergenic rs2039757 8.65E-07 -0.426 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82062531 10q23.1 intergenic rs1290317 8.69E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82062257 10q23.1 intergenic rs2994389 8.69E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82062121 10q23.1 intergenic rs1934146 8.70E-07 -0.426 0.087 0.845 0.666 0.666
MAT1A,DYDC1 82086532 10q23.1 intergenic rs2342815 1.05E-06 -0.424 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82086546 10q23.1 intergenic rs2342816 1.05E-06 -0.424 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82086779 10q23.1 intergenic rs7894762 1.06E-06 -0.424 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82090643 10q23.1 intergenic rs1890890 1.09E-06 -0.423 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82088040 10q23.1 intergenic rs1538823 1.09E-06 -0.423 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82088702 10q23.1 intergenic rs1340384 1.12E-06 -0.422 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82092801 10q23.1 intergenic rs10749560 1.17E-06 -0.422 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82091950 10q23.1 intergenic rs7899791 1.17E-06 -0.422 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82090827 10q23.1 intergenic rs1890889 1.17E-06 -0.422 0.087 0.849 0.666 0.666
MAT1A,DYDC1 82058851 10q23.1 intergenic rs1340385 1.74E-06 -0.422 0.088 0.862 0.667 0.667
MAT1A,DYDC1 82054001 10q23.1 intergenic rs10749551 2.06E-06 -0.416 0.088 0.865 0.667 0.667
MAT1A,DYDC1 82056667 10q23.1 intergenic rs2185427 2.07E-06 -0.416 0.088 0.85 0.667 0.667
MAT1A,DYDC1 82054873 10q23.1 intergenic rs9420359 2.07E-06 -0.416 0.088 0.849 0.667 0.667
MAT1A,DYDC1 82057334 10q23.1 intergenic rs4934038 2.19E-06 -0.416 0.088 0.862 0.667 0.667
POLR1D,GSX1 28338865 13q12.2 intergenic rs13378945 2.64E-07 0.453 0.088 0.363 0.418 0.418
POLR1D,GSX1 28334138 13q12.2 intergenic rs1231027 3.26E-07 0.443 0.087 0.417 0.42 0.42
POLR1D,GSX1 28333624 13q12.2 intergenic rs1231023 4.42E-07 0.435 0.086 0.418 0.42 0.42
POLR1D,GSX1 28333369 13q12.2 intergenic rs1231021 4.51E-07 0.434 0.086 0.415 0.42 0.42
POLR1D,GSX1 28343508 13q12.2 intergenic rs1231040 7.35E-07 0.439 0.089 0.511 0.422 0.422
POLR1D,GSX1 28341662 13q12.2 intergenic rs1231035 7.38E-07 0.44 0.089 0.501 0.42 0.42
POLR1D,GSX1 28342150 13q12.2 intergenic rs1231036 9.40E-07 0.436 0.089 0.5 0.418 0.418
POLR1D,GSX1 28330022 13q12.2 intergenic rs1231012 1.10E-06 0.42 0.086 0.416 0.418 0.418
POLR1D,GSX1 28342614 13q12.2 intergenic rs1231037 1.13E-06 0.433 0.089 0.498 0.419 0.419
POLR1D,GSX1 28334065 13q12.2 intergenic . 2.60E-06 0.461 0.098 0.274 0.28 0.28
POLR1D,GSX1 28335399 13q12.2 intergenic rs1231028 2.69E-06 0.466 0.099 0.225 0.276 0.276
POLR1D,GSX1 28334425 13q12.2 intergenic . 3.60E-06 0.455 0.098 0.291 0.28 0.28
POLR1D,GSX1 28334107 13q12.2 intergenic rs1231026 3.68E-06 0.454 0.098 0.287 0.279 0.279
POLR1D,GSX1 28334004 13q12.2 intergenic rs1231025 3.77E-06 0.453 0.098 0.294 0.281 0.281
POLR1D,GSX1 28333467 13q12.2 intergenic rs1231022 3.92E-06 0.451 0.098 0.29 0.28 0.28
POLR1D,GSX1 28333263 13q12.2 intergenic rs1231020 3.98E-06 0.45 0.098 0.29 0.28 0.28
POLR1D,GSX1 28331713 13q12.2 intergenic rs1231019 5.13E-06 0.443 0.097 0.274 0.28 0.28
POLR1D,GSX1 28336809 13q12.2 intergenic . 7.86E-06 0.49 0.11 0.239 0.234 0.234
POLR1D,GSX1 28331565 13q12.2 intergenic rs1075083 7.97E-06 0.434 0.097 0.29 0.28 0.28
POLR1D,GSX1 28329109 13q12.2 intergenic rs1231010 8.01E-06 0.436 0.098 0.264 0.28 0.28
POLR1D,GSX1 28330097 13q12.2 intergenic rs1231013 8.42E-06 0.434 0.098 0.274 0.28 0.28
POLR1D,GSX1 28329916 13q12.2 intergenic rs1231011 8.52E-06 0.434 0.098 0.274 0.281 0.281
POLR1D,GSX1 28336590 13q12.2 intergenic rs1231030 8.77E-06 0.438 0.099 0.281 0.278 0.278

4. Reading
Word Attack
Gene Start cytoBand Function rsID pval beta SE maf KGP KGP_CEU
ARHGAP23 36592405 17q12 intronic rs12949691 2.48E-07 -7.551 1.464 0.336 0.258 0.272
ARHGAP23 36605490 17q12 intronic rs7503524 2.81E-06 -6.773 1.446 0.34 0.23 0.294
ARHGAP23 36594109 17q12 intronic rs11078990 3.49E-06 -6.648 1.433 0.365 0.275 0.301
ARHGAP23 36596161 17q12 intronic rs7211127 4.10E-06 -6.615 1.436 0.353 0.253 0.282
ARHGAP23 36597782 17q12 intronic rs3885327 4.94E-06 -6.4 1.401 0.388 0.247 0.311
ARHGAP23 36594900 17q12 intronic . 5.51E-06 -6.485 1.427 0.357 0.26 0.288
BCL11B,SETD3 99860890 14q32.2 intergenic rs11624392 2.99E-07 8.37 1.634 0.33 0.421 0.353
BCL11B,SETD3 99860811 14q32.2 intergenic rs1257266 2.85E-06 7.307 1.561 0.37 0.54 0.382
BCL11B,SETD3 99861852 14q32.2 intergenic rs749444 3.50E-06 7.245 1.562 0.37 0.448 0.381
BCL11B,SETD3 99858970 14q32.2 intergenic rs1257267 3.75E-06 7.269 1.572 0.373 0.561 0.388
BCL11B,SETD3 99860134 14q32.2 intergenic rs9635296 3.79E-06 7.259 1.570 0.373 0.454 0.388
BCL11B,SETD3 99861645 14q32.2 intergenic rs3994980 3.90E-06 7.228 1.566 0.37 0.436 0.381
BCL11B,SETD3 99860678 14q32.2 intergenic . 4.25E-06 7.19 1.564 0.369 0.456 0.381
BCL11B,SETD3 99860675 14q32.2 intergenic rs369666265 4.25E-06 7.19 1.563 0.369 0.458 0.382
BCL11B,SETD3 99860724 14q32.2 intergenic rs34042297 4.26E-06 7.189 1.563 0.369 0.45 0.382
BCL11B,SETD3 99860858 14q32.2 intergenic rs11624368 4.28E-06 7.186 1.563 0.369 0.45 0.382
BCL11B,SETD3 99861619 14q32.2 intergenic rs12100924 4.28E-06 7.185 1.563 0.37 0.427 0.381
BCL11B,SETD3 99860911 14q32.2 intergenic rs11629067 4.28E-06 7.185 1.563 0.369 0.45 0.382
BCL11B,SETD3 99861519 14q32.2 intergenic rs12879956 4.29E-06 7.185 1.563 0.37 0.434 0.381
BCL11B,SETD3 99861588 14q32.2 intergenic rs12100920 4.29E-06 7.185 1.563 0.37 0.436 0.381
BCL11B,SETD3 99860658 14q32.2 intergenic . 8.44E-06 1.249 1.706 0.478 0.463 0.485
CCNK 99957703 14q32.2 intronic rs2400677 9.31E-07 -7.294 1.487 0.533 0.416 0.552
CCNK 99949522 14q32.2 intronic rs8015304 9.34E-07 -7.292 1.487 0.533 0.423 0.553
CCNK 99964503 14q32.2 intronic rs7158915 2.90E-06 -6.969 1.490 0.551 0.458 0.572
CCNK 99968377 14q32.2 intronic rs2069496 3.25E-06 6.931 1.489 0.448 0.525 0.428
CCNK 99971294 14q32.2 intronic rs3918094 3.30E-06 -6.938 1.492 0.552 0.46 0.572
CCNK 99963515 14q32.2 intronic rs3918074 3.36E-06 6.919 1.489 0.448 0.457 0.428
CCNK 99954084 14q32.2 intronic rs3918051 3.39E-06 6.915 1.488 0.448 0.457 0.427
CCNK 99948585 14q32.2 intronic rs1950598 3.39E-06 6.914 1.488 0.448 0.456 0.427
CCNK 99948357 14q32.2 intronic rs3918038 3.40E-06 -6.913 1.488 0.552 0.459 0.573
CCNK 99957927 14q32.2 intronic rs2400678 3.40E-06 -6.914 1.488 0.552 0.459 0.574
CCNK 99961242 14q32.2 intronic rs3783319 3.40E-06 -6.913 1.488 0.552 0.462 0.572
CCNK 99950336 14q32.2 intronic rs10145006 3.40E-06 -6.912 1.488 0.552 0.459 0.573
CCNK 99961830 14q32.2 intronic rs2069492 3.42E-06 -6.909 1.488 0.552 0.447 0.572
FST,NDUFS4 52823445 5q11.2 intergenic . 3.80E-06 7.566 1.637 0.405 0.459 0.334
IL1F10,IL1RN 1.14E+08 2q13 intergenic rs11123160 7.20E-06 7.451 1.660 0.764 0.909 0.764
IQCE 2651168 7p22.3 UTR3 rs3823604 4.53E-06 -6.56 1.431 0.673 0.642 0.709
IQCE 2645201 7p22.3 intronic rs4721874 5.50E-06 -6.527 1.436 0.675 0.715 0.711
182 2643607 7p22.3 intronic rs4719601 5.61E-06 -6.518 1.436 0.675 0.716 0.711 2644327 7p22.3 intronic rs6961606 5.63E-06 -6.518 1.436 0.675 0.719 0.712 2644846 7p22.3 intronic rs2293406 5.65E-06 -6.518 1.436 0.675 0.702 0.711 2642187 7p22.3 intronic rs4719597 5.73E-06 -6.515 1.436 0.675 0.74 0.714 2640985 7p22.3 intronic rs4506100 5.97E-06 -6.514 1.439 0.675 0.718 0.711 2649264 7p22.3 intronic rs6950588 6.37E-06 -6.465 1.432 0.671 0.721 0.711 NR3C2,LO 1.5E+08 4q31.23 intergenic rs61319730 6.98E-06 -7.524 1.674 0.222 0.123 0.202 C101927849 SETD3 99876505 14q32.2 UTR3 rs1047351 3.80E-07 7.54 1.485 0.467 0.557 0.448 99900521 14q32.2 intronic . 8.93E-07 7.33 1.492 0.467 0.581 0.449 99904620 14q32.2 intronic rs6575719 8.96E-07 7.311 1.488 0.467 0.58 0.449 99936321 14q32.2 intronic rs8011858 9.20E-07 -7.299 1.487 0.533 0.42 0.551 99936258 14q32.2 intronic rs8011670 9.22E-07 -7.299 1.487 0.533 0.425 0.551 99938024 14q32.2 intronic rs8003260 9.26E-07 -7.296 1.487 0.533 0.42 0.553 99928924 14q32.2 intronic rs10148889 1.03E-06 -7.237 1.481 0.523 0.349 0.543 99942692 14q32.2 intronic rs10136777 1.09E-06 -7.204 1.478 0.524 0.349 0.546 99876216 14q32.2 UTR3 rs2943 1.67E-06 7.372 1.539 0.362 0.475 0.362 99908808 14q32.2 intronic rs6575720 1.78E-06 7.103 1.487 0.446 0.452 0.427 99877643 14q32.2 intronic . 1.83E-06 7.104 1.489 0.45 0.509 0.429 99885632 14q32.2 intronic rs7150963 1.85E-06 7.11 1.491 0.45 0.513 0.429 99880878 14q32.2 intronic rs2400665 1.93E-06 7.099 1.491 0.449 0.43 0.426 99879812 14q32.2 intronic rs1270108 1.93E-06 7.11 1.493 0.448 0.511 0.427 99941212 14q32.2 intronic . 
2.06E-06 -7.108 1.497 0.542 0.472 0.548 99887102 14q32.2 intronic rs12878462 2.11E-06 7.072 1.491 0.45 0.429 0.428 99874257 14q32.2 intronic rs1257263 2.24E-06 7.051 1.491 0.45 0.511 0.429 99874119 14q32.2 intronic rs8008307 2.35E-06 7.038 1.491 0.451 0.429 0.429 99873471 14q32.2 intronic rs2180265 2.63E-06 7.007 1.491 0.451 0.43 0.429 99867527 14q32.2 intronic rs11624183 3.10E-06 6.958 1.492 0.451 0.43 0.428

183 99895333 14q32.2 intronic rs12433874 3.10E-06 6.965 1.493 0.45 0.537 0.429 99889161 14q32.2 intronic rs56055510 3.12E-06 6.953 1.491 0.451 0.452 0.429 99868661 14q32.2 intronic rs8006163 3.13E-06 6.953 1.491 0.451 0.431 0.429 99865672 14q32.2 intronic rs1257265 3.13E-06 6.953 1.491 0.451 0.512 0.428 99898406 14q32.2 intronic rs11627984 3.14E-06 6.959 1.493 0.45 0.537 0.429 99871812 14q32.2 intronic rs1262304 3.14E-06 6.951 1.491 0.451 0.515 0.429 99889388 14q32.2 intronic rs8015827 3.15E-06 6.95 1.491 0.451 0.453 0.429 99946577 14q32.2 intronic rs3918024 3.16E-06 6.96 1.494 0.444 0.452 0.419 99892365 14q32.2 intronic rs941731 3.24E-06 6.965 1.496 0.45 0.528 0.429 99901124 14q32.2 intronic rs10873503 3.24E-06 6.942 1.491 0.449 0.538 0.429 99892618 14q32.2 intronic rs12590989 3.25E-06 6.962 1.496 0.45 0.444 0.429 99901565 14q32.2 intronic . 3.27E-06 6.938 1.491 0.449 0.538 0.429 99922538 14q32.2 intronic rs8022919 3.32E-06 6.926 1.489 0.448 0.453 0.429 99935261 14q32.2 intronic rs11621155 3.35E-06 6.922 1.489 0.448 0.452 0.429 99934979 14q32.2 intronic . 3.36E-06 6.922 1.489 0.448 0.453 0.429 99934161 14q32.2 intronic rs1884526 3.36E-06 -6.92 1.489 0.552 0.462 0.571 99933431 14q32.2 intronic . 3.36E-06 -6.921 1.489 0.552 0.462 0.571 99929801 14q32.2 intronic rs2273386 3.36E-06 6.923 1.490 0.448 0.453 0.429 99933825 14q32.2 intronic rs3809412 3.37E-06 6.922 1.489 0.448 0.452 0.429 99938340 14q32.2 intronic rs8023115 3.37E-06 6.919 1.489 0.448 0.452 0.427 99924220 14q32.2 intronic rs941553 3.37E-06 6.92 1.489 0.448 0.452 0.429 99926499 14q32.2 intronic . 3.37E-06 6.922 1.490 0.448 0.452 0.429 99924036 14q32.2 intronic rs941554 3.38E-06 -6.918 1.489 0.552 0.462 0.571 99922406 14q32.2 intronic rs8003384 3.38E-06 6.918 1.489 0.448 0.452 0.429 99920745 14q32.2 intronic rs10145484 3.38E-06 -6.916 1.489 0.552 0.462 0.571 99938225 14q32.2 intronic rs941552 3.38E-06 6.918 1.489 0.448 0.453 0.427 99932341 14q32.2 intronic rs2144798 3.38E-06 6.921 1.490 0.448 0.453 0.429

184 99921905 14q32.2 intronic rs1884527 3.39E-06 6.917 1.489 0.448 0.453 0.429 99903973 14q32.2 intronic rs2400667 3.39E-06 6.915 1.489 0.448 0.537 0.429 99917574 14q32.2 intronic rs11848590 3.39E-06 6.917 1.489 0.448 0.454 0.429 99945392 14q32.2 intronic rs8013888 3.40E-06 6.914 1.488 0.448 0.454 0.427 99907896 14q32.2 intronic rs12894479 3.40E-06 6.916 1.489 0.448 0.537 0.429 99914891 14q32.2 intronic rs12882612 3.40E-06 6.917 1.489 0.448 0.454 0.429 99946512 14q32.2 intronic rs3918022 3.40E-06 6.913 1.488 0.448 0.453 0.426 99909296 14q32.2 intronic rs12884473 3.40E-06 6.917 1.489 0.448 0.537 0.429 99946823 14q32.2 intronic rs3918027 3.40E-06 -6.912 1.488 0.552 0.462 0.574 99931123 14q32.2 intronic rs2400669 3.40E-06 6.921 1.490 0.448 0.453 0.429 99907532 14q32.2 intronic rs11160519 3.40E-06 6.915 1.489 0.448 0.537 0.429 99913671 14q32.2 intronic rs12898128 3.40E-06 6.917 1.489 0.448 0.453 0.429 99906393 14q32.2 intronic rs7149149 3.40E-06 6.914 1.489 0.448 0.538 0.429 99905929 14q32.2 intronic rs7144486 3.41E-06 6.913 1.488 0.448 0.537 0.429 99910059 14q32.2 intronic rs2180841 3.41E-06 6.915 1.489 0.448 0.453 0.429 99906863 14q32.2 intronic rs1951142 3.42E-06 6.912 1.488 0.448 0.453 0.429 99904186 14q32.2 intronic rs12895433 3.42E-06 6.909 1.488 0.448 0.451 0.429 99911420 14q32.2 intronic rs7147843 3.50E-06 6.909 1.489 0.448 0.454 0.43

Word Identification

Gene Start cytoBand Function rsID pval B se MAF KGP KGP eur
ATP2C2 84486700 16q24.1 intronic rs193704 3.63E-07 2.466 0.4404 0.485 0.555 0.44
ATP2C2 84488930 16q24.1 intronic rs8064169 1.29E-06 -2.28 0.5606 0.471 0.428 0.561
ATP2C2 84484938 16q24.1 intronic rs247887 4.70E-06 2.122 0.3877 0.464 0.465 0.388
CACNB2 18540122 10p12.33 intronic rs58703423 8.06E-06 -1.97 0.3469 0.441 0.329 0.347
IQCE 2651168 7p22.3 UTR3 rs3823604 2.53E-06 -2.093 0.7087 0.445 0.642 0.709
IQCE 2645201 7p22.3 intronic rs4721874 3.12E-06 -2.082 0.7107 0.447 0.715 0.711
IQCE 2643607 7p22.3 intronic rs4719601 3.13E-06 -2.081 0.7107 0.446 0.716 0.711
IQCE 2644327 7p22.3 intronic rs6961606 3.13E-06 -2.081 0.7117 0.446 0.719 0.712
IQCE 2642187 7p22.3 intronic rs4719597 3.14E-06 -2.081 0.7137 0.447 0.74 0.714
IQCE 2644846 7p22.3 intronic rs2293406 3.15E-06 -2.081 0.7107 0.446 0.702 0.711
IQCE 2640985 7p22.3 intronic rs4506100 3.25E-06 -2.082 0.7107 0.447 0.718 0.711
IQCE 2649264 7p22.3 intronic rs6950588 5.06E-06 -2.032 0.7107 0.445 0.721 0.711
LOC101927768,FRK 1.15E+08 6q22.1 intergenic rs36112731 9.88E-06 2.131 0.7266 0.482 0.723 0.727

5. Spelling

Test of Written Spelling

Gene Start cytoBand Function rsID pval B se MAF KGP KGP eur
AMZ2 66243894 17q24.2 upstream rs12936261 7.66E-06 0.501 0.112 0.342 0.399 0.326
ARHGAP20,C11orf53 1.11E+08 11q23.1 intergenic rs11601022 8.89E-06 -0.89 0.2 0.075 0.064 0.079
ARSG 66256956 17q24.2 intronic rs7210814 9.72E-06 0.507 0.115 0.320 0.375 0.314
ARSG,SLC16A6 66265977 17q24.2 intronic rs12944412 6.86E-06 0.521 0.116 0.320 0.336 0.307
GPATCH2L,ESRRB 76724299 14q24.3 intergenic rs56121047 5.78E-06 -0.689 0.152 0.138 0.212 0.149
PROK2,LINC00877 71982696 3p13 intergenic rs58603523 8.73E-07 -0.68 0.138 0.161 0.31 0.181
QSOX1 1.8E+08 1q25.2 exonic rs12371 6.25E-06 0.816 0.181 0.107 0.027 0.078
SPAM1 1.24E+08 7q31.32 UTR3 . 5.15E-07 -0.775 0.154 0.137 0.136 0.144
SPAM1 1.24E+08 7q31.32 intronic rs28812505 6.73E-07 -0.803 0.162 0.125 0.139 0.132
SPAM1 1.24E+08 7q31.32 intronic rs4443586 6.75E-07 -0.802 0.162 0.125 0.121 0.132
SPAM1 1.24E+08 7q31.32 intronic rs12673406 6.75E-07 -0.802 0.162 0.125 0.12 0.132
SPAM1 1.24E+08 7q31.32 intronic rs73228007 6.75E-07 -0.802 0.162 0.125 0.12 0.132
SPAM1 1.24E+08 7q31.32 intronic rs10228077 6.75E-07 -0.802 0.162 0.125 0.12 0.132
SPAM1 1.24E+08 7q31.32 intronic rs56121632 6.75E-07 -0.802 0.162 0.125 0.12 0.132
SPAM1 1.24E+08 7q31.32 intronic rs73228008 6.75E-07 -0.802 0.162 0.125 0.12 0.132
SPAM1 1.24E+08 7q31.32 intronic rs73228009 6.75E-07 -0.802 0.162 0.125 0.12 0.132
SPAM1 1.24E+08 7q31.32 intronic rs76679685 3.22E-06 -0.782 0.168 0.110 0.099 0.112
SPAM1 1.24E+08 7q31.32 intronic rs76922510 3.22E-06 -0.782 0.168 0.110 0.099 0.112
SPAM1 1.24E+08 7q31.32 intronic rs114450684 5.07E-06 -0.77 0.169 0.108 0.08 0.112
SPAM1 1.24E+08 7q31.32 intronic rs76511367 5.07E-06 -0.77 0.169 0.108 0.08 0.112
SPAM1 1.24E+08 7q31.32 intronic rs75263544 5.65E-06 -0.787 0.173 0.103 0.068 0.113
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs17146532 5.14E-07 -0.775 0.154 0.137 0.166 0.146
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs12668266 1.00E-06 -0.751 0.153 0.138 0.122 0.134
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs10260580 1.02E-06 -0.761 0.156 0.135 0.162 0.138
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs7784681 1.20E-06 -0.759 0.156 0.133 0.134 0.135
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs7784986 1.21E-06 -0.758 0.156 0.133 0.135 0.135
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs6952422 1.27E-06 -0.756 0.156 0.133 0.135 0.135
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs6945060 1.50E-06 -0.751 0.156 0.132 0.158 0.138
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs11973429 1.56E-06 -0.749 0.156 0.132 0.134 0.135
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs10242798 6.56E-06 -0.73 0.162 0.127 0.161 0.128
SPAM1,TMEM229A 1.24E+08 7q31.32 intergenic rs12668298 8.80E-06 -0.747 0.168 0.109 0.068 0.111
TBC1D4 75990684 13q22.2 intronic rs9600465 3.35E-06 0.579 0.124 0.760 0.588 0.729
TBC1D4 75987419 13q22.2 intronic rs12429477 4.47E-06 0.539 0.118 0.708 0.587 0.666
TBC1D4 75986912 13q22.2 intronic rs7337411 4.47E-06 0.539 0.118 0.708 0.613 0.666
TBC1D4 75987566 13q22.2 intronic rs7982953 4.47E-06 0.539 0.118 0.708 0.581 0.667


Appendix C- Additional Materials for Chapter 5

Figures C1-C11 are Manhattan plots for the adjusted models.


Tables C1-C11 list the markers with p<=1x10^-5 for each outcome trait.

Binary Outcome, speech

Table C1. Top 20 loci for binary outcome after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
10 101340425 rs10883379 intergenic NKX2-3 SLC25A28 0.545645 -0.111175 0.0266141 2.95E-05
6 106373965 rs4245527 intergenic PREP PRDM1 0.253997 0.149456 0.0303783 8.66E-07
14 35147425 rs8022575 intergenic SNX6 CFL2 0.538502 -0.12342 0.0257185 1.60E-06
3 56954521 rs9311631 intronic ARHGEF3 . 0.601519 -0.130495 0.0272355 1.66E-06
14 35124314 rs35268989 intergenic SNX6 CFL2 0.282609 -0.138475 0.0289222 1.69E-06
8 112027937 rs117978519 ncRNA_intronic LINC01608 . 0.103753 0.203231 0.043543 3.05E-06
3 78620108 rs6808203 intergenic ROBO2 ROBO1 0.558626 -0.120492 0.0259274 3.36E-06
3 78620000 rs7651772 intergenic ROBO2 ROBO1 0.558656 -0.120506 0.0259339 3.37E-06
3 78619515 rs7628964 intergenic ROBO2 ROBO1 0.487568 -0.120269 0.0259765 3.66E-06
3 78620359 rs9810679 intergenic ROBO2 ROBO1 0.428412 0.119639 0.02604 4.34E-06
3 78623446 rs6797185 intergenic ROBO2 ROBO1 0.428816 0.119202 0.0260357 4.69E-06
3 78623579 rs2055707 intergenic ROBO2 ROBO1 0.428871 0.11914 0.026034 4.73E-06
3 78627824 rs9876050 intergenic ROBO2 ROBO1 0.42905 0.119037 0.0260259 4.79E-06
3 78624117 rs9309807 intergenic ROBO2 ROBO1 0.428875 0.119059 0.0260342 4.80E-06
3 78626105 rs9865740 intergenic ROBO2 ROBO1 0.428588 0.119065 0.0260676 4.93E-06
3 78627833 rs9875774 intergenic ROBO2 ROBO1 0.428155 0.118912 0.0260875 5.16E-06
3 78633965 rs200914857 intergenic ROBO2 ROBO1 0.42765 0.118966 0.0261234 5.26E-06
3 78634464 rs150782457 intergenic ROBO2 ROBO1 0.427669 0.118924 0.0261159 5.27E-06
3 78651638 rs9816191 intronic ROBO1 . 0.426544 0.118055 0.0260255 5.73E-06
3 78637361 rs141319084 intergenic ROBO2 ROBO1 0.413961 0.119168 0.0264052 6.39E-06
3 78651501 rs9871559 intronic ROBO1 . 0.425882 0.117072 0.0260836 7.18E-06
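The beta, se, and pval columns in Table C1 can be read together. As a reading aid only, the sketch below assumes the reported p-values behave like a two-sided Wald test under a normal approximation (z = beta / se); the appendix does not restate which test produced them, so this is an assumption, and the function name `wald_two_sided_p` is hypothetical. The input values are copied from the first row of Table C1 (rs10883379).

```python
import math


def wald_two_sided_p(beta: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0 under a normal approximation.

    Assumption: the test statistic is z = beta / se and is compared
    against a standard normal distribution.
    """
    z = beta / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), with Phi the standard normal CDF.
    return math.erfc(abs(z) / math.sqrt(2.0))


# First row of Table C1 (rs10883379): beta and se as printed in the table.
beta, se = -0.111175, 0.0266141
p = wald_two_sided_p(beta, se)
print(f"z = {beta / se:.3f}, p = {p:.2e}")
```

Under this assumption the computed value lands in the same range as the printed pval for that row; any mixed-model or small-sample correction used in the actual analysis would shift it slightly.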

ARTICULATION and MOTOR CONTROL

Table C2. Top loci for Fletcher Time by Count after adjusting for LI and RD. There was only one locus that was suggestive.

Chr Start snp138 Function Gene maf beta se pval
22 39616878 rs116540704 intergenic CBX7 PDGFB 0.116091 -0.232576 0.0445826 1.82E-07
22 39605755 rs56310790 intergenic CBX7 PDGFB 0.116495 -0.219608 0.0458585 1.68E-06
1 242764961 rs10926785 intergenic PLD5 LINC01347 0.239118 0.153702 0.0328149 2.81E-06
1 242763192 rs4658834 intergenic PLD5 LINC01347 0.239725 0.153684 0.0328295 2.85E-06
19 44309292 rs365785 intronic LYPD5 . 0.262505 -0.143968 0.0310865 3.64E-06
1 242758744 rs10926781 intergenic PLD5 LINC01347 0.239573 0.151595 0.0329018 4.08E-06
6 135494590 rs9321489 intergenic HBS1L MYB 0.264639 -0.147524 0.0321107 4.34E-06
17 14840665 rs57680355 intergenic HS3ST3B1 CDRT7 0.280074 0.143562 0.03142 4.90E-06
1 242754326 rs10926778 intergenic PLD5 LINC01347 0.239718 0.149286 0.0329515 5.89E-06
1 242753464 rs12046498 intergenic PLD5 LINC01347 0.239388 0.148055 0.0329264 6.91E-06
17 14832800 rs11869727 intergenic HS3ST3B1 CDRT7 0.223735 0.155164 0.0345523 7.10E-06
17 14841227 rs1981651 intergenic HS3ST3B1 CDRT7 0.283138 0.139338 0.0312402 8.19E-06
17 14833481 rs16950649 intergenic HS3ST3B1 CDRT7 0.283123 0.139431 0.0312619 8.19E-06
13 25285768 rs2722 UTR3 ATP12A 0.124617 0.189197 0.0424219 8.20E-06
17 14840171 rs2215275 intergenic HS3ST3B1 CDRT7 0.283117 0.139338 0.0312432 8.20E-06
17 14843341 rs2215274 intergenic HS3ST3B1 CDRT7 0.283172 0.139298 0.0312345 8.21E-06
17 14835085 rs8073675 intergenic HS3ST3B1 CDRT7 0.283131 0.139318 0.0312395 8.21E-06
1 242760115 rs10926783 intergenic PLD5 LINC01347 0.210456 0.158472 0.0357507 9.31E-06
1 242759884 rs10926782 intergenic PLD5 LINC01347 0.230021 0.149645 0.0337973 9.52E-06
1 242765473 rs7527225 intergenic PLD5 LINC01347 0.228188 0.148891 0.0336806 9.84E-06
17 14826838 rs59932116 intergenic HS3ST3B1 CDRT7 0.22246 0.152448 0.0344985 9.92E-06

Table C3. Top 10 loci for Goldman Fristoe Test of Articulation after adjusting for LI and RD. There was only one locus that was suggestive.

Chr Start rsID Function Gene maf beta se pval
2 216287497 rs16854041 intronic FN1 . 0.308645 0.242672 0.0530449 4.77E-06
9 119093923 rs2273977 intronic PAPPA . 0.300273 0.226484 0.0513264 1.02E-05
3 45253196 rs887742 intergenic CDCP1 TMEM158 0.262338 -0.221241 0.050584 1.22E-05
3 45252911 rs13067693 intergenic CDCP1 TMEM158 0.262343 -0.221242 0.0505813 1.22E-05
3 45252540 rs1559995 intergenic CDCP1 TMEM158 0.262346 -0.221231 0.0505796 1.22E-05
3 45252377 rs1559993 intergenic CDCP1 TMEM158 0.262352 -0.221237 0.0505802 1.22E-05
3 45252117 rs12634711 intergenic CDCP1 TMEM158 0.262341 -0.221227 0.0505803 1.22E-05
3 45251900 rs745676 intergenic CDCP1 TMEM158 0.262344 -0.221223 0.0505804 1.22E-05
3 45251591 rs733865 intergenic CDCP1 TMEM158 0.262307 -0.221264 0.0505901 1.22E-05
3 45251128 rs732145 intergenic CDCP1 TMEM158 0.262302 -0.221264 0.0505918 1.22E-05

LANGUAGE

Table C4. Top 20 loci for Expressive One Word Picture Vocabulary Test after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
18 69441249 rs62100637 ncRNA_intronic LOC102724913 . 0.0814053 2.52575 0.493519 3.09E-07
18 69436901 . . . . 0.0764275 2.15417 0.458459 2.62E-06
2 56381924 rs12476224 intergenic MIR217HG LOC100129434 0.468932 1.12533 0.241781 3.25E-06
2 56383694 rs6545553 intergenic MIR217HG LOC100129434 0.468929 1.12534 0.241784 3.25E-06
2 56383762 rs6545554 intergenic MIR217HG LOC100129434 0.468935 1.12528 0.241778 3.25E-06
2 56392656 rs2193479 intergenic MIR217HG LOC100129434 0.46178 1.15119 0.249024 3.79E-06
5 97929333 rs191730 intergenic LINC01340 RGMB 0.758882 -1.27466 0.276608 4.06E-06
5 97827290 rs9327251 intergenic LINC01340 RGMB 0.21879 1.32521 0.289481 4.70E-06
5 97831479 rs13154568 intergenic LINC01340 RGMB 0.218382 1.32665 0.289818 4.70E-06
5 97826575 rs11952745 intergenic LINC01340 RGMB 0.219938 1.32355 0.289549 4.85E-06
2 56392820 rs3861575 intergenic MIR217HG LOC100129434 0.465549 1.12688 0.248164 5.60E-06
7 12757272 rs35208347 intergenic ARL4A ETV1 0.186852 -1.41747 0.314435 6.54E-06
9 1528456 rs882793 intergenic DMRT2 SMARCA2 0.556456 1.07247 0.238786 7.08E-06
7 12768454 rs35773728 intergenic ARL4A ETV1 0.179892 -1.40948 0.316281 8.33E-06
2 18166989 rs150094912 intergenic KCNS3 RDH14 0.165235 1.45811 0.327922 8.73E-06

Table C5. Top 20 loci for Peabody Picture Vocabulary Test after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
19 18175129 rs7247941 intronic IL12RB1 . 0.0831035 -0.676835 0.147615 4.54E-06
19 18173558 rs17879124 intronic IL12RB1 . 0.0831531 -0.673013 0.147062 4.73E-06
19 18173513 rs17878594 intronic IL12RB1 . 0.0831662 -0.672778 0.147009 4.73E-06
19 18171395 rs17879591 intronic IL12RB1 . 0.0830627 -0.67048 0.146634 4.82E-06
4 156583393 rs2625266 intergenic MAP9 GUCY1A3 0.542305 -0.357582 0.0782536 4.89E-06
4 156584522 rs1825283 intergenic MAP9 GUCY1A3 0.542302 -0.357621 0.0782693 4.90E-06
4 156580768 rs2705441 intergenic MAP9 GUCY1A3 0.542274 -0.357266 0.0782304 4.95E-06
19 18177749 rs17878401 intronic IL12RB1 . 0.0800306 -0.686355 0.150944 5.44E-06
19 18177573 rs17878265 intronic IL12RB1 . 0.0800423 -0.685858 0.150874 5.47E-06
13 71514508 rs9564760 intergenic ATXN8OS LINC00348 0.413217 0.343126 0.0761454 6.60E-06
13 71515049 rs9599752 intergenic ATXN8OS LINC00348 0.413405 0.342959 0.0762066 6.78E-06
13 71517175 rs28636525 intergenic ATXN8OS LINC00348 0.414273 0.341703 0.076384 7.70E-06
19 18184629 rs17885060 intronic IL12RB1 . 0.0813061 -0.674237 0.15102 8.02E-06
19 18183818 rs17878896 intronic IL12RB1 . 0.0816399 -0.670442 0.150326 8.20E-06
19 18183917 rs3761041 intronic IL12RB1 . 0.0816327 -0.670234 0.15033 8.26E-06
19 18185582 rs17879435 intronic IL12RB1 . 0.0816443 -0.670181 0.150334 8.27E-06
4 156581066 rs4691826 intergenic MAP9 GUCY1A3 0.543743 -0.350257 0.0786304 8.41E-06
21 21818068 . intergenic MIR548XHG LINC00320 0.575652 0.349088 0.0784629 8.62E-06
13 71511580 rs9564759 intergenic ATXN8OS LINC00348 0.418742 0.332247 0.0749046 9.18E-06
13 71512745 rs116395084 intergenic ATXN8OS LINC00348 0.418716 0.33432 0.0754156 9.29E-06

Table C6. Suggestive loci for Wechsler Individual Achievement Test – Listening Comprehension after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
4 103196355 rs11725311 intronic SLC39A8 . 0.3387 -0.451784 0.0956639 2.33E-06
4 103198466 rs139036380 intronic SLC39A8 . 0.292212 -0.465425 0.0991201 2.66E-06
4 103194531 rs144401529 intronic SLC39A8 . 0.341469 -0.440547 0.095063 3.58E-06
4 103203467 rs11733504 intronic SLC39A8 . 0.341697 -0.439979 0.0950032 3.64E-06
4 103203500 rs2165265 intronic SLC39A8 . 0.341634 -0.439834 0.0949943 3.65E-06
4 103187146 rs62327916 intronic SLC39A8 . 0.287756 -0.460831 0.100012 4.07E-06
10 43930381 . . . . 0.549857 0.45552 0.0997591 4.97E-06
9 81764216 rs11138098 intergenic LOC101927450 TLE4 0.0649943 -0.877795 0.196086 7.58E-06
14 57703627 . . . . 0.0513359 -0.904796 0.202506 7.90E-06
8 131204233 rs16893265 intronic ASAP1 . 0.191002 0.53454 0.119987 8.39E-06
14 57656473 rs6573133 intergenic OTX2-AS1 EXOC5 0.0497195 -0.893085 0.201481 9.31E-06
14 57692354 rs11845312 intronic EXOC5 . 0.0496927 -0.892771 0.201529 9.42E-06
14 57693793 rs10148605 intronic EXOC5 . 0.0496813 -0.89293 0.201605 9.46E-06
10 15325076 rs72776170 intronic FAM171A1 . 0.10455 -0.674426 0.152278 9.47E-06
14 57694988 rs7142694 intronic EXOC5 . 0.049729 -0.89213 0.201519 9.55E-06

PHONOLOGY

Table C7. Top 20 loci for multisyllabic word repetition after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
13 28335399 rs1231028 intergenic POLR1D GSX1 0.240218 0.559584 0.113289 7.83E-07
6 106319561 rs12525732 intergenic PREP PRDM1 0.0784098 -0.879366 0.180918 1.17E-06
13 28334107 rs1231026 intergenic POLR1D GSX1 0.243425 0.541263 0.111954 1.33E-06
13 28334004 rs1231025 intergenic POLR1D GSX1 0.243782 0.539181 0.111737 1.40E-06
13 28333467 rs1231022 intergenic POLR1D GSX1 0.244059 0.536572 0.111472 1.48E-06
13 28333263 rs1231020 intergenic POLR1D GSX1 0.244123 0.535966 0.111418 1.51E-06
13 28331713 rs1231019 intergenic POLR1D GSX1 0.244097 0.529511 0.110714 1.73E-06
13 28336590 rs1231030 intergenic POLR1D GSX1 0.244022 0.536423 0.112293 1.78E-06
13 28331565 rs1075083 intergenic POLR1D GSX1 0.245042 0.522216 0.110843 2.46E-06
1 114631393 rs2774290 downstream SYT6 . 0.600752 -0.454961 0.0971488 2.83E-06
3 1429611 rs145380805 intronic CNTN6 . 0.301063 -0.483756 0.103385 2.88E-06
13 28329109 rs1231010 intergenic POLR1D GSX1 0.243742 0.518102 0.111268 3.22E-06
13 28330097 rs1231013 intergenic POLR1D GSX1 0.242978 0.516308 0.1112 3.43E-06
13 28329916 rs1231011 intergenic POLR1D GSX1 0.242989 0.516145 0.111197 3.46E-06
13 28330601 rs913091 intergenic POLR1D GSX1 0.244193 0.512485 0.111168 4.03E-06
13 28330549 rs913090 intergenic POLR1D GSX1 0.244156 0.511944 0.111173 4.13E-06
20 5879480 rs11087700 intergenic C20orf196 CHGB 0.102406 -0.721612 0.157224 4.44E-06
9 124966449 rs3793617 UTR3 LHX6 0.234248 0.511629 0.112691 5.62E-06
3 1408503 rs429922 intronic CNTN6 . 0.246211 -0.486832 0.107288 5.69E-06
8 28714877 rs10103360 intronic INTS9 . 0.533102 0.435696 0.0961762 5.89E-06

Table C8. Top 20 loci for nonsense word repetition after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
10 81949685 rs9645553 intronic ANXA11 0.262219 0.461713 0.0876888 1.40E-07
10 81924830 rs12763624 intronic ANXA11 0.26339 0.461482 0.087651 1.40E-07
10 81936547 rs11591611 intronic ANXA11 0.263239 0.462296 0.0880263 1.51E-07
10 81926339 rs11201950 intronic ANXA11 0.26356 0.458827 0.0876841 1.67E-07
10 81914787 rs3748242 downstream ANXA11 0.265946 0.455197 0.0877327 2.12E-07
10 82059970 rs10736339 intergenic MAT1A DYDC1 0.687732 -0.432748 0.0840555 2.63E-07
10 82060484 rs1856773 intergenic MAT1A DYDC1 0.688937 -0.432228 0.0840717 2.73E-07
10 82061071 rs10736340 intergenic MAT1A DYDC1 0.689617 -0.431575 0.0840424 2.82E-07
10 82061206 rs10736341 intergenic MAT1A DYDC1 0.689963 -0.431177 0.0840189 2.87E-07
10 82090339 rs1361458 intergenic MAT1A DYDC1 0.691855 -0.431939 0.0843136 3.01E-07
10 82075324 rs3120977 intergenic MAT1A DYDC1 0.690936 -0.429837 0.0839368 3.04E-07
10 82068404 rs9663163 intergenic MAT1A DYDC1 0.690924 -0.429848 0.0839423 3.04E-07
10 82070014 rs7072845 intergenic MAT1A DYDC1 0.690937 -0.429806 0.0839354 3.04E-07
10 82095276 rs7895042 downstream DYDC1 0.691951 -0.432091 0.0843816 3.04E-07
10 82066265 rs140867380 intergenic MAT1A DYDC1 0.69092 -0.429828 0.0839413 3.05E-07
10 82071263 rs7089058 intergenic MAT1A DYDC1 0.690924 -0.429775 0.0839316 3.05E-07
10 82064449 rs2486303 intergenic MAT1A DYDC1 0.690922 -0.429769 0.0839325 3.05E-07
10 82064015 rs2486304 intergenic MAT1A DYDC1 0.690918 -0.429716 0.0839273 3.05E-07
13 28335399 rs1231028 intergenic POLR1D GSX1 0.240218 0.474224 0.0927232 3.15E-07


SPELLING

Table C9. Suggestive loci for TWS after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
21 43109579 rs117703989 ncRNA_intronic LINC00111 . 0.0743673 -0.751951 0.162121 3.51E-06
11 101551847 rs193110168 intergenic TRPC6 ANGPTL5 0.0838571 -0.690026 0.1517 5.40E-06
20 1873160 rs2749556 intergenic LOC100289473 SIRPA 0.285968 -0.466179 0.103109 6.15E-06
10 32007130 rs7918852 intergenic ZEB1 ARHGAP12 0.201827 -0.488845 0.109326 7.77E-06
11 87220744 rs1986778 intergenic TMEM135 RAB38 0.495007 0.403479 0.0902524 7.80E-06
10 32003611 rs11008605 intergenic ZEB1 ARHGAP12 0.201986 -0.488599 0.10936 7.90E-06
11 87080166 rs11235119 intergenic TMEM135 RAB38 0.416281 0.401966 0.0904262 8.78E-06
3 71982696 rs58603523 intergenic PROK2 LINC00877 0.162964 -0.532205 0.1199 9.05E-06

READING

Table C10. Top 20 loci for Word Attack after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
14 99876505 rs1047351 UTR3 SETD3 0.466013 6.45806 1.31613 9.25E-07
14 99860890 rs11624392 intergenic BCL11B SETD3 0.328849 7.08867 1.45219 1.05E-06
14 99904620 rs6575719 intronic SETD3 0.466577 6.36133 1.31901 1.42E-06
14 99957703 rs2400677 intronic CCNK 0.533631 -6.34501 1.31792 1.48E-06
14 99949522 rs8015304 intronic CCNK 0.533673 -6.34342 1.31771 1.48E-06
14 99936321 rs8011858 intronic SETD3 0.533492 -6.34508 1.31825 1.48E-06
14 99938024 rs8003260 intronic SETD3 0.53354 -6.34386 1.31801 1.49E-06
14 99936258 rs8011670 intronic SETD3 0.533503 -6.34479 1.31825 1.49E-06
14 99863065 rs2295697 intergenic BCL11B SETD3 0.579542 6.22819 1.30112 1.69E-06
14 99928924 rs10148889 intronic SETD3 0.523712 -6.2691 1.31359 1.82E-06
14 99942692 rs10136777 intronic SETD3 0.524038 -6.23869 1.3107 1.94E-06
14 99908808 rs6575720 intronic SETD3 0.445045 6.23022 1.31807 2.28E-06
14 99946577 rs3918024 intronic SETD3 0.442548 6.23964 1.32381 2.44E-06
14 99860811 rs1257266 intergenic BCL11B SETD3 0.369404 6.53701 1.38694 2.44E-06
14 99861852 rs749444 intergenic BCL11B SETD3 0.369255 6.48073 1.38749 3.00E-06
14 99964503 rs7158915 intronic CCNK 0.552099 -6.16109 1.32072 3.09E-06
14 99879812 rs1270108 intronic SETD3 0.447349 6.17094 1.32327 3.11E-06
14 99885632 rs7150963 intronic SETD3 0.448753 6.15509 1.32075 3.16E-06
14 99876216 rs2943 UTR3 SETD3 0.360894 6.3731 1.36767 3.16E-06

Table C11. Suggestive markers for Word Identification after adjusting for LI and RD

Chr Start rsID Function Gene maf beta se pval
16 84486700 rs193704 intronic ATP2C2 0.487314 2.09935 0.427083 8.85E-07
16 84488930 rs8064169 intronic ATP2C2 0.510597 -1.90921 0.414839 4.18E-06
16 84484938 rs247887 intronic ATP2C2 0.43025 1.80626 0.408234 9.66E-06

Table C12. Most significant SNP in genes previously associated with SSD.

Gene Marker P value Phenotype Model
ASPM 1:197067131 0.052 NSW +RD
ATP13A4 rs2280476 0.001 WIATLC +RD
ATP2C2 rs193704 3.01E-07 EOWPVT +RD
AVPR1A rs3021529 0.02 NSW +RD+LI
CNTNAP1 rs3826427 0.078 Fletcher +RD
CYP19A1 rs1143704 0.0004 PPVT +RD+LI
DRD2 rs4938017 0.003 PPVT +LI
FOXP2 rs9969232 0.009 MSW +LI
KIAA0319 rs113456233 0.006 MSW +RD
SETX rs514279 0.001 EOWPVT +LI


Appendix D- Additional Materials for Chapter 6

Table D1. Pathway Analysis- User defined pathways

RD: CYP19A1, ACOT13, CMIP, CNTNAP2, DCDC2, DOCK4, DYX1C1, FOXP2, GCFC2, KIAA0320, NRSN1, ROBO1, TDP2

SLI: ATP13A4, ATP2C2, BDNF, CFTR, CMIP, CNTNAP2, DYX1C1, FOXP2, GCFC2, KIAA0319, KIAA0321, SETBP1, UBE3A

SSD: ATP13A4, CNTNAP1, CNTNAP2, FOXP1, FOXP2, KIAA0319, SETX, CYP19A1

Syndromes that affect speech: VPS13B, PNPLA6, HPRT1, MAN2B1, SIL1, KMT2D, KDM6A, SNORD116, DHCR7, GNAQ, EZH2, NSD1, K1F7, PRKAR1A, JAG1, NOTCH2, PAX6, RECQL4, PTEN, OCRL1, NFI, GPC3, NIPBL, SMC1A, SMC3

Appendix E- Additional Materials for Chapter 7

Table E1. Significant (p<=0.001) pathways for articulation and motor control

Fletcher:
ABC transporters
Asthma
Basal transcription factors
Biosynthesis of unsaturated fatty acids
Carbon metabolism
Chagas disease (American trypanosomiasis)
Collecting duct acid secretion
Glycosphingolipid biosynthesis - ganglio series
Glycosylphosphatidylinositol(GPI)-anchor biosynthesis
Hepatitis C
Influenza A
Insulin signaling pathway
mRNA surveillance pathway
N-Glycan biosynthesis
Non-small cell lung cancer
Porphyrin and chlorophyll metabolism
Prion diseases
Protein processing in endoplasmic reticulum
Purine metabolism
Small cell lung cancer
Sphingolipid signaling pathway
Staphylococcus aureus infection
Steroid hormone biosynthesis
Sulfur metabolism
T cell receptor signaling pathway
Ubiquitin mediated proteolysis
Valine leucine and isoleucine biosynthesis

GFTA:
Adrenergic signaling in cardiomyocytes
AMPK signaling pathway
Butirosin and neomycin biosynthesis
Carbon metabolism
Cardiac muscle contraction
Citrate cycle (TCA cycle)
Complement and coagulation cascades
Drug metabolism - other
Fatty acid elongation
Folate biosynthesis
FoxO signaling pathway
Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate
Glycosphingolipid biosynthesis - ganglio series
Glyoxylate and dicarboxylate metabolism
Leishmaniasis
Maturity onset diabetes of the young
Nitrogen metabolism
Oxidative phosphorylation
Pertussis
Phagosome
Prion diseases
Propanoate metabolism
Protein export
Rheumatoid arthritis
Riboflavin metabolism
Ribosome
Systemic lupus erythematosus

Table E2. Significant pathways for language traits EOWPVT PPVT WIATLC Adipocytokine signaling Adrenergic signaling in pathway cardiomyocytes Adherens junction Alanine aspartate and Adrenergic signaling in glutamate metabolism Alcoholism cardiomyocytes EOWPVT PPVT WIATLC Antigen processing and Aminoacyl-tRNA biosynthesis Alcoholism

211 presentation Apoptosis Amoebiasis Alzheimers disease Arrhythmogenic right ventricular cardiomyopathy Arrhythmogenic right ventricular (ARVC) cardiomyopathy (ARVC) Amphetamine addiction Amyotrophic lateral Axon guidance Axon guidance sclerosis (ALS) B cell receptor signaling pathway Calcium signaling pathway Axon guidance Bacterial invasion of epithelial Biosynthesis of unsaturated cells cAMP signaling pathway fatty acids Central carbon metabolism in cancer Cardiac muscle contraction Bladder cancer cGMP-PKG signaling pathway Cell adhesion molecules (CAMs) Calcium signaling pathway Circadian rhythm Central carbon metabolism in cancer cAMP signaling pathway Collecting duct acid secretion cGMP-PKG signaling pathway Cardiac muscle contraction Cytokine-cytokine receptor Central carbon metabolism interaction Circadian rhythm in cancer Degradation of aromatic cGMP-PKG signaling compounds Collecting duct acid secretion pathway Chagas disease (American Dilated cardiomyopathy Cyanoamino acid metabolism trypanosomiasis) Endocrine and other factor- regulated calcium reabsorption Degradation of aromatic compounds Circadian entrainment Epithelial cell signaling in Helicobacter pylori infection Dilated cardiomyopathy Cocaine addiction Collecting duct acid Epstein-Barr virus infection Dopaminergic synapse secretion ErbB signaling pathway Dorso-ventral axis formation Dopaminergic synapse Endocrine and other factor-regulated Dorso-ventral axis Fat digestion and absorption calcium reabsorption formation Epithelial cell signaling in Helicobacter pylori FoxO signaling pathway Endocytosis infection Glycosaminoglycan Epithelial cell signaling in biosynthesis - keratan sulfate Helicobacter pylori infection Epstein-Barr virus infection GnRH signaling pathway Estrogen signaling pathway Estrogen signaling pathway Hematopoietic cell lineage Fc gamma R-mediated phagocytosis Fatty acid elongation Hepatitis B Focal adhesion Glutamatergic synapse Herpes simplex infection FoxO signaling pathway 
Hepatitis B Hypertrophic cardiomyopathy Glycosaminoglycan biosynthesis - (HCM) chondroitin sulfate / dermatan sulfate Long-term potentiation Inflammatory regulation of TRP channels Glycosaminoglycan degradation Lysine degradation Glycosphingolipid biosynthesis - globo Influenza A series MAPK signaling pathway Glycosphingolipid biosynthesis - lacto Inositol phosphate metabolism and neolacto series Melanoma Insulin signaling pathway (GPI)-anchor biosynthesi Mineral absorption EOWPVT PPVT WIATLC Mineral absorption GnRH signaling pathway Neuroactive ligand-receptor

212 interaction Neurotrophin signaling N-Glycan biosynthesis Hepatitis B pathway Neurotrophin signaling Nicotinate and nicotinamide pathway Hepatitis C metabolism Non-alcoholic fatty liver disease (NAFLD) Herpes simplex infection Nicotine addiction Osteoclast differentiation Hippo signaling pathway Notch signaling pathway Other types of O-glycan biosynthesis Homologous recombination Oxidative phosphorylation Oxytocin signaling pathway HTLV-I infection Pancreatic cancer Pertussis Huntingtons disease Parkinsons disease Progesterone-mediated oocyte Progesterone-mediated maturation Hypertrophic cardiomyopathy (HCM) oocyte maturation Prolactin signaling pathway Inflammatory bowel disease (IBD) Prostate cancer Inflammatory mediator regulation of Protein processing in Rap1 signaling pathway TRP channels endoplasmic reticulum Regulation of actin cytoskeleton Influenza A Rap1 signaling pathway Intestinal immune network for IgA Rheumatoid arthritis production Ras signaling pathway RIG-I-like receptor signaling Retrograde endocannabinoid pathway Leishmaniasis signaling Shigellosis MAPK signaling pathway Rheumatoid arthritis RIG-I-like receptor Steroid hormone biosynthesis Melanogenesis signaling pathway Thyroid hormone signaling Synthesis and degradation pathway Mineral absorption of ketone bodies Systemic lupus Tight junction mRNA surveillance pathway erythematosus Terpenoid backbone TNF signaling pathway Neurotrophin signaling pathway biosynthesis Toll-like receptor signaling Ubiquitin mediated pathway NF-kappa B signaling pathway proteolysis Vasopressin-regulated water Toxoplasmosis NOD-like receptor signaling pathway reabsorption Tuberculosis Oocyte meiosis Vibrio cholerae infection Ubiquinone and other terpenoid-quinone biosynthesis Osteoclast differentiation x VEGF signaling pathway Other types of O-glycan biosynthesis Vibrio cholerae infection Oxytocin signaling pathway Viral carcinogenesis Pancreatic cancer x Pertussis PI3K-Akt signaling pathway Primary 
immunodeficiency Progesterone-mediated oocyte maturation Protein digestion and absorption Rap1 signaling pathway PPVT

Retinol metabolism Rheumatoid arthritis Riboflavin metabolism RIG-I-like receptor signaling pathway Salmonella infection Shigellosis Small cell lung cancer Sphingolipid signaling pathway Taste transduction Taurine and hypotaurine metabolism Tight junction TNF signaling pathway Toll-like receptor signaling pathway Toxoplasmosis Tuberculosis Type II diabetes mellitus Viral carcinogenesis

Table E3. Significant pathways for phonology traits

MSW
Adipocytokine signaling pathway
B cell receptor signaling pathway
Basal cell carcinoma
Biosynthesis of unsaturated fatty acids
Butirosin and neomycin biosynthesis
cGMP-PKG signaling pathway
Collecting duct acid secretion
Fanconi anemia pathway
Glucagon signaling pathway
Glycerolipid metabolism
Glycolysis / Gluconeogenesis
Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate
Glycosaminoglycan biosynthesis - heparan sulfate / heparin
Glycosaminoglycan degradation
Hepatitis B
HTLV-I infection
Huntington's disease
Inflammatory bowel disease (IBD)
Insulin signaling pathway
Lysine degradation
Maturity onset diabetes of the young
N-Glycan biosynthesis
Natural killer cell mediated cytotoxicity
Oocyte meiosis
Osteoclast differentiation
Oxidative phosphorylation
Pertussis
Phagosome
Phototransduction
Rheumatoid arthritis
Ribosome
Salivary secretion
Staphylococcus aureus infection
Starch and sucrose metabolism
T cell receptor signaling pathway
Ubiquitin mediated proteolysis
Vasopressin-regulated water reabsorption
Wnt signaling pathway

NSW
Apoptosis
Basal cell carcinoma
cGMP-PKG signaling pathway
Collecting duct acid secretion
Epithelial cell signaling in Helicobacter pylori infection
Epstein-Barr virus infection
Glucagon signaling pathway
Histidine metabolism
Huntington's disease
Inositol phosphate metabolism
Lysine degradation
Melanogenesis
Metabolic pathways
Mucin type O-Glycan biosynthesis
Nicotinate and nicotinamide metabolism
Oxidative phosphorylation
Porphyrin and chlorophyll metabolism
Prostate cancer
Pyruvate metabolism
Rheumatoid arthritis
Riboflavin metabolism
Ribosome
Selenocompound metabolism
Sulfur metabolism
cycle
Tryptophan metabolism
Tyrosine metabolism
Vasopressin-regulated water reabsorption
Wnt signaling pathway

Table E4. Significant pathways for reading traits

WRDATK
ABC transporters
Choline metabolism in cancer
Colorectal cancer
Degradation of aromatic compounds
Endocrine and other factor-regulated calcium reabsorption
Epstein-Barr virus infection
Gastric acid secretion
Glutamatergic synapse
Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate
Glycosphingolipid biosynthesis - lacto and neolacto series
Glycosylphosphatidylinositol(GPI)-anchor biosynthesis
Graft-versus-host disease
HTLV-I infection
Inflammatory bowel disease (IBD)
Insulin secretion
Legionellosis
Melanogenesis
Mineral absorption
mTOR signaling pathway
Other types of O-glycan biosynthesis
Pathogenic Escherichia coli infection
Peroxisome
Phototransduction
Progesterone-mediated oocyte maturation
Protein export
Retinol metabolism
Riboflavin metabolism
Ribosome
Salmonella infection
Synthesis and degradation of ketone bodies
TGF-beta signaling pathway
Thyroid cancer
Toll-like receptor signaling pathway
Ubiquinone and other terpenoid-quinone biosynthesis

WRDID
Axon guidance
Calcium signaling pathway
cGMP-PKG signaling pathway
Choline metabolism in cancer
Endocytosis
Gap junction
Gastric acid secretion
Glycerolipid metabolism
Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate
Glycosaminoglycan biosynthesis - keratan sulfate
Glycosphingolipid biosynthesis - ganglio series
Glycosphingolipid biosynthesis - lacto and neolacto series
Glycosylphosphatidylinositol(GPI)-anchor biosynthesis
GnRH signaling pathway
Hepatitis B
Herpes simplex infection
Inflammatory bowel disease (IBD)
Inflammatory mediator regulation of TRP channels
Intestinal immune network for IgA production
Legionellosis
Melanogenesis
N-Glycan biosynthesis
Natural killer cell mediated cytotoxicity
Neurotrophin signaling pathway
NOD-like receptor signaling pathway
Non-homologous end-joining
Osteoclast differentiation
Pancreatic secretion
Pathogenic Escherichia coli infection
Pathways in cancer
Primary immunodeficiency
Progesterone-mediated oocyte maturation
Prolactin signaling pathway
Protein export
Protein processing in endoplasmic reticulum
Ras signaling pathway
Riboflavin metabolism
RIG-I-like receptor signaling pathway
Salmonella infection
Steroid biosynthesis
T cell receptor signaling pathway
Toll-like receptor signaling pathway
Ubiquinone and other terpenoid-quinone biosynthesis
Ubiquitin mediated proteolysis
Vitamin digestion and absorption
Wnt signaling pathway

Table E5. Significant pathways for spelling

TWS
Adrenergic signaling in cardiomyocytes
alpha-Linolenic acid metabolism
Axon guidance
Biosynthesis of unsaturated fatty acids
Calcium signaling pathway
cAMP signaling pathway
cGMP-PKG signaling pathway
Chagas disease (American trypanosomiasis)
Collecting duct acid secretion
Cyanoamino acid metabolism
Cytosolic DNA-sensing pathway
Dorso-ventral axis formation
Epstein-Barr virus infection
Fatty acid elongation
Gap junction
Glutamatergic synapse
Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate
Hepatitis B
Herpes simplex infection
HTLV-I infection
Huntington's disease
Long-term potentiation
MAPK signaling pathway
Melanogenesis
NOD-like receptor signaling pathway
Non-homologous end-joining
Osteoclast differentiation
Ovarian steroidogenesis
Oxidative phosphorylation
Peroxisome
Primary immunodeficiency
Progesterone-mediated oocyte maturation
Protein processing in endoplasmic reticulum
Retrograde endocannabinoid signaling
Riboflavin metabolism
RIG-I-like receptor signaling pathway
Salivary secretion
Selenocompound metabolism
Tyrosine metabolism
Ubiquitin mediated proteolysis

Table E6. Pathways significant in 3 traits. Gray box indicates the pathway was significant.

Trait column groups: Artic./motor, Language, Phonology, Reading.
Trait columns: Fletcher, GFTA, EOWPVT, PPVT, WIATLC, MSW, NSW, WRDATK, WRDID, TWS, BTSPEECH.

Pathway                                                     Grand Total   Total Without BT
cAMP signaling pathway                                      3             3
Cardiac muscle contraction                                  3             3
Central carbon metabolism in cancer                         3             3
Chagas disease (American trypanosomiasis)                   3             3
Degradation of aromatic compounds                           4             3
Dorso-ventral axis formation                                4             3
Endocrine and other factor-regulated calcium reabsorption   3             3
Fatty acid elongation                                       3             3
FoxO signaling pathway                                      3             3
Glutamatergic synapse                                       3             3
Glycosphingolipid biosynthesis - ganglio series             3             3
Glycosphingolipid biosynthesis - lacto and neolacto series  3             3
GnRH signaling pathway                                      3             3
Inflammatory mediator regulation of TRP channels            3             3
Influenza A                                                 3             3
Insulin signaling pathway                                   4             3
Lysine degradation                                          4             3
MAPK signaling pathway                                      3             3
N-Glycan biosynthesis                                       4             3
NOD-like receptor signaling pathway                         3             3
Other types of O-glycan biosynthesis                        3             3
Primary immunodeficiency                                    3             3
Protein export                                              3             3
Rap1 signaling pathway                                      3             3
Salmonella infection                                        3             3
T cell receptor signaling pathway                           3             3
Ubiquinone and other terpenoid-quinone biosynthesis         3             3
Vasopressin-regulated water reabsorption                    4             3
Wnt signaling pathway                                       3             3

Bibliography

Abad, E. et al. The analysis of semantic networks in multiple sclerosis identifies preferential damage of long-range connectivity. Mult. Scler. Relat. Disord. 4, 387–394 (2015).

Ackermann, H., Mathiak, K. & Riecker, A. The contribution of the cerebellum to speech production and speech perception: clinical and functional imaging data. Cerebellum 6, 202–213 (2007).

Alarcón, M. et al. Linkage, association, and gene-expression analyses identify CNTNAP2 as an autism-susceptibility gene. Am. J. Hum. Genet. 82, 150–159 (2008).

Alldred, S. K. et al. in Cochrane Database of Systematic Reviews (John Wiley & Sons, Ltd, 1996). doi:10.1002/14651858.CD011975

Alvarez-Buylla, A. & Garcia-Verdugo, J. M. Neurogenesis in adult subventricular zone. J. Neurosci. 22, 629–634 (2002).

Antoniou, X., Falconi, M., Di Marino, D. & Borsello, T. JNK3 as a therapeutic target for neurodegenerative diseases. J. Alzheimers. Dis. 24, 633–642 (2011).

Ball, E., Robson, S. C., Ayis, S., Lyall, F. & Bulmer, J. N. Early embryonic demise: no evidence of abnormal spiral artery transformation or trophoblast invasion. J. Pathol. 208, 528–534 (2006).

Ballif, B. C. et al. High-resolution array CGH defines critical regions and candidate genes for microcephaly, abnormalities of the corpus callosum, and seizure phenotypes in patients with microdeletions of 1q43q44. Hum. Genet. 131, 145–156 (2012).

Basel-Vanagaite, L. et al. Deficiency for the ubiquitin ligase UBE3B in a blepharophimosis-ptosis-intellectual-disability syndrome. Am. J. Hum. Genet. 91, 998–1010 (2012).

Behrman, R. E., Butler, A. S. & Institute of Medicine (US) Committee on Understanding Premature Birth and Assuring Healthy Outcomes. Neurodevelopmental, Health, and Family Outcomes for Infants Born Preterm. (National Academies Press (US), 2007).

Boycott, K. M. et al. Autosomal-Recessive Intellectual Disability with Cerebellar Atrophy Syndrome Caused by Mutation of the Manganese and Zinc Transporter Gene SLC39A8. Am. J. Hum. Genet. 97, 886–893 (2015).

Breunig, J. J. et al. Primary cilia regulate hippocampal neurogenesis by mediating sonic hedgehog signaling. Proc. Natl. Acad. Sci. U. S. A. 105, 13127–13132 (2008).

Browning, B. L. & Browning, S. R. Genotype Imputation with Millions of Reference Samples. Am. J. Hum. Genet. 98, 116–126 (2016).

Capra, J. A., Erwin, G. D., McKinsey, G., Rubenstein, J. L. R. & Pollard, K. S. Many human accelerated regions are developmental enhancers. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20130025 (2013).

Catts, H. W., Adlof, S. M., Hogan, T. P. & Weismer, S. E. Are specific language impairment and dyslexia distinct disorders? J. Speech Lang. Hear. Res. 48, 1378–1396 (2005).

Centanni, T. M. et al. Speech sound processing deficits and training-induced neural plasticity in rats with dyslexia gene knockdown. PLoS One 9, e98439 (2014).

Chen, K. et al. Activation of Toll-like receptor 2 on microglia promotes cell uptake of Alzheimer disease-associated amyloid beta peptide. J. Biol. Chem. 281, 3651–3659 (2006).

Chen, W. Y. & Abatangelo, G. Functions of hyaluronan in wound repair. Wound Repair Regen. 7, 79–89 (1999).

Chow, C. Y. Bringing genetic background into focus. Nat. Rev. Genet. 17, 63–64 (2016).

Christians, J. K. & Beristain, A. G. ADAM12 and PAPP-A: Candidate regulators of trophoblast invasion and first trimester markers of healthy trophoblasts. Cell Adhesion & Migration 10, 147–153 (2016).

Ciechanover, A., Orian, A. & Schwartz, A. L. Ubiquitin-mediated proteolysis: biological regulation via destruction. Bioessays 22, 442–451 (2000).

Conover, C. A. Key questions and answers about pregnancy-associated plasma protein-A. Trends Endocrinol. Metab. 23, 242–249 (2012).

Crider, K. S., Whitehead, N. & Buus, R. M. Genetic variation associated with preterm birth: A HuGE review. Genet. Med. 7, 593–604 (2005).

Crisafulli, C., Drago, A., Calabrò, M., Spina, E. & Serretti, A. A molecular pathway analysis informs the genetic background at risk for schizophrenia. Prog. Neuropsychopharmacol. Biol. Psychiatry 59, 21–30 (2015).

Cutting, G. R. Cystic fibrosis genetics: from molecular understanding to clinical application. Nat. Rev. Genet. 16, 45–56 (2015).

David, A. L. & Jauniaux, E. Ultrasound and endocrinological markers of first trimester placentation and subsequent fetal size. Placenta 40, 29–33 (2016).

Dillon, C. M. & Pisoni, D. B. Non word Repetition and Reading Skills in Children Who Are Deaf and Have Cochlear Implants. Volta Rev. 106, 121–145 (2006).

Duan, X., Kang, E., Liu, C. Y., Ming, G.-L. & Song, H. Development of neural stem cell in the adult brain. Curr. Opin. Neurobiol. 18, 108–115 (2008).

Dugoff, L. et al. First-trimester maternal serum PAPP-A and free-beta subunit human chorionic gonadotropin concentrations and nuchal translucency are associated with obstetric complications: a population-based screening study (the FASTER Trial). Am. J. Obstet. Gynecol. 191, 1446–1451 (2004).

Dyer, A. H., Vahdatpour, C., Sanfeliu, A. & Tropea, D. The role of Insulin-Like Growth Factor 1 (IGF-1) in brain development, maturation and neuroplasticity. Neuroscience 325, 89–99 (2016).

Ecker, J. R. et al. Genomics: ENCODE explained. Nature 489, 52–55 (2012).

Eom, G. H. et al. Histone methyltransferase SETD3 regulates muscle differentiation. J. Biol. Chem. 286, 34733–34742 (2011).

Etienne-Manneville, S. & Hall, A. Rho GTPases in cell biology. Nature 420, 629–635 (2002).

Fernandez, A. M. & Torres-Aleman, I. The many faces of insulin-like peptide signalling in the brain. Nat. Rev. Neurosci. 13, 225–239 (2012).

Ferris, S. H. & Farlow, M. Language impairment in Alzheimer’s disease and benefits of acetylcholinesterase inhibitors. Clin. Interv. Aging 8, 1007–1014 (2013).

Franco, P. G. et al. Paving the way for adequate myelination: The contribution of galectin-3, transferrin and iron. FEBS Lett. 589, 3388–3395 (2015).

Furuya, M., Ishida, J., Aoki, I. & Fukamizu, A. Pathophysiology of placentation abnormalities in pregnancy-induced hypertension. Vasc. Health Risk Manag. 4, 1301–1313 (2008).

García-Bueno, B. et al. Evidence of activation of the Toll-like receptor-4 proinflammatory pathway in patients with schizophrenia. J. Psychiatry Neurosci. 41, E46–55 (2016).

Gathercole, S. E., Alloway, T. P., Willis, C. & Adams, A.-M. Working memory in children with reading disabilities. J. Exp. Child Psychol. 93, 265–281 (2006).

Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).

Gibon, J. et al. The X-linked inhibitor of apoptosis regulates long-term depression and learning rate. FASEB J. (2016). doi:10.1096/fj.201600384R

Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2011).

Gilman, C. P. & Mattson, M. P. Do apoptotic mechanisms regulate synaptic plasticity and growth-cone motility? Neuromolecular Med. 2, 197–214 (2002).

Goswami, U. in Encyclopedia of the Sciences of Learning (ed. Seel, N. M.) 2625–2627 (Springer US, 2012). doi:10.1007/978-1-4419-1428-6_148

Govek, E.-E., Newey, S. E. & Van Aelst, L. The role of the Rho GTPases in neuronal development. Genes Dev. 19, 1–49 (2005).

Graf Estes, K., Evans, J. L. & Else-Quest, N. M. Differences in the nonword repetition performance of children with and without specific language impairment: a meta-analysis. J. Speech Lang. Hear. Res. 50, 177–195 (2007).

Hamdan, F. F. et al. De novo mutations in FOXP1 in cases with intellectual disability, autism, and language impairment. Am. J. Hum. Genet. 87, 671–678 (2010).

Han, Y.-G. et al. Hedgehog signaling and primary cilia are required for the formation of adult neural stem cells. Nat. Neurosci. 11, 277–284 (2008).

Handschuh, K. et al. Modulation of PAPP-A expression by PPARgamma in human first trimester trophoblast. Placenta 27 Suppl A, S127–34 (2006).

Rouillard, A. D., Gundersen, G. W., Fernandez, N. F., Wang, Z., Monteiro, C. D., McDermott, M. G. & Ma'ayan, A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database (Oxford) (2016).

Harris, T. P., Schimenti, K. J., Munroe, R. J. & Schimenti, J. C. IQ motif-containing G (Iqcg) is required for mouse spermiogenesis. G3 4, 367–372 (2014).

Hatakeyama, S. & Nakayama, K. I. Ubiquitylation as a quality control system for intracellular proteins. J. Biochem. 134, 1–8 (2003).

Hon, G. C., Hawkins, R. D. & Ren, B. Predictive chromatin signatures in the mammalian genome. Hum. Mol. Genet. 18, R195–201 (2009).

Huang, Z., Zhao, C. & Radi, A. Characterization of hyaluronan, hyaluronidase PH20, and HA synthase HAS2 in inflammation and cancer. Inflammation and Cell Signaling 1, (2014).

Huppertz, B. & Peeters, L. L. H. Vascular biology in implantation and placentation. Angiogenesis 8, 157–167 (2005).

Jenkitkasemwong, S., Wang, C.-Y., Mackenzie, B. & Knutson, M. D. Physiologic implications of metal-ion transport by ZIP14 and ZIP8. Biometals 25, 643–655 (2012).

Jennische, M. & Sedin, G. Speech and language skills in children who required neonatal intensive care: evaluation at 6.5 y of age based on interviews with parents. Acta Paediatr. 88, 975–982 (1999).

Jonas, P. & Lisman, J. Structure, function, and plasticity of hippocampal dentate gyrus microcircuits. Front. Neural Circuits 8, 107 (2014).

Jones, A. C. & Rawson, K. A. Do reading and spelling share a lexicon? Cogn. Psychol. 86, 152–184 (2016).

Jonsdottir, S., Bouma, A., Sergeant, J. A. & Scherder, E. J. A. The impact of specific language impairment on working memory in children with ADHD combined subtype. Arch. Clin. Neuropsychol. 20, 443–456 (2005).

Kajkowski, E. M. et al. β-Amyloid Peptide-induced Apoptosis Regulated by a Novel Protein Containing a G Protein Activation Module. J. Biol. Chem. 276, 18748–18756 (2001).

Kambe, T., Hashimoto, A. & Fujimoto, S. Current understanding of ZIP and ZnT zinc transporters in human health and diseases. Cell. Mol. Life Sci. 71, 3281–3295 (2014).

Katoh, M. & Katoh, M. Identification and characterization of human ARHGAP23 gene in silico. Int. J. Oncol. 25, 535–540 (2004).

Kolundžić, Z., Lenček, M., Klarić-Šimić, A. & Tesari, H. Impact of prematurity on articulation in children. Paediatria Croatica (2008).

Kunde, S.-A. et al. Characterisation of de novo MAPK10/JNK3 truncation mutations associated with cognitive disorders in two unrelated patients. Hum. Genet. 132, 461–471 (2013).

Lathia, J. D. et al. Toll-like receptor 3 is a negative regulator of embryonic neural progenitor cell proliferation. J. Neurosci. 28, 13978–13984 (2008).

Lee, J. W., Kim, W. R., Sun, W. & Jung, M. W. Disruption of dentate gyrus blocks effect of visual input on spatial firing of CA1 neurons. J. Neurosci. 32, 12999–13003 (2012).

Levelt, W. Phonological encoding in speech production: Comments on Jurafsky et al., Schiller et al., and van Heuven & Haan. Lab. Phonol. (2002).

Lewis, B. A. et al. Adolescent outcomes of children with early speech sound disorders with and without language impairment. Am. J. Speech. Lang. Pathol. 24, 150–163 (2015).

Lewis, B. A. et al. Subtyping Children With Speech Sound Disorders by Endophenotypes. Top. Lang. Disord. 31, 112–127 (2011).

Li, M. et al. Recent Positive Selection Drives the Expansion of a Schizophrenia Risk Nonsynonymous Variant at SLC39A8 in Europeans. Schizophr. Bull. 42, 178–190 (2016).

Loucas, T., Baird, G., Simonoff, E. & Slonims, V. Phonological processing in children with specific language impairment with and without reading difficulties. Int. J. Lang. Commun. Disord. (2016). doi:10.1111/1460-6984.12225

Madar, R. et al. Postnatal TLR2 activation impairs learning and memory in adulthood. Brain Behav. Immun. 48, 301–312 (2015).

Madsen, G. F., Bilenberg, N., Cantio, C. & Oranje, B. Increased prepulse inhibition and sensitization of the startle reflex in autistic children. Autism Res. 7, 94–103 (2014).

Mascheretti, S. et al. KIAA0319 and ROBO1: evidence on association with reading and pleiotropic effects on language and mathematics abilities in developmental dyslexia. J. Hum. Genet. 59, 189–197 (2014).

Massinen, S. et al. Genomic sequencing of a dyslexia susceptibility haplotype encompassing ROBO1. J. Neurodev. Disord. 8, 4 (2016).

Matise, M. P. & Wang, H. in Current Topics in Developmental Biology (ed. Carmen Birchmeier) Volume 97, 75–117 (Academic Press, 2011).

McCorvie, T. J. et al. Molecular basis of classic galactosemia from the structure of human galactose 1-phosphate uridylyltransferase. Hum. Mol. Genet. (2016). doi:10.1093/hmg/ddw091

Micaroni, M. Calcium around the Golgi apparatus: implications for intracellular membrane trafficking. Adv. Exp. Med. Biol. 740, 439–460 (2012).

Mirsaeidi, M., Gidfar, S., Vu, A. & Schraufnagel, D. Annexins family: insights into their functions and potential role in pathogenesis of sarcoidosis. J. Transl. Med. 14, 89 (2016).

Mitchell, A. M. & Brady, S. A. The effect of vocabulary knowledge on novel word identification. Ann. Dyslexia 63, 201–216 (2013).

Morgan, T. K. Placental insufficiency is a leading cause of preterm birth. NeoReviews 15, e518–e525 (2015).

Mousa, A. & Bakhiet, M. Role of cytokine signaling during nervous system development. Int. J. Mol. Sci. 14, 13931–13957 (2013).

Nalivaeva, N. N. & Turner, A. J. The amyloid precursor protein: a biochemical enigma in brain development, function and disease. FEBS Lett. 587, 2046–2054 (2013).

Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).

Newbury, D. F. et al. Investigation of dyslexia and SLI risk variants in reading- and language-impaired subjects. Behav. Genet. 41, 90–104 (2011).

Newbury, D. F. & Monaco, A. P. Genetic advances in the study of speech and language disorders. Neuron 68, 309–320 (2010).

Northam, G. B. et al. Speech and oromotor outcome in adolescents born preterm: relationship to motor tract integrity. J. Pediatr. 160, 402–408.e1 (2012).

Okun, E., Griffioen, K. J. & Mattson, M. P. Toll-like receptor signaling in neural plasticity and disease. Trends Neurosci. 34, 269–281 (2011).

Okun, E. et al. TLR2 activation inhibits embryonic neural progenitor cell proliferation. J. Neurochem. 114, 462–474 (2010).

Okun, E. et al. Toll-like receptor 3 inhibits memory retention and constrains adult hippocampal neurogenesis. Proc. Natl. Acad. Sci. U. S. A. 107, 15625–15630 (2010).

Parast, M. M., Aeder, S. & Sutherland, A. E. Trophoblast giant-cell differentiation involves changes in cytoskeleton and cell motility. Dev. Biol. 230, 43–60 (2001).

Parets, S. E., Knight, A. K. & Smith, A. K. Insights into genetic susceptibility in the etiology of spontaneous preterm birth. Appl. Clin. Genet. 8, 283–290 (2015).

Park, J. H. et al. SLC39A8 Deficiency: A Disorder of Manganese Transport and Glycosylation. Am. J. Hum. Genet. 97, 894–903 (2015).

Pennington, B. F. From single to multiple deficit models of developmental disorders. Cognition 101, 385–413 (2006).

Peter, B. et al. Replication of CNTNAP2 association with nonword repetition and support for FOXP2 association with timed reading and motor activities in a dyslexia family sample. J. Neurodev. Disord. 3, 39–49 (2011).

Pickering, M. & O’Connor, J. J. in Progress in Brain Research (ed. Helen E. Scharfman) Volume 163, 339–354 (Elsevier, 2007).

Pierce, A., Miller, G., Arden, R. & Gottfredson, L. S. Why is intelligence correlated with semen quality?: Biochemical pathways common to sperm and neuron function and their vulnerability to pleiotropic mutations. Commun. Integr. Biol. 2, 385–387 (2009).

Preston, M. & Sherman, L. S. Neural stem cell niches: roles for the hyaluronan-based extracellular matrix. Front. Biosci. 3, 1165–1179 (2011).

Preston, M. et al. Digestion products of the PH20 hyaluronidase inhibit remyelination. Ann. Neurol. 73, 266–280 (2013).

Pummara, P., Tongsong, T., Wanapirak, C., Sirichotiyakul, S. & Luewan, S. Association of first-trimester pregnancy-associated plasma protein A levels and idiopathic preterm delivery: A population-based screening study. Taiwan. J. Obstet. Gynecol. 55, 72–75 (2016).

Pusapati, G. V. et al. EFCAB7 and IQCE regulate hedgehog signaling by tethering the EVC-EVC2 complex to the base of primary cilia. Dev. Cell 28, 483–496 (2014).

Ramanan, V. K., Shen, L., Moore, J. H. & Saykin, A. J. Pathway analysis of genomic data: concepts, methods, and prospects for future development. Trends Genet. 28, 323–332 (2012).

Ramus, F., Marshall, C. R., Rosen, S. & van der Lely, H. K. J. Phonological deficits in specific language impairment and developmental dyslexia: towards a multidimensional model. Brain 136, 630–645 (2013).

Rapp, B. & Lipka, K. The literate brain: the relationship between spelling and reading. J. Cogn. Neurosci. 23, 1180–1197 (2011).

Reese, K. L. et al. Acidic hyaluronidase activity is present in mouse sperm and is reduced in the absence of SPAM1: evidence for a role for hyaluronidase 3 in mouse and human sperm. Mol. Reprod. Dev. 77, 759–772 (2010).

Ridel, K. R., Leslie, N. D. & Gilbert, D. L. An updated review of the long-term neurological effects of galactosemia. Pediatr. Neurol. 33, 153–161 (2005).

Rivera-Mancía, S., Ríos, C. & Montes, S. Manganese accumulation in the CNS and associated pathologies. Biometals 24, 811–825 (2011).

Rogalski, E., Johnson, N., Weintraub, S. & Mesulam, M. Increased frequency of learning disability in patients with primary progressive aphasia and their first-degree relatives. Arch. Neurol. 65, 244–248 (2008).

Rolls, A. et al. Toll-like receptors modulate adult hippocampal neurogenesis. Nat. Cell Biol. 9, 1081–1088 (2007).

Rosen, K. M., Goozée, J. V. & Murdoch, B. E. Examining the effects of multiple sclerosis on speech production: does phonetic structure matter? J. Commun. Disord. 41, 49–69 (2008).

Rouault, T. A. Iron metabolism in the CNS: implications for neurodegenerative diseases. Nat. Rev. Neurosci. 14, 551–564 (2013).

Salmaso, N., Jablonska, B., Scafidi, J., Vaccarino, F. M. & Gallo, V. Neurobiology of premature brain injury. Nat. Neurosci. 17, 341–346 (2014).

Seifan, A. et al. Childhood Learning Disabilities and Atypical Dementia: A Retrospective Chart Review. PLoS One 10, e0129919 (2015).

Severyn, C. J., Shinde, U. & Rotwein, P. Molecular biology, genetics and biochemistry of the repulsive guidance molecule family. Biochem. J 422, 393–403 (2009).

Shavit, J. A. et al. Impaired megakaryopoiesis and behavioral defects in mafG-null mutant mice. Genes Dev. 12, 2164–2174 (1998).

Shriberg, L. D., Potter, N. L. & Strand, E. A. Prevalence and Phenotype of Childhood Apraxia of Speech in Youth with Galactosemia. J. Speech Lang. Hear. Res. 54, 487–519 (2011).

Sices, L., Taylor, H. G., Freebairn, L., Hansen, A. & Lewis, B. Relationship between speech-sound disorders and early literacy skills in preschool-age children: impact of comorbid language impairment. J. Dev. Behav. Pediatr. 28, 438–447 (2007).

Siegel, L. S. Perspectives on dyslexia. Paediatr. Child Health 11, 581–587 (2006).

Sivakumaran, S. et al. Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet. 89, 607–618 (2011).

Sloane, J. A. et al. Hyaluronan blocks oligodendrocyte progenitor maturation and remyelination through TLR2. Proc. Natl. Acad. Sci. U. S. A. 107, 11555–11560 (2010).

Slomiany, M. G. & Toole, B. P. in Hyaluronan in Cancer Biology 19–35 (Academic Press, 2009). doi:10.1016/B978-012374178-3.10002-X

Soulika, A. M. et al. Initiation and progression of axonopathy in experimental autoimmune encephalomyelitis. J. Neurosci. 29, 14965–14979 (2009).

Squire, L. R., Stark, C. E. L. & Clark, R. E. The medial temporal lobe. Annu. Rev. Neurosci. 27, 279–306 (2004).

Strieter, E. R. & Korasick, D. A. Unraveling the complexity of ubiquitin signaling. ACS Chem. Biol. 7, 52–63 (2012).

Sun, I. Y. C., Overgaard, M. T., Oxvig, C. & Giudice, L. C. Pregnancy-associated plasma protein A proteolytic activity is associated with the human placental trophoblast cell membrane. J. Clin. Endocrinol. Metab. 87, 5235–5240 (2002).

The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).

Toole, B. P. Hyaluronan is not just a goo! J. Clin. Invest. 106, 335–336 (2000).

Tuschl, K., Mills, P. B. & Clayton, P. T. Manganese and the brain. Int. Rev. Neurobiol. 110, 277–312 (2013).

Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

Ullman, M. T. Contributions of memory circuits to language: the declarative/procedural model. Cognition 92, 231–270 (2004).

Ullman, M. T. A neurocognitive perspective on language: the declarative/procedural model. Nat. Rev. Neurosci. 2, 717–726 (2001).

Vellutino, F. R. & Scanlon, D. M. Phonological Coding, Phonological Awareness, and Reading Ability: Evidence from a Longitudinal and Experimental Study. Merrill. Palmer. Q. 33, 321–363 (1987).

Vernes, S. C. et al. A functional genetic link between distinct developmental language disorders. N. Engl. J. Med. 359, 2337–2345 (2008).

Vivar, J. C., Pemu, P., McPherson, R. & Ghosh, S. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and ‘Big data’ biology. OMICS 17, 414–422 (2013).

Wang, J. et al. Annexin A11 in disease. Clin. Chim. Acta 431, 164–168 (2014).

Wang, K., Li, M. & Hakonarson, H. Analysing biological pathways in genome-wide association studies. Nat. Rev. Genet. 11, 843–854 (2010).

Ward, L. D. & Kellis, M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–4 (2012).

Waters, A. M. & Beales, P. L. Ciliopathies: an expanding disease spectrum. Pediatr. Nephrol. 26, 1039–1056 (2011).

Wen, Y., Alshikho, M. J. & Herbert, M. R. Pathway Network Analyses for Autism Reveal Multisystem Involvement, Major Overlaps with Other Diseases and Convergence upon MAPK and Calcium Signaling. PLoS One 11, e0153329 (2016).

White, E. J., Hutka, S. A., Williams, L. J. & Moreno, S. Learning, neural plasticity and sensitive periods: implications for language acquisition, music training and transfer across the lifespan. Front. Syst. Neurosci. 7, 90 (2013).

Whitehouse, A. J. O., Bishop, D. V. M., Ang, Q. W., Pennell, C. E. & Fisher, S. E. CNTNAP2 variants affect early language development in the general population. Genes Brain Behav. 10, 451–456 (2011).

Wolman, M. A. et al. A genome wide screen identifies PAPP-AA mediated IGFR signaling as a novel regulator of habituation learning. Neuron 85, 1200–1211 (2015).

Wong, A. K., Krishnan, A., Yao, V., Tadych, A. & Troyanskaya, O. G. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 43, W128–33 (2015).

Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).

Yang, L. et al. Polygenic transmission and complex neuro developmental network for attention deficit hyperactivity disorder: genome-wide association study of both common and rare variants. Am. J. Med. Genet. B Neuropsychiatr. Genet. 162B, 419–430 (2013).

Yi, J. J. et al. An Autism-Linked Mutation Disables Phosphorylation Control of UBE3A. Cell 162, 795–807 (2015).

Yoshiki, A. & Moriwaki, K. Mouse phenome research: implications of genetic background. ILAR J. 47, 94–102 (2006).

Zeng, P. et al. Statistical analysis for genome-wide association study. J. Biomed. Res. 29, 285–297 (2015).

Zhong, Y., Zhu, F. & Ding, Y. Serum screening in first trimester to predict pre-eclampsia, small for gestational age and preterm delivery: systematic review and meta-analysis. BMC Pregnancy Childbirth 15, 191 (2015).

Álvarez-Buylla, A. & Ihrie, R. A. Sonic hedgehog signaling in the postnatal brain. Semin. Cell Dev. Biol. 33, 105–111 (2014).
