DISSECTING THE GENETICS OF HUMAN COMMUNICATION:
INSIGHTS INTO SPEECH, LANGUAGE, AND READING
by
HEATHER ASHLEY VOSS-HOYNES
Submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Epidemiology and Biostatistics
CASE WESTERN RESERVE UNIVERSITY
January 2017
CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
We hereby approve the dissertation of
Heather Ashley Voss-Hoynes
Candidate for the degree of Doctor of Philosophy*.
Committee Chair
Sudha K. Iyengar
Committee Member
William Bush
Committee Member
Barbara Lewis
Committee Member
Catherine Stein
Date of Defense
July 13, 2016
*We also certify that written approval has been obtained for any proprietary material
contained therein.

Table of Contents
List of Tables
List of Figures
Acknowledgements
List of Abbreviations
Abstract
CHAPTER 1: Introduction and Specific Aims
CHAPTER 2: Review of speech sound disorders: epidemiology, quantitative components, and genetics
    1. Basic Epidemiology
    2. Endophenotypes of Speech Sound Disorders
    3. Evidence for Genetic Basis of Speech Sound Disorders
    4. Genetic Studies of Speech Sound Disorders
    5. Limitations of Previous Studies
CHAPTER 3: Methods
    1. Phenotype Data
    2. Tests for Quantitative Traits
    4. Analytical Methods
CHAPTER 4: Aim I - Genome-Wide Association Study
    1. Introduction
    2. Methods
    3. Sample
    5. Statistical Procedures
    6. Results
    8. Discussion
CHAPTER 5: Accounting for comorbid conditions
    1. Introduction
    2. Methods
    3. Results
    4. Discussion
CHAPTER 6: Hypothesis-driven pathway analysis
    1. Introduction
    2. Methods
    3. Results
    4. Discussion
CHAPTER 7: Exploratory pathway analysis
    1. Introduction
    2. Methods
    3. Results
    4. Discussion
    5. Future Directions
CHAPTER 8: General Conclusions and Future Directions
Appendix A - Additional Materials for Chapter 3
    1. Sample Ancestry
    2. Power Calculations
Appendix B - Additional Materials for Chapter 4
    1. Model Selection
    2. Full GWAS Results
Appendix C - Additional Materials for Chapter 5
Appendix D - Additional Materials for Chapter 6
Appendix E - Additional Materials for Chapter 7
Bibliography
List of Tables
Table 2.1 Phonological processes and age at which they decline
Table 2.3 Loci from linkage studies
Table 2.4 Genes associated with SSD
Table 2.5 Copy number variation associated with SSD
Table 2.6 Genes associated with comorbid conditions
Table 2.7 Loci from linkage studies associated with comorbid conditions
Table 3.1 Tests used in current study and the phenotype interrogated
Table 3.3 Basic demographics of all individuals in the cohort as of February 2016
Table 3.4 Transformations of z-scores
Table 3.5 Genotyping data for the current study
Table 3.6 Chip characteristics summarized from Illumina documentation
Table 3.7 SNP quality control summary
Table 3.8 Individual quality control summary
Table 3.9 Significance threshold for HapMap
Table 4.1 Tests used in the analyses divided by endophenotype
Table 4.2 Summary statistics for quantitative traits used in the analysis
Table 4.3 Correlation (R²) between the quantitative traits analyzed
Table 4.4 Most significant marker for genes previously associated with SSD or childhood apraxia of speech
Table 5.1 Mean/median z-scores stratified by Language Impairment affection status (Model 1)
Table 5.2 Mean/median score stratified by Reading Disability affection status (Model 2)
Table 5.3 Mean/median scores stratified by all groups except SSD status
Table 6.1 Pathways of interest based on Aim I GWAS results
Table 6.2 Genes included in the FOXP2 and CNTNAP2 gene sets
Table 6.3 Significance of Aim I based pathways
Table 6.4 p-values for FOXP2 and CNTNAP2 networks
Table 6.5 p-values for Comorbid Condition Gene Sets
Table 7.1 Number of significant pathways for each trait
Table 7.2 Pathways shared by four or more traits
Table 7.3 Pathways significant in GFTA and MSW or NSW
Table A1 Ancestry of the individuals who passed quality control
Table B1 Lambda values for four models
Table B2 Sample sizes with and without parents
Table C1 Top 20 loci for binary outcome after adjusting for LI and RD
Table C2 Top loci for Fletcher Time by Count after adjusting for LI and RD
Table C3 Top 10 loci for Goldman Fristoe Test of Articulation after adjusting for LI and RD
Table C4 Top 20 loci for Expressive One Word Picture Vocabulary Test after adjusting for LI and RD
Table C5 Top 20 loci for Peabody Picture Vocabulary Test after adjusting for LI and RD
Table C6 Suggestive loci for Wechsler Individual Achievement Test - Listening Comprehension after adjusting for LI and RD
Table C7 Top 20 loci for multisyllabic word repetition after adjusting for LI and RD
Table C8 Top 20 loci for nonsense word repetition after adjusting for LI and RD
Table C9 Suggestive loci for TWS after adjusting for LI and RD
Table C10 Top 20 loci for Word Attack after adjusting for LI and RD
Table C11 Suggestive markers for Word Identification after adjusting for LI and RD
Table C12 Most significant SNP in genes previously associated with SSD
Table D1 Pathway analysis - User-defined pathways
Table E1 Significant pathways for articulation and motor control
Table E2 Significant pathways for language traits
Table E3 Significant pathways for phonology traits
Table E5 Significant pathways for spelling
Table E6 Pathways significant in 3 traits
List of Figures
Figure 2.1 Consonants and age of acquisition
Figure 3.1 Overall study design and workflow
Figure 4.2 Manhattan plot - Fletcher Time by Count
Figure 4.3 Manhattan plot - GFTA
Figure 4.4 Manhattan plot - EOWPVT
Figure 4.5 Manhattan plot - PPVT
Figure 4.6 Manhattan plot - WIATLC
Figure 4.7 Manhattan plot - Shared between EOWPVT and PPVT
Figure 4.8 Manhattan plot - MSW
Figure 4.9 Manhattan plot - NSW
Figure 4.10 Manhattan plot - Shared between MSW and NSW
Figure 4.11 Manhattan plot - WRDATK
Figure 4.12 Manhattan plot - WRDID
Figure 4.13 Manhattan plot - Shared WRDATK WRDID
Figure 4.14 Manhattan plot - TWS
Figure 5.1 Conceptual model for the relationship between SNP effect, SSD quantitative trait, language impairment, and reading disability
Figure 5.2 Basic workflow for Aim II
Figure 5.3 Proportion of markers with p < 1 × 10⁻⁵ in Aim I
Figure 5.4 Effects of adjusting for LI and RD - Fletcher Time by Count
Figure 5.5 Effects of adjusting for LI and RD - Goldman-Fristoe Test of Articulation
Figure 5.6 Effects of adjusting for LI and RD - Expressive One Word Picture Vocabulary Test
Figure 5.7 Effects of adjusting for LI and RD - Peabody Picture Vocabulary Test
Figure 5.8 Effects of adjusting for LI and RD - Wechsler Individual Achievement Test - Listening Comprehension subtest
Figure 5.9 Effects of adjusting for LI and RD - Multisyllabic Word Repetition
Figure 5.10 Effects of adjusting for LI and RD - Nonsense Word Repetition
Figure 5.11 Effects of adjusting for LI and RD - Word Attack
Figure 5.12 Effects of adjusting for LI and RD - Word Identification
Figure 5.13 Effects of adjusting for LI and RD - Test of Written Spelling
Figure 6.1 Workflow for pathway analysis of genome-wide association results
Figure 7.1 Section of the KEGG Calcium signaling pathway
Figure 7.3 Classification of pathways significant in two or more traits
Figure 7.4 Interactions between significant pathways for language traits
Figure 7.5 Pathways significant in both MSW and NSW
Figure 7.5 Shared pathways for reading traits
Figure 7.6 Interactions identified between significant spelling pathways
Figure A1 z-scores for Fletcher Time by Count and Goldman-Fristoe Test of Articulation
Figure A2 z-scores for PPVT and WIATLC
Figure A3 z-scores for MSW and NSW
Figure A4 z-scores for Word Attack and Word Identification
Figure A5 z-scores for Test of Written Spelling
Figure A6 Principal component plots
Figure A7 Power at various minor allele frequencies and effect estimates
Figure A8 Effects of altering various parameters on power for binary outcome
Figure B1 QQ plots for Articulation and Oral Motor Control
Figure B2 QQ plots for language endophenotypes
Figure B3 QQ plots for reading endophenotypes
Figure B4 QQ plots for spelling
Figure B5 Histograms of articulation and language traits
Figure B6 Histograms of phonology, reading, and spelling traits
Figure C1 Manhattan plots for adjusted BT Speech
Figure C2 Manhattan plots for adjusted Fletcher Time by Count
Figure C3 Manhattan plots for adjusted GFTA
Figure C4 Manhattan plots for adjusted EOWPVT
Figure C5 Manhattan plots for adjusted PPVT
Figure C6 Manhattan plots for adjusted WIATLC
Figure C7 Manhattan plots for adjusted MSW
Figure C8 Manhattan plots for adjusted NSW
Figure C9 Manhattan plots for adjusted WRDATK
Figure C10 Manhattan plots for adjusted WRDID
Figure C11 Manhattan plots for adjusted TWS
Acknowledgements
I am grateful to countless individuals for helping me through this process. Thank you to my advisor, Dr. Sudha Iyengar, for involving me in the project that would become this dissertation in my second semester of school. Also, thank you for buying into my ambition and not shooting down my timeline. To my committee members, Drs. Will
Bush, Barbara Lewis, and Catherine Stein, thank you for sacrificing your own time to help me by providing feedback, valuable insights, and helpful suggestions. You have all helped me grow as a thinker and questioner; for that, I am eternally grateful.
Thank you to Dr. Ralph O’Brien for encouraging me to pursue this degree and to
Dr. Mark Willis for his constant support and for reminding me that I should never stop dancing and that nothing is set in stone. Thank you to Dr. Rob Igo for his help with the QC process and for answering my incessant questions. To Jeremy Fondran and Barb Truitt, thank you for introducing me to the data and being sources of information and ideas over the past three years. Thank you to the families who participated in the study and to Lisa
Freebairn, Jessica Tag, and others who painstakingly tested and scored each participant.
To all the administrators in the department, we would be lost without you. To
Alberto Santana, thank you for maintaining Latitude. Cynthia Moore, thank you for always being available for a chat.
To my peers, especially Jessica, Noémi, and Yana, I will never forget the laughs
(and tears) we shared over the past few years. I am delighted that no one was seriously injured or ended up in a security alert, and I wish you all the best of luck in meeting your adulting goals.
To the Cavs, thank you for making June far less miserable than it could have been. #allin216
I also extend a most heartfelt thank-you to my fellow teachers and students at the
Murphy Irish Arts Center. My students’ joy and humor provided me invaluable perspective and helped me through some of the most challenging times.
Finally and above all, my most profound gratitude goes to my family. Words fail to express how fortunate I am to have them; without their unwavering love and support, I am certain completing this degree would have been impossible. And Kurt, I will always be a crusader for the humanities.
List of Abbreviations
EOWPVT    Expressive One Word Picture Vocabulary Test
GFTA      Goldman-Fristoe Test of Articulation
LI        Language Impairment
MSW       Multisyllabic Word Repetition
NSW       Nonsense Word Repetition
PPVT      Peabody Picture Vocabulary Test
RD        Reading Disability
SSD       Speech Sound Disorders
TWS       Test of Written Spelling
WIATLC    Wechsler Individual Achievement Test - Listening Comprehension
WRDATK    Woodcock Reading Mastery - Word Attack
WRDID     Woodcock Reading Mastery - Word Identification
Dissecting the Genetics of Human Communication: Insights into Speech, Language, and Reading
Abstract
By
HEATHER A. VOSS-HOYNES
Interpersonal communication is a vital component of everyday life that can be negatively affected by speech sound disorders (SSD). SSD affect articulation and phonological processes, are the most common type of communication disorder, and occur in 16% of three-year-olds. Despite the frequency with which they occur, SSD are relatively understudied compared to other communication disorders such as dyslexia and specific language impairment. SSD can result from craniofacial abnormalities or hearing loss, can occur as a symptom of certain syndromes, or can have no known cause. SSD of unknown cause are heritable, with monozygotic twin concordance rates of 0.95, but the genetic basis is not well defined. Many previous studies have focused on FOXP2, a gene harboring a causal mutation in one large family, or on genes and loci associated with language impairment (LI) or dyslexia (RD), two frequently comorbid conditions. The weakness of these approaches is that they are self-limiting and cannot identify novel loci.
Consequently, it would be beneficial to address the etiology of SSD agnostically to identify novel loci and characterize the genetic architecture of what is likely a multifactorial disorder. To do so, data from the Cleveland Family Speech and Reading Study, a longitudinal study of children with SSD, were used to perform the first known genome-wide association study on traits associated with SSD endophenotypes in a sample ascertained based on speech sound disorder diagnosis. This analysis identified novel loci, replicated previous findings, and informed hypotheses regarding biological pathways that may be involved in SSD. To investigate the impact of LI and RD on genetic association with SSD endophenotypes, the changes in genetic effect estimates after adjusting for the conditions were analyzed. Some effects were unchanged by LI and RD status, suggesting a foundational role of these loci in human communication. Finally, a pathway analysis revealed similarities between SSD and other neuropathologies such as autism spectrum disorders and Alzheimer’s disease. This study represents a thorough examination of the genetic underpinnings of SSD and other communication traits, is the first genome-wide association study for SSD, and supports a multifactorial genetic architecture underlying both typical and atypical communication.
CHAPTER 1: Introduction and Specific Aims
Communication disorders cost an estimated $154 billion¹ annually in lost salaries, special education, and medical care (Ruben, 2000). The most common communication disorders, childhood speech sound disorders (SSD), occur in roughly 16% of preschoolers and persist past six years of age in 3.8% of the population (Shriberg, 2002). SSD include aberrant articulation (the way speech sounds are produced) and disrupted phonological processes, vary from mild to severe, and can be comorbid with specific language impairment and reading disability (American Speech-Language-Hearing Association [ASHA], 2016; Peterson et al., 2009).
Known causes of SSD include hearing loss, otitis media, structural variation of the tongue and teeth, cleft lip and palate (asyndromic and syndromic), cerebral palsy, galactosemia, and syndromes such as Down syndrome (ASHA-Speech Sound Disorders, 2016). However, in most cases the cause of SSD is unknown. Though little is known about SSD of unknown cause, they are heritable (monozygotic twin concordance rate = 0.95–0.97, dizygotic = 0.22) (Lewis & Thompson, 1992; Bishop, 2002), a reality motivating genetic studies of SSD.
While comorbid conditions have been well characterized, there have not been extensive studies of SSD genetics. The most well-known study of SSD genetics was conducted on a family segregating apraxia, a severe form of SSD, and identified a point mutation in FOXP2 as the causal mutation (Lai et al., 2001). Based on those results,
FOXP2 became the focus of SSD research (Feuk et al., 2006; Lennon et al., 2007).
Studies not focusing on FOXP2 concentrated on loci previously linked to dyslexia. Focusing on those regions, researchers identified linkage with SSD on chromosomes 1, 2, 3, and 15 (Stein et al., 2004; Smith et al., 2005; Miscimarra et al., 2007). More recently, agnostic sequencing studies on small samples identified variants within various genes such as CNTNAP2, KIAA0319, and SETX (Laffin et al., 2012; Worthey et al., 2013).
¹ In 2000 dollars. There have been no follow-up studies since the original study by Ruben in 2000.
Our understanding of SSD genetics remains fragmented and confined to studies of related phenotypes or single case reports. The ultimate motivation for this dissertation is to characterize the genetic architecture of speech sound disorders in a cohesive manner through both agnostic and hypothesis-driven approaches. This work represents the first genome-wide association study for SSD, the results of which will lead to hypotheses for future research. The aims of this dissertation are:
One (Chapter 4): To conduct the first genome-wide association study of speech sound disorder and identify variants associated with quantitative measures of SSD endophenotypes.
To our knowledge, this will be the first genome-wide association study conducted
on individuals ascertained based on SSD affection status. In addition to
identifying novel loci, this aim will also generate data for the remaining aims.
Two (Chapter 5): To explore the relationship between commonly comorbid conditions and genetic effects by examining changes in effect estimates after accounting for LI and
RD in a genome-wide association study.
In Aim I, we will not account for comorbidity affection status. It is possible that
the genetic effects from Aim I are confounded by RD and LI affection status,
especially given that previous research has identified shared genetic components
of SSD and RD and of SSD and LI (Stein et al., 2004; Smith et al., 2005; Rice et
al., 2009). If adjusting for these comorbidities does not alter the genetic effect
at certain loci, those loci may play a foundational role in communication
skills.
Three: To account for variants of marginal significance and perform pathway analysis to
a. Test for enrichment of association signal in pathways based on the results of Aim
I as well as gene sets associated with comorbid conditions (Chapter 6). In Aim I,
we will identify suggestive loci that we can cluster into potentially biologically
meaningful groups. Additionally, we will place our results in the context of previous
work by testing for enrichment of the FOXP2 network and gene sets previously
associated with LI and RD.
b. Classify the spectrum of association signals into biologically meaningful
pathways (Chapter 7). This analysis will classify nonsignificant association
signals into biologically relevant Kyoto Encyclopedia of Genes and Genomes
(KEGG) pathways and will be a step toward a more cohesive understanding of the
genetic basis of SSD as well as typical speech and communication.
This dissertation marks the first known genome-wide association study of speech sound disorders and will characterize the genetic basis of SSD in a comprehensive manner.
CHAPTER 2: Review of speech sound disorders: epidemiology, quantitative
components, and genetics
1. Basic Epidemiology
Communication, “the process by which information is exchanged (speaking, writing, semaphore etc.),” is vital to human life and is disrupted by communication disorders (Williams, 2012). Such disorders cause “impairment in the ability to receive, process, represent, or transmit information…specifically speech, language, or hearing” (Williams, 2012). Prevalence estimates vary, but according to recent data from the National Center for Health Statistics, the prevalence of all communication disorders is 7.7% in children aged 3–17. In general, boys are significantly more likely than girls to be affected with communication disorders (9.6% vs 5.7%), as are non-Hispanic black children compared to non-Hispanic white and Hispanic children (9.6%, 7.8%, and 6.9%, respectively) (Black et al., 2015). Among affected children, speech problems were most common, accounting for 41.8% and 24.4% of all communication disorders in children aged 3–10 and 11–17, respectively (Black et al., 2015).
Speech sound disorders (SSD) are heterogeneous and include disorders of articulation (sound production) and/or phonology (the organization of sounds in a language) (ASHA-SSD Overview, 2016). Articulation and phonology are discussed in detail in Section 2. SSD of unknown cause occur along a continuum from mild forms, which resolve, to severe forms, such as childhood apraxia of speech, which can persist into adulthood (Lewis et al., 2011). The temporal component is therefore an important consideration: some cases resolve with age while others persist. It should also be noted that SSD are not differences in pronunciation due to dialect.
a. Comorbid conditions
SSD can occur in isolation or with comorbidities. Among children with SSD, 6–21% also have receptive language disorders, 38–62% have expressive language disorders, and 25–30% have a reading disability (Peterson et al., 2009). Miscimarra et al. (2007) also reported that the odds of finding LI in individuals with SSD were 10 times greater than the odds of finding isolated SSD. Though not all children with SSD have comorbid language or reading impairment, the relatively high prevalence of comorbidities has led to discussions of shared etiologies of SSD, specific language impairment (LI), and RD, a possibility that is considered in this work.
b. Causes of Speech Sound Disorders
There are two types of SSD, those with known causes and those with unknown causes. Known causes include craniofacial abnormalities such as cleft lip and palate, malformed teeth, overbite or underbite, and macroglossia. Additionally, cerebral palsy and syndromes characterized by severe intellectual disability can lead to speech sound disorders (Shprintzen, 1997; Shprintzen, 1999). The American Speech-Language-Hearing
Association uses the following definitions:
Speech sound disorders is an umbrella term referring to any combination of
difficulties with perception, motor production, and/or the phonological
representation of speech sounds and speech segments (including phonotactic
rules that govern syllable shape, structure, and stress, as well as prosody) that
impact speech intelligibility. (ASHA-SSD Overview, 2016)
2. ENDOPHENOTYPES OF SPEECH SOUND DISORDERS
While tests of articulation and phonology are used to diagnose SSD, there are also other testable cognitive skills associated with SSD including receptive and expressive language, reading, and spelling. Analysis of these skills, also known as endophenotypes, allows for refinement of a binary classification to a more precise trait (Gottesman and
Gould, 2003). Such refinement is ideal for genetic analyses of complex traits where phenotypic heterogeneity can obscure genetic associations. Phonological memory, phonological awareness, and vocabulary abilities distinguish between SSD severity levels on a phenotypic level (Lewis et al., 2012). Therefore, we leveraged this narrowing of the phenotype in hopes of identifying genetic variants associated with each trait. The same endophenotypes are also relevant to the commonly comorbid conditions (Rvachew, 2007; Lewis et al., 2011; Stein et al., 2014), a fact that will allow us to put our results into a broader context.
a. Articulation
Articulation is the motor component of speech and describes how sounds are made. Complete details regarding articulation are given in Bernthal, Bankson, and Flipsen (2013); a basic, simplified explanation is that the English language consists of 44 phonemes, 18 of which are vowels (Bernthal, Bankson, and Flipsen, 2013). Precise motor control is necessary to position the jaw, tongue, and lips correctly to produce sounds (Figure 1). Vowels are described by the position of the lips, rounded (as in why) or unrounded (as in hi), and the location of the tongue in the mouth. Consonants are described by the placement of the lips and tongue and the closure of the oral cavity, known as manner. For example, /p/ in pet or /b/ in bat is characterized by complete closure of the oral cavity followed by a release; the sound is a bilabial stop. The word sequence pie, why, vie, thigh, tie, shy, guy, and hi exemplifies the differing places of consonant articulation, from the front of the mouth to the back, and the motor control necessary to accurately produce the sounds (sequence from Bernthal, Bankson, and Flipsen, 2013). There are ages by which children are expected to master these sounds, and departure from these norms often results in referral to a speech-language pathologist (ASHA, 2016) (Figure 2.1).
[Figure 2.1: chart of English consonants against age of acquisition. Axis: Age (0 to 10). Consonants listed: /ʒ/ beige, /ð/ the, /θ/ think, /v/ very, /ʤ/ jam, /z/ zoo, /ʃ/ shop, /ʧ/ chop, /s/ sorry, /-l/ heel, /l-/ long, /r/ red, /j/ yellow, /-f/ leaf, /f-/ fall, /ŋ/ ping, /t/ top, /d/ dot, /g/ go, /k/ car, /b/ book, /w/ win, /n/ not, /h/ hot, /m/ mat, /p/ pop.]
Figure 2.1 Consonants and age of acquisition. Figure developed by Williamson, 2010 using data from Sander, 1972; Grunwell, 1981; and Smith et al., 1990.
b. Phonology
Phonology is the linguistic study of how speech sounds are organized in a given
language and is often considered to be the cognitive component of language.
Phonological processes, awareness, and memory are relevant to SSD (Bernthal, Bankson,
and Flipsen, 2013).
Phonological processes: As part of the normal process of language acquisition, children
use phonological processes (sometimes referred to as phonological patterns) to simplify
speech. When using consonant harmony, for example, an individual produces the consonants
in a word identically (dog → dod). As children mature, the use of processes decreases until,
eventually, children speak like adults (Table 2.1). Children who continue to use the
processes past the expected age would likely be referred to a speech-language pathologist
(SLP). In addition to using processes past the normal age, children with phonological
disorders may use uncommon processes such as deleting initial consonants, backing stops
(tub → kub), or any processes involving vowels (vowel backing and lowering, bird → bad)
(Bauman-Waengler, 2012).
Table 2.1 Phonological processes and age at which they decline. Adapted from Bernthal, Bankson, and Flipsen, 2013 and Grunwell, 1982. Ages are given as years;months.
Group | Process | Definition | Example | Declines
Assimilation | Consonant harmony | One sound becomes similar to another in the same word | dog → dod | 3;0
Substitution | Fronting | Velars pronounced as sounds produced farther forward in the mouth | car → tar | 3;6
Substitution | Backing (not common) | | dog → gog | 3;0
Substitution | Gliding | Liquids /l/, /r/ are replaced by /w/, /j/ | rabbit → wabbit | 5+
Substitution | Depalatalization | Palatal sounds are pronounced as sounds produced further forward | fish → fis | 2;6, 4;0
Substitution | Deaffrication | Affricates pronounced as fricatives | church → sursh | 2;6
Syllable structure (processes that affect syllable structure) | Final consonant deletion | Deletion of the final consonant | dog → do | 3;0
Syllable structure | Cluster simplification or reduction | Deletion of one element of a cluster | plane → pane | 3;6
Syllable structure | Weak syllable deletion | Deletion of an unstressed syllable | banana → nana | 3;6, 4
Phonological awareness is an understanding of, and an ability to analyze and manipulate, the sound structure of speech (Bernthal, Bankson, and Flipsen, 2013). Measurable components of phonological awareness include the ability to recognize rhyme, recognize and segment syllables, and manipulate phonemes (Williams, 2013; Bernthal, Bankson, and Flipsen, 2013). For example, a child should be able to hear /b/ + /ell/ and say /bell/.
Table 2.2 Components of phonological awareness and age at which a related task/interrogation is mastered.
Component | Example | Age | % children mastering skill
Rhyme matching | | 2–3 | 50%
Rhyme matching | | 4–5 | 90%
Rhyme production | | 3 | 35%
Rhyme production | | 4 | 50%
Syllable counting | How many syllables are in puppy? | 5 | 90%
Phoneme awareness (segmentation analysis/elision) | pond → /p/, /ɑ/, /n/, /d/ | 6–7 | >90%
Phoneme awareness (blending/synthesis) | /p/, /ɑ/, /n/, /d/ → pond | |
Phonological memory is the storage of auditory phoneme information in short-term memory so that it can be manipulated (ASHA-SSD Assessment, 2016). Some children with phonological memory deficits experience impairments in developing written and spoken vocabularies (Gathercole & Baddeley, 1990).
Phonological representations are the mental representations of sounds and their combinations that comprise words.
i. Tests of Phonology
Multisyllabic word repetition (MSW): This test requires children to accurately sequence phonemes by repeating multisyllabic words. Target words include aluminum, thermometer, and sympathize (Catts, 1986). The test is scored by determining the percentage of words repeated correctly.
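The percent-correct scoring described for the repetition task can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `percent_correct` and the child responses are hypothetical, and a simple string comparison stands in for the examiner's judgment of whether each word was repeated correctly.

```python
def percent_correct(targets, responses):
    """Return the percentage of target words repeated correctly.

    Assumption: a response counts as correct when its transcription
    exactly matches the target; in practice an examiner makes this call.
    """
    if len(targets) != len(responses):
        raise ValueError("each target needs exactly one response")
    if not targets:
        raise ValueError("at least one target word is required")
    correct = sum(t == r for t, r in zip(targets, responses))
    return 100.0 * correct / len(targets)


# Target words taken from the text; the responses are invented for illustration.
targets = ["aluminum", "thermometer", "sympathize"]
responses = ["aluminum", "themometer", "sympathize"]

score = percent_correct(targets, responses)
print(f"{score:.1f}% of words repeated correctly")  # prints "66.7% of words repeated correctly"
```

A percentage is used rather than a raw count so that scores remain comparable if the number of target words differs between administrations.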
Nonsense word repetition (NSW): This test requires children to encode the information they hear and then repeat it. Phonological encoding converts what is heard into phonetic representations and forms an articulatory plan to repeat it; errors in encoding would result in mistakes in word repetition (Levelt, 2002; Kamhi & Catts, 1986). An example of a nonword is rəbesɪt. The test is scored by determining the percentage of words repeated correctly.
c. Language
Broadly, language is the use and comprehension of a spoken, written, or other symbol system (e.g. sign language) and is both receptive and expressive (ASHA, 2016).
Receptive language is the ability to understand what is said, while expressive language is the ability to formulate thoughts into words. Language includes five domains: phonology, morphology, syntax, semantics, and pragmatics (ASHA, 2016). Morphology is the study of the way the smallest meaningful units of language are combined (i.e., grammar), syntax is the study of sentence structure, semantics examines meaning, and pragmatics involves the social component of language (ASHA, 2016). Children with childhood apraxia of speech and SSD+LI score more poorly on vocabulary measures than do unaffected children (Lewis et al., 2012).
d. Reading
Reading is receptive written language. One theory of reading, the dual route theory, stipulates that reading occurs via two mechanisms. In the direct route, words are immediately recognized and understood; in the indirect, phonological route, words must be broken apart or sounded out (Siegel, 2006). The latter requires breaking words into component parts (phonological processing) and associating letters with sounds (phonological awareness) (ASHA; Siegel, 2006). Of children with SSD, 25–30% also have a reading disability (Peterson et al., 2009), and one group proposes that difficulties with phonological representations may contribute to both speech sound disorders and reading deficits (Anthony et al., 2011).
e. Spelling
Like reading, spelling demands phonological awareness because to spell successfully an individual must understand how phonemes (sounds) are represented by symbols (letters).
3. EVIDENCE FOR GENETIC BASIS OF SSD
Prior to embarking on genetic studies of any disorder, it is necessary to establish that there is, indeed, a genetic basis. For speech sound disorders, twin and family aggregation studies support the existence of a genetic etiology. Morley (1967) provided the first evidence of a genetic basis of SSD, describing 12 families in which the proband had childhood apraxia of speech; in 50% of these families, parents or siblings also had SSD. A longitudinal study of development found that the children of individuals who had phonological disorders in elementary school scored more poorly on articulation tests than children of unaffected individuals (Felsenfeld et al., 1995). Lewis et al. report that in a cohort of families ascertained based on SSD (the cohort used in this dissertation), 26% (18.3% fathers, 40.9% brothers, 19.4% sisters, 18.2% mothers) of nuclear family members and 13.6% of extended family members were also affected with SSD (Lewis et al., 1992). A separate study reports concordance rates of 0.95 and 0.22 for monozygotic and dizygotic twins, respectively (Lewis and Thompson, 1992). Bishop (2002) reports a narrow-sense heritability, the proportion of phenotypic variation explained by additive genetic components, of 0.97. Though the diagnostic criteria for SSD may have changed since the time of these studies, they illustrate that at least some portion of SSD is heritable and justify genetic studies.
4. GENETIC STUDIES OF SSD
For a brief review, see Tables 2.3–2.5. The first genetic study of speech sound disorder was based on a multigenerational family in England, the KE family, affected with apraxia of speech. Following identification of a linkage peak on 7q31.1 by Fisher et al. (1998), Lai et al. identified a single, causal mutation in FOXP2, leading to a heavy focus upon FOXP2 as the basis of speech sound disorders. The gene encodes an evolutionarily conserved transcription factor that is highly expressed in the brain during development (Enard et al., 2002). The finding spurred research on other species. In songbirds, early work found that FOXP2 is differentially expressed in the brain during periods of song learning, indicating it may have a role in vocal learning (Webb and Zhang, 2005); later experiments using knockout zebra finch models supported these findings, as knockout animals were unable to learn songs accurately (Hueston and White, 2015). In mice, there is evidence of a role of FOXP2 in communication; homozygous FOXP2 knockout leads to decreased ultrasonic vocalization compared to wild type controls (Shu et al., 2005).
Given these findings, FOXP2-related research dominated the field for a period of time.
a. FOXP2 and SSD
In a 2006 study, Feuk et al. examined the FOXP2 region in 13 individuals with childhood apraxia of speech and identified structural anomalies in all individuals. In their sample, the affected individuals had either maternal uniparental disomy or a deletion of the paternal copy of FOXP2, suggesting a parent-of-origin effect for SSD. It should be noted, however, that 7 individuals had Silver-Russell Syndrome and 2 had autism, making it challenging to disentangle the speech phenotype from the other syndromic phenotypes.
An exome sequencing study of 24 individuals with apraxia of speech identified structural anomalies: 16 unique copy number variations (CNVs) were identified in 12 individuals (Laffin et al., 2012). Although the pathogenicity of the CNVs was not clear, the authors explain that the CNVs included gene families (ALG, BAG, CCDC, CDC, EXOSC, MAP, PDE, RAB, TMEM, and SFP) that have been associated with neurite outgrowth, making them plausible candidate regions for SSD. A case study of a child with apraxia also identified a CNV on 7q31 including FOXP2. In addition to apraxia, the child had fine and gross motor control deficits (Lennon et al., 2007).
b. Other, non-FOXP2 based studies
Studies not focused upon FOXP2 can be divided into two groups: linkage analyses using regions associated with dyslexia, and hypothesis-generating, agnostic studies. The latter group consists mostly of case reports or case series.
1. Hypothesis driven analyses
Although FOXP2 has been a consistent focus in the SSD literature, studies have also identified other loci. In 2004, Stein et al. conducted a linkage study within a dyslexia candidate region on chromosome 3, DYX5, and identified linkage between the region and a phonological factor score based on the scores of multisyllabic and nonsense word repetition. This finding suggests that dyslexia and SSD have a shared genetic basis. In further support of pleiotropy of the region, Stein et al. found independent effects of the region on both multisyllabic word repetition (MSW) and nonsense word repetition (NSW). Also exploring the hypothesis that reading delay and SSD have a shared genetic basis, Smith et al. (2005) tested three dyslexia susceptibility regions, 1p36 (DYX8), 6p22.2 (DYX3), and 15q21 (DYX1), for linkage with SSD. The group reported linkage on chromosome 6 with GFTA scores and with the log odds of affection with a speech disorder. They also identified linkage on chromosome 15 with nonsense word repetition, GFTA, and percent consonants correct (Smith et al., 2005). This region on chromosome 15 is especially interesting in relation to speech because it is the Prader-Willi/Angelman locus. These syndromes result in poor oral motor skills and poor speech development, respectively (Cassidy and Schwartz, 1998). Additionally, the region has been associated with autism, which is characterized by delayed speech (Pinto et al., 2010). Stein et al. further investigated DYX1 and did not identify linkage with SSD; however, the group did identify linkage (with SSD as a binary trait and with repetition of single syllables) slightly upstream at 15q14 (Stein et al., 2006).
In the final study investigating linkage between a dyslexia region and SSD, Miscimarra et al. found suggestive evidence of linkage between DYX8 (1p36) and both verbal short term memory and language comprehension (Miscimarra et al., 2007). These findings suggest a pleiotropic effect of the region.
Most recently, Stein et al. (2014) explored the hypothesis that the neural genes DRD2, a dopamine receptor; AVPR1A, an arginine-vasopressin receptor; and ASPM, a microcephaly gene, are associated with SSD. By performing association analyses with genotyped SNPs, the group identified associations between AVPR1A and phonological memory (measured by nonsense and multisyllabic word repetition), reading decoding (measured by the Word Identification and Word Attack subtests of the Woodcock Reading Mastery Tests-Revised), and both receptive and expressive vocabulary (measured by the Peabody Picture Vocabulary Test-3rd edition and the Expressive One Word Picture Vocabulary Test-Revised). DRD2 was associated with phonological memory (measured by NSW and MSW), and ASPM was associated with receptive language (measured by the Peabody Picture Vocabulary Test) and reading decoding.
2. Agnostic studies
An exome sequencing study of 10 apraxic children identified potentially pathogenic variants in CNTNAP2, KIAA0319, FOXP1, and SETX (Worthey et al., 2013). There was no single variant shared among all 10 children, so it is difficult to draw any conclusions regarding the pathogenicity of the variants. However, the genes have been associated with related phenotypes (Tables 2.6 and 2.7).
A separate study using an Affymetrix genome-wide copy number variation (CNV) array on 7 children with childhood apraxia of speech free of SLI and on 8 children with specific language impairment alone revealed a deletion of CNTNAP2 in two children with apraxia but not in any children with SLI (Centanni et al., 2015). The authors suggest that these findings indicate that previous associations between CNTNAP2 and LI/dyslexia could have been due to comorbid motor speech problems.
Table 2.3 Loci from linkage studies.
Linkage peak | Study
1p34-p36 | Miscimarra et al., 2007
3p12-q13 | Stein et al., 2004
6p22.2 | Smith et al., 2005
15q21 | Smith et al., 2005
15q14 | Stein et al., 2006
Table 2.4 Genes associated with SSD
Gene | Study type | Mutation | Function and previous associations | Study
ASPM | Targeted genotyping | | | Stein et al., 2014
ATP13A4 | Exome sequencing (n=10) | g.1938A>T, p.Glu646Asp | ATPase highly expressed in language centers of brain | Worthey et al., 2013
AVPR1A | Targeted genotyping | | | Stein et al., 2014
CNTNAP1 | Exome sequencing (n=10) | p.Arg1064Gln, c.3191G>A | Formation and maintenance of neural cell contact. Previously associated with | Worthey et al., 2013
CNTNAP2 | Exome sequencing (n=10) | p.Arg171Cys; 3 nucleotide insertion near splice site | Language delay, intellectual disability, stereotypies of autism, specific language impairment | Laffin et al., 2012; Worthey et al., 2013
DRD2 | Targeted genotyping | | | Stein et al., 2014
FOXP1 | Exome sequencing (n=10) | SNP, p.Ile107Thr | Neural development; previously associated with autism, language delay, | Worthey et al., 2013
FOXP2 | | | Transcription factor | Lai et al., 2001
KIAA0319 | Exome sequencing (n=10) | p.Ala311Thr, c.931G>A | Adhesion between neurons. Previously associated with SLI and dyslexia | Worthey et al., 2013
SETX | Exome sequencing (n=10) | SNP, p.Lys992Arg, g.2975A>G | Previously associated with oculomotor apraxia type 2 | Worthey et al., 2013
Table 2.5 Copy number variation associated with SSD
Locus | Type | Genes | Trait | Original Study
1q25.1 | Deletion | | Delayed language | Centanni et al., 2015
2q31 | Deletion | Deletion of DLX1, DLX2 | Craniofacial patterning and forebrain development | Laffin et al., 2012
 | | ITGA6 | Cell-surface signaling |
 | | RAPGEF, HAT, MAP1D, PDK1, AL157450, CGEF2, ZAK, CDCA7, MLK7-AS1 | Memory retrieval and synapse remodeling |
 | | PDE11A | Regulation of brain function |
2q24 | Deletion | UPP2, CCDC148, PK4P, AK126351 | | Laffin et al., 2012
2p14 | Deletion | SPRED2 | | Laffin et al., 2012
4p15.1 | Duplication | | | Laffin et al., 2012
5q34 | Deletion | | Cleft lip, depressed nasal bridge, microcephaly | Centanni et al., 2015
6p12.1 | Duplication | DST, BEND6, ZNF451, BAG2, RAB23, PRIM2 | Carpenter syndrome | Laffin et al., 2012
7q22-7q36 | Karyotyping of FOXP2 and flanking regions in 13 patients with apraxia | Deletions of the FOXP2 locus or maternal UPD | All include FOXP2 | Feuk et al., 2006
7q31.1-7q31.31 | Case report | Deletion | Also has severe developmental delay | Lennon et al., 2006
7q31.1-7q31.2 | Deletion | Hemizygous maternally inherited deletion | Includes FOXP2 and has | Zilina et al., 2012
7q31.1-q31.31 | Deletion | Hemizygous maternally inherited deletion | | Zilina et al., 2012
12p12.3 | Deletion | | Language delay, dysmorphic features and hypotonia | Centanni et al., 2015
13q13.3 | Duplication | RFXAP, SMAD9, ALG5, EXOSC8, | | Laffin et al., 2012
14q23.2 | Deletion | | | Laffin et al., 2012
15q21.2 | Deletion | | Abnormal facial shape, hypotonia | Centanni et al., 2015
16p11.2 | Deletion | | | Laffin et al., 2012
16p13.2 | Deletion | ABAT, TMEM186, PMM2, CARSHP1, USP7 | ABAT deficiency: psychomotor retardation, hypotonia, hyperflexia, lethargy, seizures | Laffin et al., 2012
17q23.2 | | MS12 | Expressed in neuronal precursor cells | Laffin et al., 2012
c. Studies of comorbid conditions
As previously discussed, and because the deficits are similar, one can hypothesize that genes contributing to dyslexia and reading may contribute to speech sound disorders (or vice versa). Dyslexia is described by four cognitive components: orthographic processing, phoneme awareness, rapid automatized naming, and phonological short term memory (Carrion-Castillo et al., 2013), which are similar to those involved in SSD with the exception of orthographic processing. Children with LI may have delayed phonological development in addition to grammatical, expressive language, and receptive language difficulties (Berkson, Bankson, and Flipsen, 2013). Additionally, the conditions are often comorbid, and regions associated with dyslexia have been shown to be associated with SSD (Stein et al., 2004, 2005; Miscimarra et al., 2007; Smith et al., 2005). Consequently, because there may be shared etiology apart from that already described, a summary of genetic studies of SLI and RD is provided (Tables 2.6–2.7) but not discussed in detail.
Table 2.6 Genes associated with comorbid conditions. SLI = Specific language impairment
Gene | Phenotype | Study
ACOT13 | Dyslexia | Deffenbacher et al., 2004
ABCC13 | SLI | Luciano et al., 2014
ATP13A4 | SLI, ASD | Kwasnicka-Crawford et al., 2005
ATP2C2 | SLI | Newbury et al., 2009; Newbury et al., 2011
 | Dyslexia | Newbury et al., 2011
BDNF | SLI | Simmons et al., 2010
CCDC136 | SLI | Giallusi et al., 2014
CFTR | SLI | O’Brien et al., 2003
CMIP | SLI | Newbury et al., 2009; Newbury et al., 2011
 | Dyslexia | Scerri et al., 2011
CNTNAP2 | Dyslexia | Newbury et al., 2011; Peter et al., 2011
 | SLI | Vernes et al., 2008; Newbury et al., 2011
 | Delayed speech | Al-Murrani et al., 2012
CYP19A1 | Dyslexia | Anthoni et al., 2012
DCDC2 | Dyslexia | Deffenbacher et al., 2004; Harold et al., 2006; Schumacher et al., 2006; Newbury et al., 2011; Scerri et al., 2011; Lind et al., 2010; Zhong et al., 2013 (meta analysis)
 | SLI | Rice et al., 2009
DOCK4 | Dyslexia | Pagnamenta et al., 2010
DRD2 | Stuttering | Lan et al., 2009
DYX1C1 | Dyslexia | Taipale et al., 2003; Scerri et al., 2004; Wigg et al., 2004; Brkanac et al., 2007; Marino et al., 2007; Dahdouh et al., 2009; Lim et al., 2011; Paracchini et al., 2011; Mascheretti et al., 2013
 | SLI | Newbury et al., 2011
FOXP1 | SLI | Hamdan et al., 2010
FOXP2 | Dyslexia | Peter et al., 2011
 | SLI | Rice et al., 2009
GCFC2 | Dyslexia | Anthoni et al., 2012 (conflicting evidence)
 | SLI | Scerri et al., 2011
GNPTAB | Stuttering | Kang et al., 2011
GNPTG | Stuttering | Kang et al., 2011
GPLD1 | Reading disability | Meng et al., 2005
KIAA0319 | Dyslexia | Deffenbacher et al., 2004; Cope et al., 2005; Harold et al., 2006; Ludwig et al., 2008; Dennis et al., 2009; Newbury et al., 2011; Scerri et al., 2011; Venkatesh et al., 2013
 | Reading disability | Meng et al., 2005
 | SLI | Rice et al., 2009; Newbury et al., 2011
MRPL19 | Dyslexia | Anthoni et al., 2007
 | SLI | Scerri et al., 2011 (*According to Carrion-Castillo et al., 2013, this gene may be related to general cognition rather than specifically reading and language)
NAGPA | Stuttering | Kang et al., 2010
NDST4 | SLI | Eicher et al., 2013
NRSN1 | Dyslexia | Deffenbacher et al., 2004
NOP9 | SLI |
ROBO1 | Dyslexia | Hannula-Jouppi et al., 2005; not replicated by Venkatesh et al., 2013
 | NWR (in unaffected individuals) | Bates et al., 2010
SETBP1 | SLI | Filges et al., 2011; Marseglia et al., 2012
SRPX2 | Oral dyspraxia and seizure | Roll et al., 2006
THEM2 | | Pinel et al., 2012; Cope et al., 2012
TTRAP | | Deffenbacher et al., 2004
TDP2 | Dyslexia | Deffenbacher et al., 2004; Luciano et al., 2007
VMP | Dyslexia | Deffenbacher et al., 2004
Table 2.7 Loci from linkage studies associated with comorbid conditions. SLI = Specific language impairment
Location | Gene | Phenotype | Author
1p34-p36 | KIAA0319 | | Grigorenko et al., 2001; Tzenova, Kaplan, Petryshen, & Field, 2004; de Kovel et al., 2008; Rice et al., 2009; Rice et al., 2008
2q36.3 | TM4SF20 | SLI | Wiszniewski et al., 2013
6q11.2–q12 | | Dyslexia | Petryshen et al., 2001
7q31–7q36 | | SLI | Monaco et al., 2007
13q21 | | SLI | Bartlett et al., 2002
16q23-24 | CMIP, ATP2C2 | SLI | SLI Consortium, 2002; SLI Consortium, 2004
18p11.2 | MC5R, DYM, NEDD4L, and VAPA | Dyslexia | Fisher et al., 2002; Bates et al., 2007; Seshadri et al., 2007; Poelmans et al., 2011; Scerri et al., 2010
19q13.13-12.41 | | SLI | SLI Consortium, 2002
Xq27.3 | FMR1 | | de Kovel et al., 2004; Platko et al., 2008; Huc-Chabrolle et al., 2013
5. LIMITATIONS OF PREVIOUS STUDIES
To date, most speech sound disorder related research has focused primarily on FOXP2 or performed linkage analyses on regions previously associated with comorbid conditions. There are shared genetic components between SSD and dyslexia (Stein et al., 2004; Stein et al., 2006; Miscimarra et al., 2007); however, it has yet to be determined, in a cohort ascertained specifically for SSD, whether there are unique genetic components. Additionally, agnostic studies have been limited by small sample sizes, with the maximum being n=24, making it difficult to draw conclusions regarding the pathogenicity of variants.
This dissertation addresses these limitations by performing a genome-wide association study using 721 individuals from a cohort ascertained for speech sound disorder and by using pathway analyses to make sense of seemingly disparate results.
CHAPTER 3: Methods
This chapter will discuss phenotypic and genetic data collection and quality control methods that are relevant to all remaining chapters, the basic statistical theory for each aim, and the software chosen to address the research questions. Chapters 4, 5, 6, and 7 will address any issues specific to the aim discussed therein. The overall study design and workflow is outlined in Figure 3.1.
Figure 3.1 Overall study design and workflow
1. PHENOTYPE DATA
1.1 Overall Study design
The data are from a longitudinal study of SSD in which 4-6-year-old children with SSD were referred by speech-language pathologists in Northeast Ohio. Families are ascertained through a proband diagnosed with speech sound disorder. Diagnosis is based on a score at or below the 10th percentile on the Goldman-Fristoe Test of Articulation (GFTA) (Goldman and Fristoe, 1986) and on the production of at least three errors on the Khan-Lewis Phonological Analysis test (KLPA) (Khan and Lewis, 1986) (Chapter 2; p. 17). Additionally, to eliminate the possibility of SSD due to other comorbidities, children must have normal hearing, a normal peripheral speech mechanism (z-score within 1 standard deviation of the normative reference on the Total Function and Total Structure subscales of the Oral and Speech Motor Control Protocol) (Robbins and Klee, 1987), and an IQ>80 on the Wechsler Preschool and Primary Scale of Intelligence (methods adapted from the description by Stein et al., 2004). Probands and their siblings are given a battery of tests (Table 3.1) designed to measure endophenotypes of speech sound disorders. Family history of speech, language, and reading disorders, as well as of psychiatric disorders, was also collected. For an example pedigree, see Figure 3.2.
Figure 3.2 Example family participating in the study. Colored boxes indicate affection status. 515 is the proband affected with speech, language, and reading impairments. Through him, his siblings 516-518 were recruited into the study. His father, 561, has or had reading disability and his mother is or was affected with speech and reading impairments. A=Affected, U=Unaffected
Table 3.1 Tests used in current study and the phenotype interrogated (Adapted from Lewis et al., 2005)
Articulation
  Goldman-Fristoe Test of Articulation¹,²
  Khan-Lewis Phonological Analysis¹
  Conversational speech sample¹,²
Phonology
  Comprehensive Test of Phonological Processing
  Nonsense Word Repetition Test¹,²
  Multisyllabic Word Repetition¹,²
  Speech Error Phrases¹,²
Semantic/Syntactic Measures
  Test of Language Development-Primary 2¹ (TOLD-P2)
  Clinical Evaluation of Language Fundamentals-3² (CELF-P)
Written Language (7-12 years old only)
  Woodcock Reading Mastery Test² (WRMT)
  WIAT-Reading Comprehension²
  Test of Written Spelling²
Nonverbal intelligence
  Wechsler Preschool and Primary Scale of Intelligence-Revised (WPSI) or Wechsler Intelligence Scale for Children-3rd Edition subtests (WISC)
Oral Motor Measures
  Oral and Speech Motor Control Protocol¹
  Fletcher Time-by-Count²
¹ Administered to 4-7 year olds
² Administered to 7-12 year olds
Table 3.3 Basic demographics of all individuals in the cohort as of February 2016
 | n (%)
Basic Information |
Total | 1732
Male | 960 (55)
SSD=1 | 418 (31.4)
Language=1 | 607 (36.7)
Reading=1 | 299 (18.1)
Family Information |
Number families | 416
Siblings of proband | 519
Parents | 702
Grandparents | 21
2. TESTS FOR QUANTITATIVE TRAITS
2.1 Articulation and motor control
i. Fletcher Time by Count (Fletcher, 1972): Examines the mechanical limit of speech production. According to the developer, the speech mechanism is similar to a machine: it has weights, levers (mandible and hyoid), and devices that produce sound (muscles and nerves), and consequently, there must be a mechanical limit. Fletcher argues that this limit is the rate at which the structures can perform (Fletcher, 1972). Individuals repeat single syllables such as /pʌ/ and multiple syllables such as /pʌtəkə/ as many times as possible in 20 seconds. The test was normed on 384 school aged children, but we use raw scores in the development of our z-scores.
ii. The Goldman-Fristoe Test of Articulation (Goldman and Fristoe, 1986) is a series of pictures with target words that tests children’s ability to produce 39 sounds of the English language in various locations of the word (initial, medial, final). For example, a card may have an image of a yellow duck that says quack, which requires the examinee to produce an initial /j/ (as in yellow), /d/, /kw/, and a final /k/. The test is scored by counting the total number of errors, which can be converted to a standard score developed based on test results of 1,723 females and 1,798 males aged 2;0 to 21;11.
2.2 Language
i. Expressive One Word Picture Vocabulary Test (EOWPVT) (Garner, 1990): Expressive language is the ability to verbalize thoughts. In this test, individuals are required to name a target object, action, or idea that is illustrated in a picture easel. The test was standardized on individuals 2-80 years old.
ii. Peabody Picture Vocabulary Test (PPVT) (Dunn and Dunn, 1997): A test of receptive vocabulary that requires the examinee to select the one image that best matches the stimulus from a group of four. The test was standardized on individuals 2 to 90 years old.
iii. Wechsler Individual Achievement Test-Listening Comprehension subtest (WIATLC) (Wechsler, 2011): This test assesses receptive language in two ways. The first is a picture easel test like the PPVT. In the second, the examiner tells brief stories and then asks the examinee to explain why something is important, to remember specific details, and to develop hypotheses about the story. This is a more practical test than EOWPVT because it involves syntax.
2.3 Phonology
i. Multisyllabic word repetition (MSW): This test requires children to accurately sequence phonemes by repeating multisyllabic words. To complete this task, individuals must encode phonologic information for target words such as aluminum, thermometer, and sympathize (Catts, 1986). The test is scored by determining the percentage of words repeated correctly.
ii. Nonsense word repetition (NSW) (Kamhi & Catts, 1986): This test requires children to encode the information they hear and then repeat it; deficits in encoding would result in mistakes in word repetition. An example of a nonword is /rəbesɪt/. The test is scored by determining the percentage of words repeated correctly. NSW can discriminate adults with resolved SSD from those who never had SSD.
2.4 Reading
i. Woodcock Reading Mastery Test-Revised, Word Attack subtest (WRDATK) (Woodcock, 1987): Individuals must read a list of 45 non-words; the test includes “words” such as ip, din, ceisminadolt, and gnouthe to assess phonetic decoding skills.
ii. Woodcock Reading Mastery Test-Revised, Word Identification subtest (WRDID) (Woodcock, 1997): Individuals must read a list of 106 real words.
2.5 Spelling
i. Test of Written Spelling is similar to a spelling test that would be administered in school and requires subjects to spell dictated words in order of increasing difficulty (Larsen, Hammill, and Moats, 1999).
1.3 Standardizing data
For analyses, we converted all scores to age-adjusted z-scores using the procedure below (Equations 3.1–3.3). We chose the first available observation of each trait for every individual within the study, even those without genetic data, to maximize our sample size. Using only the individuals without SSD and their ages, we calculated effect estimates for age and age squared. Age squared is included in the model because we suspect that there is a nonlinear relationship between age and the quantitative trait (Qtrait).
$\mathrm{Qtrait} \sim \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{age}^2$ (Equation 3.1)
The beta estimates were then used to calculate z-scores for the affected individuals:
$\mathrm{Qtrait}_{\mathrm{predicted}} = \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{age}^2$ (Equation 3.2)
$Z = (\mathrm{Qtrait}_{\mathrm{observed}} - \mathrm{Qtrait}_{\mathrm{predicted}}) / SE_{\mathrm{residuals}}$ (Equation 3.3)
This method has been used elsewhere (Lewis et al., 2011; Wellman et al., 2011; Stein et al., 2014). The scores developed in this manner for each individual were the outcome measures for the analyses described in Chapters 4 and 5. If necessary, z-scores were transformed to satisfy the normality assumptions of basic regression based on Q-Q plots (Table 3.2; distributions in Appendix A). If constants were added, it was to make all original z-scores positive before transforming.
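The standardization in Equations 3.1–3.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the dissertation: the function name `age_adjusted_z`, the use of ordinary least squares via NumPy, and the choice of n − 3 residual degrees of freedom for the residual standard error are all assumptions.

```python
import numpy as np

def age_adjusted_z(age, score, unaffected):
    """Age-adjusted z-scores from a quadratic age model fit on unaffected individuals.

    age, score: one value per individual; unaffected: boolean mask (True = no SSD).
    All names here are hypothetical and chosen for illustration.
    """
    age = np.asarray(age, dtype=float)
    score = np.asarray(score, dtype=float)
    unaffected = np.asarray(unaffected, dtype=bool)

    # Design matrix: intercept, age, age squared (Equation 3.1)
    X = np.column_stack([np.ones_like(age), age, age ** 2])

    # Estimate the betas using only individuals without SSD
    beta, *_ = np.linalg.lstsq(X[unaffected], score[unaffected], rcond=None)

    # Residual standard error among the unaffected (assumed n - 3 degrees of freedom)
    resid = score[unaffected] - X[unaffected] @ beta
    se_resid = np.sqrt(np.sum(resid ** 2) / (unaffected.sum() - 3))

    # Predicted score for everyone (Equation 3.2), then standardize (Equation 3.3)
    predicted = X @ beta
    return (score - predicted) / se_resid
```

Calling `age_adjusted_z(ages, raw_scores, ssd_status == 0)` would return one z-score per individual; negative values indicate performance below what the age model predicts for unaffected children.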
Table 3.4 Transformations of z-scores
Test | λ | Transformation
Articulation and motor control | |
Fletcher Time By Count | 0.5 | (3+z)^0.5
Goldman-Fristoe Test of Articulation | log | log(3+z)
Language | |
Expressive One Word Picture Vocabulary Test | 1.5 | (3+z)^1.5
Peabody Picture Vocabulary Test | NA |
Wechsler Individual Achievement Test-Listening Comprehension | NA |
Phonology | |
Multisyllabic Word Repetition | NA |
Nonsense Word Repetition | NA |
Reading | |
Woodcock Reading Mastery-Word Attack | 2 | (8+z)^2
Woodcock Reading Mastery-Word Identification | 1.5 | (8+z)^1.5
Spelling | |
Test of Written Spelling | NA |
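The transformations in Table 3.4 amount to adding a constant and applying a power or log. The sketch below shows one way to apply them; the dictionary keys, the function name `transform_z`, and the use of the natural logarithm are assumptions made for illustration, since the table does not state the log base.

```python
import numpy as np

# (constant, lambda) per trait, transcribed from Table 3.4.
# Traits listed as NA in the table are left untransformed.
# Keys are hypothetical short labels, not identifiers from the study.
TRANSFORMS = {
    "Fletcher Time By Count": (3, 0.5),
    "GFTA": (3, "log"),
    "EOWPVT": (3, 1.5),
    "WRDATK": (8, 2),
    "WRDID": (8, 1.5),
}

def transform_z(z, trait):
    """Shift z-scores by the trait's constant, then apply the power or log transform."""
    z = np.asarray(z, dtype=float)
    if trait not in TRANSFORMS:
        return z  # NA in Table 3.4: no transformation
    constant, lam = TRANSFORMS[trait]
    shifted = constant + z
    if np.any(shifted <= 0):
        raise ValueError("constant does not make all z-scores positive")
    # Natural log is assumed here; the table only says "log"
    return np.log(shifted) if lam == "log" else shifted ** lam
```

The explicit check on `shifted` reflects the stated purpose of the constant: every value must be positive before a fractional power or log is taken.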
3. GENETIC DATA
Saliva samples were collected for all willing family members using Qiagen (saliva) collection kits; blood was also collected from some families. DNA was extracted, prepared for genotyping, and genotyped at Case Western Reserve University.
2.1 Genotyping
All genotyping was performed using Illumina Omni chips. Due to rapidly evolving technology, five chips were used (Table 3.5).
Table 3.5 Genotyping data for the current study. Numbers reflect non-failed samples prior to quality control procedures
Chip | Technical Name | Individuals | Families
Omni 2013 | HumanOmni2.5Exome Chip 8v1_A | 609 | 144
Omni 2014 | HumanOmni2.5Exome Chip 8v1-1-A | 105 | 34
Omni Express | Omni Express | 47 | 28
Omni 8 | HumanOmni 2.5-8 | 40 | 10
Omni4 | HumanOmni 2.5-4 | 16 | 12
Table 3.6 Chip characteristics summarized from Illumina documentation
Chip | Technical Name | # Markers | Exonic | Variation captured, 1000 Genomes (MAF >0.05)
Omni 2013 / Omni 2014 | HumanOmni2.5Exome Chip 8v1_A / HumanOmni2.5Exome Chip 8v1-1-A | ~2.5 million | 240,000 | CEU=0.83, CHB+JPT=0.83, YRI=0.65
Omni 8 / Omni4 | HumanOmni 2.5-8 / HumanOmni 2.5-4 | 2,338,671 | 0 | CEU=0.83, CHB+JPT=0.83, YRI=0.65
Omni Express | Omni Express | 715,000 | 0 | CEU=0.73, CHB+JPT=0.74, YRI=0.40
2.1.2 Genotype quality control
Prior to completing any analyses, a considerable amount of quality control was necessary. We followed the stringent quality control procedures outlined in Guo et al., 2014. Briefly, the markers are filtered for characteristics such as cluster separation, and those not passing the filter are visually inspected and manually reclustered, if possible. This procedure saves SNPs from removal. Following cluster QC and conversion to the plus strand, minor allele frequency dependent call rate filters were applied (Table 3.7). Non-autosomal SNPs (X and Y) were removed prior to analysis because the sex chromosome builds for phasing and imputation were not stable; thus, we restricted our analysis to the autosomes. On the X and Y chromosomes there are pseudoautosomal regions that map to [...] analysis. The exclusion of the X and Y chromosomes is a limitation of this study, especially because anecdotal evidence indicates there are more affected males than females.
Table 3.7 SNP quality control summary (number of SNPs)
 | Omni 2013 | Omni 2014 | Omni 8 | Omni 4 | Omni Express
Total SNPs, pre QC | 2503734 | 2583315 | 2344747 | 2442829 | 716356
Failed Cluster QC in GenomeStudio | 63773 | 7730 | 4581 | 15801 | 2951
Non Autosomal | 38010 | 39107 | 31105 | 26668 | 15789
Duplicate Markers or triallelic | 27222 | 27603 | 2380 | 2062 | 3
Hardy Weinberg Equilibrium (p<1x10^-20)* | 9 | 0 | 0 | 0 | 0
Mendelian Error** | 0 | 0 | 0 | 0 | 0
Total # SNPs | 1594290 | 1586379 | 1398635 | 1182863 | 651088
* Only removed extreme violation of HWE
** SNPs with Mendelian errors were zeroed for correct individuals. If removal of the SNPs for individuals resulted in low call rate for the SNP, it was subsequently removed.
Table 3.8 Individual quality control summary (number of individuals removed)
 | Omni 2013 | Omni 2014 | Omni 8 | Omni 4 | Omni Express
Total, pre QC | 651 | 112 | 40 | 16 | 48
Call rate >0.98 | 38 | 7 | 0 | 0 | 1
Twins | 3 | 0 | 1 | 0 | 0
Planned replicates | 10 | 0 | 0 | 0 | 0
Relationship errors (non-resolvable) | 14 | 0 | 0 | 0 | 0
Excess heterozygosity* | 51 | 5 | 2 | 1 | 4
Sex mismatch | 0 | 0 | 0 | 0 | 0
Total | 598 (313 Male) | 105 (53 Male) | 40 (26 Male) | 16 (7 Male) | 47 (26 Male)
Total families | 148 | 60 | 8 | 4 | 45
*Did not remove because there was no evidence of sample contamination and doing so would further reduce an already small sample
3.3 Phasing
Phasing increases the speed and accuracy of imputation by estimating haplotypes from the genotyped data. We used SHAPEIT2 to check for strand congruity between our data and hg19 (reference genome) and individually phased each chip prior to imputation (Delaneau et al., 2012; Delaneau et al., 2014).
3.4 Imputation
Data were imputed to the Phase 3, mixed reference option, of the 1000 Genomes Project using the University of Michigan Imputation Server, which implements minimac3 (Howie et al., 2012). Following imputation, all markers with R2<0.7 and MAF<0.05 in our population were removed. Imputed genotypes are continuous values between 0 and 2, known as dosages, because there is uncertainty in the imputation. These continuous values do not reflect biology because it is impossible to carry a non-integer number of alleles.
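The post-imputation filter can be illustrated with a short sketch operating on a dosage matrix. It is not the pipeline used in the study: the function name `filter_imputed` and the array layout are hypothetical, and the sketch assumes a marker is dropped when it fails either threshold, which is one reading of the criterion stated above.

```python
import numpy as np

def filter_imputed(dosage, r2, r2_min=0.7, maf_min=0.05):
    """Keep markers with adequate imputation quality and minor allele frequency.

    dosage: (n_individuals, n_markers) array of values in [0, 2]
    r2:     per-marker imputation quality (R2) reported by the imputation software
    Assumption: a marker is removed if it fails EITHER threshold.
    """
    dosage = np.asarray(dosage, dtype=float)
    r2 = np.asarray(r2, dtype=float)

    # Mean dosage / 2 estimates the frequency of the counted allele
    freq = dosage.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)

    keep = (r2 >= r2_min) & (maf >= maf_min)
    return dosage[:, keep], keep
```

Because dosages are expected allele counts, the sample mean divided by two gives an allele frequency estimate directly, without first rounding to hard genotype calls.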
3.5 Finalizing data
All genetic data were reduced to the overlapping imputed markers across all five chips, 5,078,482 markers, and merged for all subsequent analyses. Additionally, due to small sample size and reduction in power when adjusting for PCs, all non-Caucasian individuals were removed from the sample (for ancestry principal components, see Appendix A- Fig 6).
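Reducing the data to markers shared by every chip is a set intersection over per-chip marker lists. The sketch below is a hypothetical illustration: the function name `shared_markers` and the marker identifiers in the usage example are invented, and the actual merge was presumably done with genetics software rather than plain Python.

```python
from functools import reduce

def shared_markers(marker_lists):
    """Return the marker IDs present in every list, sorted for reproducibility.

    marker_lists: a non-empty iterable of iterables of marker IDs, one per chip.
    """
    return sorted(reduce(set.intersection, (set(m) for m in marker_lists)))

# Usage with made-up marker IDs for three chips
chips = {
    "Omni 2013": ["rs1", "rs2", "rs3", "rs4"],
    "Omni 8": ["rs2", "rs3", "rs4", "rs5"],
    "Omni Express": ["rs3", "rs4", "rs6"],
}
common = shared_markers(chips.values())
```

Only the markers in `common` would be retained from each chip before merging, so every individual has data at every analyzed marker.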
4. ANALYTICAL METHODS
4.1 Aim 1: Genome wide association study
We hypothesize that there is a common variant captured by the GWAS arrays or imputed markers that is associated with SSD endophenotypes without confounding comorbidities.
4.1.1 Statistical model
Genome wide association studies (GWAS) test each marker in a data set individually for association with either a continuous or dichotomous outcome. They grew in popularity in the early 2000s but have been criticized for failing to replicate, not explaining enough phenotypic variance, and not providing meaningful biological insights (Visscher et al., 2012). Despite these criticisms, GWAS have been successful in identifying loci for dyslexia (Roeske et al., 2011); therefore, this analytical method will be used. Importantly, to our knowledge, this is the first genome wide association study for SSD.
In general, the model follows Equation 3.4.