Dissecting the Genetics of Human Communication

DISSECTING THE GENETICS OF HUMAN COMMUNICATION: INSIGHTS INTO SPEECH, LANGUAGE, AND READING

by

HEATHER ASHLEY VOSS-HOYNES

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Epidemiology and Biostatistics
CASE WESTERN RESERVE UNIVERSITY
January 2017

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of Heather Ashley Voss-Hoynes, candidate for the degree of Doctor of Philosophy*.

Committee Chair: Sudha K. Iyengar
Committee Member: William Bush
Committee Member: Barbara Lewis
Committee Member: Catherine Stein

Date of Defense: July 13, 2016

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables
List of Figures
Acknowledgements
List of Abbreviations
Abstract
CHAPTER 1: Introduction and Specific Aims
CHAPTER 2: Review of speech sound disorders: epidemiology, quantitative components, and genetics
  1. Basic Epidemiology
  2. Endophenotypes of Speech Sound Disorders
  3. Evidence for a Genetic Basis of Speech Sound Disorders
  4. Genetic Studies of Speech Sound Disorders
  5. Limitations of Previous Studies
CHAPTER 3: Methods
  1. Phenotype Data
  2. Tests for Quantitative Traits
  4. Analytical Methods
CHAPTER 4: Aim I: Genome-Wide Association Study
  1. Introduction
  2. Methods
  3. Sample
  5. Statistical Procedures
  6. Results
  8. Discussion
CHAPTER 5: Accounting for comorbid conditions
  1. Introduction
  2. Methods
  3. Results
  4. Discussion
CHAPTER 6: Hypothesis-driven pathway analysis
  1. Introduction
  2. Methods
  3. Results
  4. Discussion
CHAPTER 7: Exploratory pathway analysis
  1. Introduction
  2. Methods
  3. Results
  4. Discussion
  5. Future Directions
CHAPTER 8: General Conclusions and Future Directions
Appendix A: Additional Materials for Chapter 3
  1. Sample Ancestry
  2. Power Calculations
Appendix B: Additional Materials for Chapter 4
  1. Model Selection
  2. Full GWAS Results
Appendix C: Additional Materials for Chapter 5
Appendix D: Additional Materials for Chapter 6
Appendix E: Additional Materials for Chapter 7
Bibliography

List of Tables

Table 2.1 Phonological processes and the ages at which they decline
Table 2.3 Loci from linkage studies
Table 2.4 Genes associated with SSD
Table 2.5 Copy number variation associated with SSD
Table 2.6 Genes associated with comorbid conditions
Table 2.7 Loci from linkage studies associated with comorbid conditions
Table 3.1 Tests used in the current study and the phenotype interrogated
Table 3.3 Basic demographics of all individuals in the cohort as of February 2016
Table 3.4 Transformations of z-scores
Table 3.5 Genotyping data for the current study
Table 3.6 Chip characteristics summarized from Illumina documentation
Table 3.7 SNP quality control summary
Table 3.8 Individual quality control summary
Table 3.9 Significance threshold for HapMap
Table 4.1 Tests used in the analyses, divided by endophenotype
Table 4.2 Summary statistics for quantitative traits used in the analysis
Table 4.3 Correlation (R²) between the quantitative traits analyzed
Table 4.4 Most significant marker for genes previously associated with SSD or childhood apraxia of speech
Table 5.1 Mean/median z-scores stratified by Language Impairment affection status (Model 1)
Table 5.2 Mean/median scores stratified by Reading Disability affection status (Model 2)
Table 5.3 Mean/median scores stratified by all groups except SSD status
Table 6.1 Pathways of interest based on Aim I GWAS results
Table 6.2 Genes included in the FOXP2 and CNTNAP2 gene sets
Table 6.3 Significance of Aim I-based pathways
Table 6.4 p-values for FOXP2 and CNTNAP2 networks
Table 6.5 p-values for comorbid condition gene sets
Table 7.1 Number of significant pathways for each trait
Table 7.2 Pathways shared by four or more traits
Table 7.3 Pathways significant in GFTA and MSW or NSW
Table A1 Ancestry of the individuals who passed quality control
Table B1 Lambda values for four models
Table B2 Sample sizes with and without parents
Table C1 Top 20 loci for binary outcome after adjusting for LI and RD
Table C2 Top loci for Fletcher Time by Count after adjusting for LI and RD
Table C3 Top 10 loci for Goldman-Fristoe Test of Articulation after adjusting for LI and RD
Table C4 Top 20 loci for Expressive One Word Picture Vocabulary Test after adjusting for LI and RD
Table C5 Top 20 loci for Peabody Picture Vocabulary Test after adjusting for LI and RD
Table C6 Suggestive loci for Wechsler Individual Achievement Test, Listening Comprehension subtest, after adjusting for LI and RD
Table C7 Top 20 loci for multisyllabic word repetition after adjusting for LI and RD
Table C8 Top 20 loci for nonsense word repetition after adjusting for LI and RD
Table C9 Suggestive loci for TWS after adjusting for LI and RD
Table C10 Top 20 loci for Word Attack after adjusting for LI and RD
Table C11 Suggestive markers for Word Identification after adjusting for LI and RD
Table C12 Most significant SNP in genes previously associated with SSD
Table D1 Pathway analysis: user-defined pathways
Table E1 Significant pathways for articulation and motor control
Table E2 Significant pathways for language traits
Table E3 Significant pathways for phonology traits
Table E5 Significant pathways for spelling
Table E6 Pathways significant in three traits

List of Figures

Figure 2.1 Consonants and age of acquisition
Figure 3.1 Overall study design and workflow
Figure 4.2 Manhattan plot: Fletcher Time by Count
Figure 4.3 Manhattan plot: GFTA
Figure 4.4 Manhattan plot: EOWPVT
Figure 4.5 Manhattan plot: PPVT
Figure 4.6 Manhattan plot: WIATLC
Figure 4.7 Manhattan plot: shared between EOWPVT and PPVT
Figure 4.8 Manhattan plot: MSW
Figure 4.9 Manhattan plot: NSW
Figure 4.10 Manhattan plot: shared between MSW and NSW
Figure 4.11 Manhattan plot: WRDATK
Figure 4.12 Manhattan plot: WRDID
Figure 4.13 Manhattan plot: shared between WRDATK and WRDID
Figure 4.14 Manhattan plot: TWS
Figure 5.1 Conceptual model for the relationship between SNP effect, SSD quantitative trait, language impairment, and reading disability
Figure 5.2 Basic workflow for Aim II
Figure 5.3 Proportion of markers with p < 1×10⁻⁵ in Aim I
Figure 5.4 Effects of adjusting for LI and RD: Fletcher Time by Count
Figure 5.5 Effects of adjusting for LI and RD: Goldman-Fristoe Test of Articulation
Figure 5.6 Effects of adjusting for LI and RD: Expressive One Word Picture Vocabulary Test
Figure 5.7 Effects of adjusting for LI and RD: Peabody Picture Vocabulary Test
Figure 5.8 Effects of adjusting for LI and RD: Wechsler Individual Achievement Test, Listening Comprehension subtest
Figure 5.9 Effects of adjusting for LI and RD: Multisyllabic Word Repetition
Figure 5.10 Effects of adjusting for LI and RD: Nonsense Word Repetition
Figure 5.11 Effects of adjusting for LI and RD: Word Attack
Figure 5.12 Effects of adjusting for LI and RD: Word Identification
Figure 5.13 Effects of adjusting for LI and RD: Test of Written Spelling
Figure 6.1 Workflow for pathway analysis of genome-wide association results
Figure 7.1 Section of the KEGG Calcium signaling pathway
Figure 7.3 Classification of pathways significant in two or more traits
Figure 7.4 Interactions between significant pathways for language traits
Figure 7.5 Pathways significant in both MSW and NSW
Figure 7.5 Shared pathways for reading traits
Figure 7.6 Interactions identified between significant spelling pathways
Figure A1 z-scores for Fletcher Time by Count and Goldman-Fristoe Test of Articulation
Figure A2 z-scores for PPVT and WIATLC
Figure A3 z-scores for MSW and NSW
Figure A4 z-scores for Word Attack and Word Identification
Figure A5 z-scores for Test of Written Spelling
Figure A6 Principal component plots
Figure A7 Power at various minor allele frequencies and effect estimates
Figure A8 Effects of altering various parameters on power for binary outcome
Figure B1 QQ plots for articulation and oral motor control
Figure B2 QQ plots for language endophenotypes
Figure B3 QQ plots for reading endophenotypes
Figure B4 QQ plots for spelling
Figure B5 Histograms of articulation and language traits
Figure B6 Histograms of phonology, reading, and spelling traits
Figure C1 Manhattan plots for adjusted BT Speech
Figure C2 Manhattan plots for adjusted Fletcher Time by Count
Figure C3 Manhattan plots for adjusted GFTA
Figure C4 Manhattan plots for adjusted EOWPVT
Figure C5 Manhattan plots for adjusted PPVT
Figure C6 Manhattan plots for adjusted WIATLC
Figure C7 Manhattan plots for adjusted MSW
Figure C8 Manhattan plots for adjusted NSW
Figure C9 Manhattan plots for adjusted WRDATK
Figure C10 Manhattan plots for adjusted WRDID
Figure C11 Manhattan plots for adjusted TWS

Acknowledgements

I am grateful to countless individuals for helping me through this process. Thank you to my advisor, Dr.
