TOWARDS THE IDENTIFICATION OF CAUSAL GENES AND CONTRIBUTING

MOLECULAR PROCESSES UNDERLYING STRABISMUS

by

Xin Ye

B.Sc., The University of British Columbia, 2011

M.Sc., The University of British Columbia, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Medical Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

May 2019

© Xin Ye, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled: Towards the identification of causal genes and contributing molecular processes underlying strabismus

submitted by Xin Ye in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Medical Genetics

Examining Committee:

Wyeth Wasserman Supervisor

Angela Brooks-Wilson Supervisory Committee Member

Orson Moritz University Examiner

Douglas Allan University Examiner

Additional Supervisory Committee Members:

Supervisory Committee Member

Supervisory Committee Member


Abstract

Eye misalignment, or strabismus, has a frequency of up to 4% in a population, and is known to have both environmental and genetic causes. Genes associated with syndromic forms of strabismus (i.e. strabismus concurrent with multiple phenotypes) have emerged, but genes contributing to isolated strabismus remain to be discovered. Only one isolated strabismus locus, STBMS1 on chromosome 7, has been confirmed in more than one family, but the inheritance model of the locus is inconsistent between studied families and no specific causal variant has been reported. The large set of syndromes with strabismus suggests that within the visual system multiple perturbations of an underlying genetic network(s) can have the common output of disrupted eye alignment. Thus, I used a bioinformatics-driven approach to analyze curated genes associated with strabismus to provide insight into the biological mechanisms underlying strabismus, highlighting a link to the Ras-MAPK pathway. During the process, I noticed strabismus presenting within a large number of intellectual disability disorders. Therefore, I studied the co-occurrence of strabismus and other common phenotypes in a series of patients with intellectual disability, which confirmed a significant correlation between eye alignment and intellectual disability. Finally, I resumed efforts from my prior studies to identify the genetic cause in a seven-generation family with isolated strabismus inherited in an autosomal dominant manner. The likely causal disruption, altering a likely cis-regulatory region of the FOXG1 gene, was identified through the incorporation of linkage analysis, next generation sequencing, and in-depth bioinformatic analyses. This thesis identifies potential roles for genes participating in the Ras-MAPK pathway, emphasizes the role of the central nervous system, and reveals FOXG1 as a causal gene candidate for isolated strabismus.


Lay Summary

Eye misalignment, or strabismus, can affect vision and self-image. Untreated strabismus can lead to blindness in extreme cases. Multiple surgeries may be needed to align the eyes, and the outcome may still not be satisfactory. Over 2400 years ago, Hippocrates observed that strabismus often ran in families, implying a genetic cause. However, we still have little understanding of the genetics and mechanism(s) of strabismus today. I combined different methods to study the role of genetics in strabismus, including finding the gene likely causing strabismus in a large family with many affected members. Strabismus is a complex condition that may involve mechanisms related to the eyes, muscles and the brain. This study points to new directions for research to unravel the causes of strabismus, and the improved understanding may lead to better prevention and treatment.


Preface

Chapter 1 is built upon a published review:

Ye XC, Pegado V, Patel MS, Wasserman WW. Strabismus genetics across a spectrum of eye misalignment disorders. Clin Genet. 2014 Aug;86(2):103–11.

A version of chapter 2 is published as:

Ye XC, van der Lee R, Wasserman WW. Curation and bioinformatic analysis of strabismus genes supports functional heterogeneity and proposes candidate genes with connections to RASopathies. Gene. 2019. doi: 10.1016/j.gene.2019.02.020.

WWW helped design the computational studies, reviewed and edited the manuscript, and acquired funding. RvdL reviewed/edited the manuscript and advised on the analyses. I designed and conducted the computational studies, collected and analyzed the data, wrote the manuscript, and made tables and figures.

A version of chapter 3 is under review in Pediatric Neurology as:

Ye XC, van der Lee R, Wasserman WW, CAUSES Study, Friedman JM, Lehman A. Strabismus in children with intellectual disability: part of a broader motor control phenotype?

AL helped design the study and reviewed and edited the manuscript. RvdL, WWW and JMF reviewed/edited the manuscript. I designed and conducted the study, collected and analyzed the data, wrote the manuscript and made tables and figures. CAUSES Study collected the data.

Chapter 4 is in preparation for publication as:

Ye XC, Horton J, Lyons C, Pegado V, Ross C, Shyr C, Richmond P, Roslin N, Peterson A, Han X, Higginson M, Giaschi D, Gregory-Evans C, Patel M, Wasserman WW. Linkage analysis identifies an isolated strabismus locus at 14q12 overlapping the FOXG1 syndrome region.

WWW and MP helped design the experiments, reviewed and edited the manuscript, and acquired funding. JH, VP, and CL performed ophthalmological exams. NR and AP performed the initial linkage analysis through the CARE4RARE network. MH extracted DNA from blood samples and prepared samples for WES and WGS. CR and XH designed and performed Sanger sequencing. CS and PR maintained and improved the WES and WGS pipelines and performed the analyses. DG was consulted on project development and helped develop the MRI protocol. CGE helped develop the project and provided experimental design support. I designed and conducted the experiments, worked with the family to build the pedigree, extracted DNA from saliva samples, performed the second round of linkage analysis with WGS data, performed WES and WGS analyses, analyzed and integrated relevant data for candidate prioritization, helped develop the MRI protocol, analyzed the data, wrote the manuscript and made tables and figures.

Ethics statement:

Approval for the study was obtained from the University of British Columbia Children’s & Women’s Research Ethics Board (approval number CW10-0317/H10-03215).


Table of Contents

Abstract

Lay Summary

Preface

Table of Contents

List of Figures

List of Supplementary Material

List of Abbreviations

Acknowledgements

Chapter 1: Introduction

1.1 Overview

1.2 Introduction of strabismus

1.2.1 Classification

1.2.2 Epidemiology

1.2.3 Pathology

1.2.4 Animal models

1.3 Genes of strabismus

1.3.1 Family and twin study

1.3.2 Linkage analysis

1.3.3 Genome-wide association studies

1.3.4 Next generation sequencing

1.4 Frontiers of strabismus research - genes implicated in strabismus

1.4.1 Duane retraction syndrome

1.4.2 Congenital fibrosis of the extraocular muscles

1.4.3 Cranial nerves and beyond: is the axon growth signaling pathway the whole story?

1.5 Thesis objectives

Chapter 2: Curation and bioinformatic analysis of strabismus genes supports functional heterogeneity and proposes candidate genes with connections to RASopathies

2.1 Synopsis

2.2 Introduction

2.3 Methods

2.3.1 Curation of strabismus gene lists

2.3.2 Curation of gene lists for strabismus risk factors

2.3.3 Gene set analysis

2.3.4 Phenotype annotation for GO modules

2.3.5 Candidate gene identification

2.4 Results

2.4.1 Curation and evaluation of strabismus gene sets

2.4.2 Identifying strabismus functional modules through gene set analyses

2.4.3 Linking strabismus functional modules to clinical phenotypes

2.4.4 Strabismus candidate gene collection

2.4.5 Prioritizing genes from strabismus-associated genetic loci

2.4.6 Associations between Ras-MAPK signaling and strabismus

2.5 Discussion

Chapter 3: Strabismus in children with intellectual disability: part of a broader motor control phenotype?

3.1 Synopsis

3.2 Introduction

3.3 Methods

3.3.1 Clinical characterization of CAUSES subjects

3.3.2 Analyses based on Population Data BC and OMIM

3.4 Results

3.4.1 Strabismus frequently affects children with ID

3.4.2 Motor control phenotypes are associated with strabismus in ID

3.5 Discussion

Chapter 4: Linkage analysis identifies an isolated strabismus locus at 14q12 overlapping the FOXG1 syndrome region

4.1 Synopsis

4.2 Introduction

4.3 Materials and methods

4.3.1 Patient ascertainment

4.3.2 DNA isolation

4.3.3 Genotyping: statistical linkage analysis and haplotype analysis

4.3.4 Whole-exome sequencing

4.3.5 Whole-genome sequencing

4.3.6 Non-coding variant annotation and interpretation

4.4 Results

4.4.1 Pedigree and participant profile

4.4.2 Linkage analysis and haplotype analysis

4.4.3 No impactful coding variant in the 10 Mb region identified through WES and WGS

4.4.4 WGS and bioinformatic analyses highlight a heterozygous non-coding variant in a regulatory region of FOXG1

4.5 Discussion

Chapter 5: Conclusion

5.1 Complex genetic architecture – Parkinson’s disease and age-related macular degeneration as models for strabismus research

5.2 Summary of the thesis

5.3 The importance of phenotyping

5.4 The importance of non-coding genetic variants

5.5 Conclusion

Bibliography

Appendices

Appendix A Supplementary Material for Chapter 2

Appendix B Supplementary Material for Chapter 3


List of Tables

Table 1.1 Strabismus genes with defined molecular mechanism 47

Table 2.1 Permissive strabismus gene set (P-Strab)

Table 2.2 Early developmental brain regions enriched for P-Strab genes, based on expression enrichment analysis with ABAenrichment

Table 2.3 Top 10 HPO terms for GO modules based on P-Strab Set

Table 2.4 Strabismus Candidate Gene Collection

Table 3.1 Prevalence of clinical features in CAUSES ID subjects

Table 3.2 Co-occurrence of strabismus with other clinical features in CAUSES-ID series

Table 4.1 Ophthalmological characterization of the subject family


List of Figures

Figure 1.1 Common strabismus classifications

Figure 2.1 Gene set analysis overview

Figure 2.2 ClueGO enrichment analysis of Gene Ontology Biological Process terms for P-Strab genes

Figure 2.3 Specific Expression Analysis (SEA) across brain regions and developmental stages for P-Strab genes

Figure 2.4 GeneMania interaction networks for the P-Strab genes

Figure 2.5 Strabismus genes mapped to the human Ras-MAPK pathway

Figure 3.1 Odds ratios and 95% confidence intervals (x-axis) for the association of different clinical features (y-axis) with strabismus vs. non-strabismus ID groups

Figure 4.1 Pedigree for the subject family with isolated strabismus

Figure 4.2 Linkage analysis for subject family

Figure 4.3 Linkage region

Figure 4.4 Topologically associated domains within the 3 Mb core region

Figure 4.5 FOXG1 transcription factor binding site matching to reference and alternative sequence

Figure 4.6 Ultra-conservation of the likely causal variant region

Figure 4.7 Cis-regulatory mechanism within FOXG1-TAD

Figure 5.1 Schematic representation of the thesis


List of Supplementary Material

Supplementary Table A.1a: Stringent strabismus gene set (S-Strab)

Supplementary Table A.1b: HPO-based strabismus gene set

Supplementary Table A.1c: Permissive strabismus gene set (P-Strab)

Supplementary Table A.2a: Genes for strabismus disorders with defined genetic basis (E-Strab)

Supplementary Table A.2b: Strabismus loci (L-Strab)

Supplementary Table A.3a: Strabismus risk factors

Supplementary Table A.3b: Genes annotated with risk factors (RF)

Supplementary Table A.3c: Overlapped genes between P-Strab and RF

Supplementary Table A.4a: Ten strabismus GO modules with gene lists based on ClueGO enrichment analysis

Supplementary Table A.4b: ErmineJ GO enrichment results

Supplementary Table A.5a: KEGG human pathways results

Supplementary Table A.5b: Gene lists for identified KEGG human pathways

Supplementary Table A.6: Gene clusters based on GeneMania and MCODE analyses

Supplementary Table A.7a: Strabismus Candidate Gene Collections (GM list, GM-GO list, overlapped list)

Supplementary Table A.7b: Top 100 differentially expressed genes in cerebellum/cerebellar cortex (CB/CBC), amygdala (AMY), and posteroventral (inferior) parietal cortex (IPC)


List of Abbreviations

CCDD congenital cranial dysinnervation disorders

CFEOM congenital fibrosis of the extraocular muscles

DURS Duane retraction syndrome

GWAS genome-wide association study

ID intellectual disability

LOD logarithm of the odds

MRI magnetic resonance imaging

OMIM Online Mendelian Inheritance in Man

TAD topologically associating domain

TFBS transcription factor binding site

WES whole exome sequencing

WGS whole genome sequencing


Acknowledgements

I would like to start by thanking the many participants for their time and effort to make the family study possible, and the clinicians who generously offered their time to examine the participants.

I would like to thank my PhD supervisory committee, Drs. Cheryl Gregory Evans, Anna Lehman, and Angela Brooks-Wilson, for their guidance and support throughout the study. Without your great insights and expertise, the project could not have been brought to life. I thank Drs. Jan Friedman, Matthew Lorincz, Lynn Raymond, and Torsten Nielsen for their mentorship.

I would especially like to thank Dr. Wyeth W. Wasserman for taking me into his lab as an undergraduate student and introducing me to various research projects, thereby providing me with the opportunities to explore science. I thank you for guiding me through the many challenges, encouraging me to explore my ideas and interests, and setting me up for a career full of excitement.

I would also like to thank the members and alumni of the Wasserman Lab and the Gregory-Evans Lab. The scientific exploration fueling the current thesis would not have been possible without your training and guidance.

Special thanks to my fellow students for the encouragement, scientific discussions, and inspiration, and to Dora Pak, Cheryl Bishop, and Jane Lee, who provided constant administrative support.

I am grateful for the scholarships and financial support I received from the Canadian Institutes of Health Research (CIHR), the UBC Faculty of Medicine, and the UBC Four Year Fellowship.

Lastly, I would like to thank my parents for their unconditional love and support all the way through. Thank you for believing in me and giving me the opportunity to pursue my dream.

Chapter 1: Introduction

1.1 Overview

Strabismus refers to eye misalignment and affects up to 4% of individuals, depending on the population. It is one of the earliest recorded genetic disorders; Hippocrates wrote that ‘Children of parents having distorted eyes squint also for the most part’ 1. Strabismus can lead to visual problems during development, including loss of binocular vision, amblyopia (‘lazy eye’), and abnormal retinal correspondence (shifting of the fixation point relative to the macula in one eye). Strabismus disrupts stereopsis, which impacts the performance of numerous practical tasks requiring the precise judgment of distance (e.g. driving) or depth (e.g. microscope usage) 2. In addition to reduced visual function, strabismus is associated with psychosocial problems impacting self-image, interpersonal relationships, and performance in school and employment 3.

Despite well-established clinical diagnostic and treatment guidelines, we have a very limited understanding of the pathophysiology and genetics of strabismus. The most genetically well-studied subgroup of strabismus is congenital cranial dysinnervation disorders, but little success has been achieved in defining the genetic mechanisms contributing to other subtypes of strabismus.

The overview is intended to provide a basis for understanding the research included within this thesis which relates to genetic studies of strabismus, including: (1) the curation and network analysis of genes associated with strabismus; (2) the analysis of phenotypes associated with strabismus for a set of patients with intellectual disability; and (3) the genetic characterization of a family with isolated strabismus (extending preliminary studies reported in my MSc thesis research).


1.2 Introduction of strabismus

1.2.1 Classification

Strabismus is commonly classified by the direction of deviation: esotropia (inward misalignment), exotropia (outward), hypertropia (upward), and hypotropia (downward) (Figure 1.1a). In many cases, isolated strabismus is characterized by non-restrictive, non-paralytic ocular misalignment with the same magnitude in all directions of gaze, which is known as concomitant (comitant) strabismus. Incomitant strabismus is paralytic in origin, and the angle of deviation varies with the direction of gaze (Figure 1.1b). The occurrence of muscle paralysis can be determined by the broad H test, which is scored positive if one eye lags behind the other in at least one of the six positions of gaze 4. Strabismus can also be classified into accommodative and non-accommodative forms. Accommodative strabismus arises due to visual discrepancy and favoured use of the better eye, and significant hyperopia (farsightedness) can be a contributing factor. The eye not in use is esotropic in accommodative strabismus. Non-accommodative strabismus refers to the remaining cases, which do not have a visual discrepancy.

1.2.2 Epidemiology

The reported prevalence of strabismus varies with ethnicity: 2–4% among Caucasians, 2.4% among Hispanic/Latinos, 2.5% among African-Americans, and 1% among East Asians 5–8. Among Caucasians, esotropia is three times more common than exotropia, while exotropia predominates in Black Cameroonian (63%) and Asian (> 70%) populations 9–12. Studies consistently report a balanced distribution between the sexes 13–16.

The age of onset distribution for strabismus is bimodal, with one peak before the age of 12 months and a second between 2 and 3 years of age. Non-accommodative strabismus was observed to be more common in the infant group, while accommodative strabismus (such as that associated with hyperopia) was more common in the older group 17,18. Within the infant group, strabismus in a small percentage of individuals may spontaneously resolve before the age of 6 months: exotropia was reported as resolved in 5 of 138 infants (4%) 19; esotropia was reported as resolved in 46 of 170 infants (27%), with intermittent or variable deviation in most resolved cases 20.

1.2.3 Pathology

The physical mechanisms underlying strabismus may involve one of several systems or tissues, both for congenital and acquired conditions. Past reports highlight the potential for disruptions in extraocular muscles, orbital connective tissues, cranial nerves, and the brain 21.

Damage to extraocular muscles caused by mechanical trauma, acquired inflammation or infiltration, or metabolic disorder can lead to extraocular muscle myopathy and, consequently, secondary strabismus. Abnormalities in either the location or the stability of the connective tissue can alter the pulling direction of the extraocular muscles and contribute to both congenital and acquired strabismus. A defining feature of congenital cranial dysinnervation disorders is hypoplastic or misrouted motor nerves to extraocular muscles, leading to dysfunctional extraocular muscles 21.

Researchers have speculated about the relationship between strabismus and brain abnormalities, often emphasizing the role of the visual cortex. At the turn of the 20th century, Worth proposed that infantile esotropia was due to an inborn defect of fusion, as surgery on extraocular muscle could not reverse strabismus 22. Fusion is the ability to combine the corresponding retinal images from two eyes into a single visual percept 23. Tychsen suggested that this fusion faculty was situated within the striate cortex (a part of the visual cortex), and specifically proposed that congenital defects would therefore be present in disparity-sensitive, binocular neurons 24. Using staining techniques, a paucity of such binocular connections was observed in both naturally occurring and induced strabismic monkeys, while monocular connections remained. Electrophysiological measurement showed that loss of binocular responsiveness and disparity sensitivity was consistent with the reduced number of binocular connections 25. Schoeff and coworkers reasoned that the lack of evidence of extraocular muscle dysinnervation in isolated strabismus suggested a visual cortex contribution 26.

1.2.4 Animal models

While the clinical observations from human patients raise new questions for medical research, causal mechanisms in most cases require validation in animal models. Animal models of strabismus have been a challenge to define, since the diagnosis of strabismus presupposes eyes that are oriented in the same direction. The alignment of the forward-orientated eyes of primates is crucial for stereoscopic or binocular vision, which gives depth perception 27. Therefore, much focus in the literature is on the use of macaques. Strabismus incidence is approximately 4% in Macaca nemestrina (pig-tailed macaques), but no data are available for other macaque species 28. Research using macaque models is largely based on strabismus artificially induced through either surgery or sensory methods. These macaque models show that binocular vision disruption in infancy can lead to strabismus and other associated visual sensory deficits and oculomotor abnormalities.

Invasive studies have provided some insight into the brain regions that may be involved in the development and maintenance of strabismus; the following regions have shown corresponding neural activity changes: motor nuclei, supraoculomotor area, brainstem pontine areas, cerebellum, and the primary visual cortex (V1) 28.

No genetic macaque model has been established for strabismus. While mice are commonly used for genetic animal models of disease, recapitulating phenotypes equivalent to strabismus is challenging in mice, as their eyes are orientated to the side. There have been indications of a binocular zone of vision in mice, but phenotyping methods are not well established 29.

1.3 Genes of strabismus

The approaches and goals to study strabismus genetics have evolved over time with technological advances: from the conceptual demonstration of its heritability to the identification of causal genes. In this section, I will review the literature for strabismus genetic research starting in the early 20th century, covering family and twin study, linkage analysis, genome-wide association study, and next generation sequencing.

1.3.1 Family and twin study

Many early genetic studies focused on the transmission of strabismus through families. Findings varied in terms of heritability, the concordance of strabismic types, and inheritance mode 16. Schlossman and Priestley found that 47.5% of 158 patients with strabismus belonged to families with two or more members affected with strabismus, but cautioned that the actual number might be larger since subtle alignment deviations could be missed 30. The highest reported familial incidence of strabismus was 63% 13,31. A longitudinal study found that six of 34 babies born in families with a parent affected by esotropia developed constant or intermittent esotropia by six months 32. As the types of assessed relatives varied between studies and there was no consideration of the environment, the precise genetic risk is unclear. Nevertheless, the figures were much higher than expected based on the prevalence in general populations, supporting a contribution of genetics to strabismus risk.

The concordance of strabismus types varies across familial studies. Families with a mixture of esotropia and exotropia phenotypes have been reported multiple times 10,30. One study found that 80% of strabismus cases occurring in the same family were concordant 16. Another study reported 54% concordance within 39 studied families 10. The observed discordance could indicate potential phenocopy, genetic heterogeneity, or developmental variation, and observations from twin studies may provide additional insights.

Twin studies of strabismus have reported higher concordance rates in monozygotic twins than dizygotic twins, suggesting a predominant genetic factor 16. The twin study by Matsuo and coworkers showed that strabismic subtypes of 67.3% of 49 pairs or sets were concordant, with the concordance rate higher in monozygosity (82.4%) than in multizygosity (47.6%) 33. Wilmer and Backus performed a meta-analysis, reporting monozygosity and dizygosity concordances of 54% and 14%, respectively, in studies with systematic ascertainment; and 66% and 19%, respectively, without systematic ascertainment 34. Podgor reported that the odds ratio for esotropia rose from 2.6 if a sibling from a preceding birth was affected to 5.4 if a twin (or other multiple birth) was affected 35. While the higher concordance rates in monozygotic twins support genetics as a strong driver of misalignment types, the remaining discordance suggests that misalignment types may share a common underlying cause with developmental variation.

The majority of studies have noted that simple Mendelian models cannot generally explain the complexity of strabismus inheritance patterns. There are multiple inheritance patterns represented in the families described in the scientific literature. Dominant, recessive, and sex-linked inheritance patterns have been proposed for isolated strabismus in familial studies 16,30. Across different families, Czellitzer reportedly suggested autosomal recessive inheritance patterns were responsible for strabismus, while Waardenburg proposed an autosomal dominant model of a single gene in a family 30,36. A study using quantitative measurement of sensory and motor function rejected the theories of Mendelian inheritance of strabismus as a single trait in the population 11.

1.3.2 Linkage analysis

Although the heritability of strabismus has long been recognized, most advances at the level of specific genes have occurred in the last 15 years 5,9. Parikh and coworkers identified the first concomitant strabismus locus on chromosome 7p22.1 (STBMS1) in a linkage analysis. Among seven initially assessed multiplex families with isolated strabismus, one family showed a significant logarithm of the odds (LOD) score on chromosome 7 37. Although the pedigree suggested an autosomal dominant inheritance pattern, the haplotype data were most consistent with an autosomal recessive model or a more complex model, such as the authors’ proposed semi-dominant inheritance model. In the subject family, eight of fourteen siblings were affected, and seven of these eight patients had hypermetropia of varying severity. The transmission patterns observed in the other six families from the original study were not consistent with the chromosome 7 locus, STBMS1 37. Rice and coworkers examined 12 additional families, of which one was consistent with an STBMS1 role. Five affected family members had primary isolated concomitant esotropia while 21 examined family members were unaffected. In this second STBMS1 family, the pattern of inheritance was observed to best fit a dominant mode of inheritance 38. In combination, the reports indicate that there is at least one genetic component associated with isolated strabismus at the STBMS1 locus. Elucidating the causal mutations in the two families may clarify the conflict between transmission models; however, such an advance has not been achieved in the 15 years since the original report.

The Ohtsuki group tried to identify concomitant strabismus susceptibility loci through sib-pair analysis and non-parametric linkage analysis for multiple pedigrees. This initial 2003 attempt indicated multiple loci with low LOD scores 39. A 2008 report identified the 4q28.3 and 7q31.2 loci as having significant evidence of linkage. After stratifying cases into esotropia and exotropia sub-groups, they identified additional loci at 8q24.21 and 14q21.3, respectively 40.
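The LOD scores referred to in this section compare the likelihood of the observed co-segregation of marker and trait at a given recombination fraction against the likelihood under free recombination (a recombination fraction of 0.5). The following Python sketch illustrates the calculation for the simplest textbook case only, assuming phase-known, fully informative meioses. The function names are hypothetical, and the cited studies analyzed full pedigrees with dedicated linkage software rather than with this simplified formula.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses, so the likelihood is
    theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # log10 likelihood under linkage at the given theta
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    # log10 likelihood under the null hypothesis of no linkage (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in (0, 0.5]; returns (best_theta, best_lod)."""
    thetas = [i / (2 * steps) for i in range(1, steps + 1)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


if __name__ == "__main__":
    # Invented example: 1 recombinant observed among 10 informative meioses.
    theta_hat, lod = max_lod(1, 10)
    print(f"theta = {theta_hat:.3f}, LOD = {lod:.2f}")
```

By convention, a LOD score of 3 or more is commonly taken as significant evidence of linkage for a Mendelian trait, which is the sense in which "significant" and "low" LOD scores are contrasted above.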

1.3.3 Genome-wide association studies

Genome-wide association studies (GWAS) detect alleles that are observed more often than expected by chance in subjects with a trait of interest compared to controls. However, identifying the causal genetic variation from a GWAS-detected marker can be challenging. Recently, the first large strabismus GWAS was reported 41. With 826 non-accommodative esotropia subjects and 2991 controls of White European ancestry from the United States, a single significant association was identified for the single nucleotide polymorphism (SNP) rs2244352 [T] on chromosome 21q22.2 (odds ratio [OR] = 1.41, p = 2.84 × 10^-9), and the result was replicated with 689 subjects and 1448 controls of White European ancestry from Australia and the United Kingdom (OR = 1.33, p = 9.58 × 10^-11). This SNP is situated within an intron of WRB, which encodes a tryptophan-rich basic protein. WRB is a maternally imprinted gene and is widely expressed throughout life. There is no known connection between WRB and strabismus. Interestingly, maternal smoking, which is a risk factor for strabismus, has been shown to reduce WRB methylation in a meta-analysis 41. Further study is required to establish the connection between the genetic and epigenetic mechanisms leading to strabismus. Another SNP, rs912759 [T] on chromosome 1p31.1, was identified in the same paper in a distinct GWAS for accommodative esotropia, which relied on smaller discovery and replication cohorts (OR = 0.59, p = 6.53 × 10^-7), but no candidate gene was suggested by the authors 41.
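For readers less familiar with the odds ratios quoted above, the sketch below shows how an allelic odds ratio and an approximate 95% confidence interval (Woolf's logit method) can be derived from a 2 × 2 table of allele counts. The function name and the counts in the usage example are invented for illustration; they are not taken from the cited GWAS, whose association tests are not reproduced here.

```python
import math


def odds_ratio_with_ci(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int,
                       z: float = 1.96):
    """Allelic odds ratio with a Woolf (logit) confidence interval.

    The four arguments are allele counts from a 2 x 2 table:
    test allele vs. other allele, in cases vs. controls.
    z = 1.96 gives an approximate 95% interval.
    """
    counts = (case_alt, case_ref, control_alt, control_ref)
    if any(c <= 0 for c in counts):
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Invented allele counts, for illustration only.
    or_, lo, hi = odds_ratio_with_ci(20, 80, 10, 90)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An odds ratio above 1 (as for rs2244352) indicates that the allele is enriched in cases, while a value below 1 (as for rs912759) indicates that it is depleted in cases; an interval that spans 1 is not evidence of association at the chosen confidence level.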

1.3.4 Next generation sequencing

Whole exome sequencing (WES) and whole genome sequencing (WGS) have become prominent in disease gene identification. Massively parallel sequencing and ever-advancing bioinformatic tools have revolutionized human genetic analysis. Variants in exons, mostly coding, are captured in the widely adopted WES, leading to great success in finding causal genes for diverse disorders. WES has been used to identify candidate genes (e.g. PAX3 and AHI1) for strabismus, but further investigations are needed to confirm the causal relationship for these cases 42,43. With the fast-dropping price of WGS and the proven superiority of looking beyond coding regions 44, researchers are increasingly using WGS. With the emergence of WGS, evidence is accumulating for a significant role of non-coding regulatory regions in disease pathogenesis 45. Although annotation software is lagging behind for non-coding regions of the genome, methods and resources are improving. The gnomAD database has provided 15,496 genomes from unrelated individuals, helping researchers to identify common variants that are unlikely to contribute to rare diseases. WGS has not been applied to the study of strabismus in the published literature, and scientific exploration of non-coding regulatory regions in WGS data is nascent, but both are important for the thesis presented here.
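The use of a population reference such as gnomAD to set aside common variants can be sketched as a simple allele-frequency filter. The example below is a minimal illustration under stated assumptions: the `Variant` record, the `keep_rare` function, the 0.1% cut-off, and the variants themselves are all hypothetical, and the snippet does not represent the gnomAD interface or the pipeline used in this thesis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical variant record; not a gnomAD or VCF data structure."""
    chrom: str
    pos: int
    ref: str
    alt: str
    # Allele frequency in a population reference; None if never observed.
    population_af: Optional[float]


def keep_rare(variants: List[Variant], max_af: float = 0.001) -> List[Variant]:
    """Keep variants that are absent from the reference or at or below max_af.

    The default threshold of 0.1% is an illustrative assumption; a suitable
    value depends on disease prevalence, penetrance, and inheritance model.
    """
    return [
        v for v in variants
        if v.population_af is None or v.population_af <= max_af
    ]


if __name__ == "__main__":
    # Invented variants, for illustration only.
    candidates = [
        Variant("14", 1000, "A", "G", 0.12),     # common: filtered out
        Variant("14", 2000, "C", "T", 0.0002),   # rare: kept
        Variant("14", 3000, "G", "A", None),     # unobserved: kept
    ]
    for v in keep_rare(candidates):
        print(v.chrom, v.pos, v.ref, v.alt, v.population_af)
```

Treating variants that are absent from the reference as rare is a deliberate choice in this sketch, since a novel variant in a single family would by definition have no recorded population frequency.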


1.4 Frontiers of strabismus research - genes implicated in strabismus

I have introduced the clinical features of strabismus and the history of its genetic study, and both demonstrate that strabismus is heterogeneous. Disruption of diverse tissues can lead to strabismus, spanning extraocular muscles, orbital connective tissues, cranial nerves, and the brain. Diverse inheritance patterns and loci have been reported. Given the heterogeneity, one way to gain a clearer insight into strabismus is to focus on specific subgroups of strabismus with distinctive clinical phenotypes. For example, the ‘congenital fibrosis syndromes’ comprise a group of congenital, non-progressive ophthalmoplegias with restriction of globe movement. Genetic studies have led to an evolution of this concept, and this group of incomitant strabismus is now termed congenital cranial dysinnervation disorders (CCDDs). The CCDDs comprise the following disorders: Duane retraction syndrome, congenital fibrosis of the extraocular muscles, Bosley-Salih-Alorainy syndrome, Athabascan brain dysgenesis syndrome, hereditary congenital facial paresis, Mobius syndrome, and horizontal gaze palsy with progressive scoliosis 46. The genes underlying the CCDDs are summarized in Table 1.1, and the clinical descriptions are based on the publication by Graeber and coworkers 47. Causal genes for the two most well-studied CCDDs (Duane retraction syndrome and congenital fibrosis of the extraocular muscles) are highlighted below, followed by a summary of the derived mechanistic perspective and a discussion of potential disruption beyond cranial nerve dysinnervation.

1.4.1 Duane retraction syndrome

Duane retraction syndrome (DURS) is a congenital cranial dysinnervation disorder that accounts for approximately 5% of strabismus cases 48. About 70% of DURS cases do not exhibit other congenital abnormalities, and approximately 20% of cases have a family history of strabismus 49,50. Three types of DURS have been described based on clinical examination (Type 1, Type 2 and Type 3), and the key attributes for the classification are abduction (movement of a body part away from the midline) and adduction (movement toward the midline). Type 1 DURS is characterized by a marked limitation of abduction, type 2 DURS by a marked limitation of adduction, and type 3 DURS by a marked limitation of both 51. Mutations in the following three genes have been demonstrated to lead to DURS.

CHN1

CHN1 encodes α2-chimaerin, a Rac-specific GTPase-activating protein, and its mutations can lead to either type 1 or type 3 DURS. α2-chimaerin is crucial for transducing axon guidance signals to the cytoskeleton, which is responsible for leading axon growth and steering 46. In the first study to report CHN1 mutations, each of the seven studied pedigrees contained a different mutation. Among these, p.L20F, p.I126M, p.Y143H, p.A223V, and p.252Q enhance the translocation of α2-chimaerin to the cell membrane. In 71-87% of the chick embryos tested, overexpression of α2-chimaerin caused axon growth of the oculomotor nerve to stall prematurely 52. Although CHN1 mutations are found in 35% of familial DURS cases, mutations were not observed in 140 sporadic DURS subjects of diverse ethnicity 53. CHN1 is expressed throughout the central nervous system, including the cortex, implying an additional role for α2-chimaerin. Chn1 knockdown mice demonstrated a defect in locomotor coordination and a hopping gait due to axon wiring defects of the corticospinal tract and the spinal motor circuits. Whether ocular motor wiring depends on this function is unclear 46.

MAFB

MAFB encodes a basic region/leucine zipper transcription factor. Pathogenic mutations have been identified in four different type 3 DURS pedigrees: p.N268Mfs*125, p.G147Rfs*78, p.Q215Rfs*10, and a heterozygous full gene deletion 54. Only the family with the p.N268Mfs*125 mutation presents with deafness. Functional analysis of Mafb knockout mice suggested a threshold model for MAFB function. While MafbWT/KO embryos showed hypoplastic abducens nerves, MafbKO/KO embryos showed severe hindbrain malformation with an absence of abducens nerves. In addition, the glossopharyngeal and vagus nerves were fused, and the oculomotor nerve aberrantly innervated the lateral rectus muscle 54. On the other hand, a series of missense mutations in MAFB have been reported to lead to a rare skeletal disease, multicentric carpotarsal osteolysis syndrome 55. No phenotypic overlap has been reported between the two MAFB-related disorders, which present as either ocular or skeletal disease.

SALL4

SALL4 encodes a zinc finger transcription factor whose mutated form causes Duane-radial ray syndrome. In addition to the Duane ocular anomaly, patients are characterized by forearm malformations. Other phenotypes, such as renal anomalies, are observed in some cases. In 2002, two independent studies identified a total of eight heterozygous mutations in eight different families 56,57. Additional mutations have since been identified in other patients 58. However, specific abnormalities of the ocular motor nerves and muscles have not been reported in the different Sall4 mouse models 59,60.

1.4.2 Congenital fibrosis of the extraocular muscles

Congenital fibrosis of the extraocular muscles (CFEOM) is also divided into three subtypes with different clinical presentations. CFEOM1 presents with bilateral ptosis and restricted vertical gaze. CFEOM2 is characterized by bilateral ptosis with eyes fixed in an exotropic position. CFEOM3 demonstrates more variable clinical features and is genetically heterogeneous. It is diagnosed in a family if at least one member does not have the classic findings of CFEOM1 or CFEOM2. A review of three genes in which alterations lead to CFEOM follows.

KIF21A

KIF21A encodes a kinesin motor protein, and its mutations have been identified mainly in CFEOM1 patients and rarely in CFEOM3 patients (e.g. p.M947I). CFEOM1-related mutations cluster within the stalk domain of the protein, which mediates dimerization (e.g. p.M947V, p.M947R, p.R954W, p.R954Q, p.I1010T), and alter the axon wiring to the superior rectus muscle, leading to an inability to elevate the eye 61. Knock-in mice carrying the KIF21A p.R954W substitution recapitulate the CFEOM1 phenotype, with the oculomotor nerve failing to divide into superior and inferior divisions 61.

PHOX2A

PHOX2A encodes a paired-like homeodomain transcription factor with a role in the development of the autonomic nervous system by regulating the expression of tyrosine hydroxylase and dopamine beta-hydroxylase. PHOX2A mutations cause CFEOM2, which is inherited in an autosomal recessive manner. In the first report, two splice site variants (IVS1, G-A, +1 & IVS2, G-A, -1) and a point mutation (p.A72V) were identified in four pedigrees with CFEOM2. The oculomotor and trochlear nerves are absent in these patients 62. PHOX2A has a critical role in neuron differentiation, and both the Phox2a-/- null mouse and zebrafish with homozygous point mutations showed the absence of both the oculomotor and trochlear nuclei. In addition, the locus coeruleus and parasympathetic ganglia are absent, and cranial sensory ganglia are atrophic 9.


TUBB3

TUBB3 encodes a class III member of the beta tubulin protein family, which heterodimerizes with alpha tubulin to form microtubules. Tischfield and coworkers identified eight different heterozygous mutations in CFEOM3 families. Some genotype-phenotype correlations have been observed: p.E410K and p.D417H were associated with a severe phenotype with additional facial weakness and learning disabilities; p.D417N with peripheral neuropathy and hypoplasia of the corpus callosum; and p.R262C with the mildest phenotype, ranging from severe bilateral CFEOM3 to moderate unilateral or mild forms of the syndrome 63. Heterozygous p.R262C mice appear normal, but homozygous animals die after birth from suffocation 64. In the mice, the oculomotor nerve appears to deviate towards the superior oblique muscle, the trochlear nerve stalls, and the trigeminal projections are disrupted. Different mutations, either increasing or decreasing microtubule stabilization, can alter neuronal function, and the different phenotypes likely result from a disrupted balance 64.

1.4.3 Cranial nerves and beyond: is the axon growth signaling pathway the whole story?

Only a subset of the CCDDs can be explained by variants in one of the identified genes. Additional contributing genes with different properties are likely not reflected in the existing literature. An overall analysis of causal genes for CCDDs can help to shed light on the mechanisms involved. Except for horizontal gaze palsy with progressive scoliosis (ROBO3), patients with all the other CCDDs present with aberrant cranial nerve(s), suggesting an underlying defect in ocular motor axon growth. These genes can be mapped to specific cranial nerves and developmental stages 46: 1) ROBO3 – the commissural axon guidance that connects both sides of the nervous system 65; 2) HOXB1 – facial nerve differentiation; 3) HOXA1, SALL4, CHN1, MAFB – abducens nerve development; 4) PHOX2A – trochlear nerve differentiation; and 5) PHOX2A, KIF21A, CHN1, TUBB3 – oculomotor nerve development.

Signaling pathways for axon development and pathfinding are likely the underlying cause: there is evidence that mutated Tubb3 causes a decreased association of Kif21A with tubulin, affecting axon growth cone dynamics; and the CHN1-encoded α2-chimaerin may act as a switch that integrates both attractant and repellent guidance cues when the oculomotor nerve reaches the edge of the eye field, thereby influencing pathfinding. However, these three genes are widely expressed throughout the developing nervous system and may have broader functions than axon development. For instance, TUBB3 mutations can cause malformations of cortical development in the absence of the CFEOM phenotype; mutated CHN1 results in additional defects of the corticospinal tract and the spinal motor circuits in the cortex; and CHN1 lies in a locus associated with autism, epilepsy and schizophrenia 46. In addition, mutated ROBO3 can lead to the absence of bulging of the abducens nuclei with intact nerves, and its intronic polymorphisms have also been associated with autism 65.

Perhaps the impairment in axon differentiation and guidance is central to the pathophysiology of CCDDs and some other strabismus cases. When this process is disturbed, axon mal-wiring leads to defects of the ocular motor nerves. On the other hand, genes that contribute to the development of brain regions that send signals to the ocular motor nerves, such as fusion centres and the visual cortex, may also contribute to strabismus. The cortical roles of CHN1, TUBB3, and ROBO3, together with the lack of cranial nerve anomalies in MRI studies of many strabismic patients, suggest the involvement of additional mechanism(s) in non-CCDD strabismus. Study of non-CCDD strabismus is likely to expand our understanding of strabismus and its genetics beyond the ocular motor nerves.

1.5 Thesis objectives

Strabismus is a common condition affecting both children and adults. Disruptions of different tissues, including extraocular muscles, orbital connective tissues, cranial nerves, and the brain as well as environmental factors can contribute to strabismus 16,21. Targeted studies on a subset of patients with distinct phenotypes and technological advances have led to an evolved understanding of the causes of strabismus from muscle fibrosis to nerve dysinnervation (now called congenital cranial dysinnervation disorders). With the increased capacity of next generation sequencing, more genes and corresponding mutations will be identified in other subgroups of strabismus.

I studied a large family with isolated strabismus inherited in an autosomal dominant manner during my MSc degree and identified a locus for this family. However, as in many other linkage analysis studies of isolated strabismus, at the time I could not identify a specific likely causal variant. In my Ph.D., I continued to phenotype and genotype the subjects, aiming to identify a causal mutation in the subject family (Chapter 4). At the core of my thesis research is the hypothesis that a subset of strabismus cases arises from simple Mendelian genetic mechanisms. Each chapter touches upon the core properties of genotype and phenotype, with sharpening focus from populations to a specific family. The observed heterogeneity of strabismus findings motivated me to study strabismus genetics with a gene network approach (Chapter 2), allowing information spanning diverse genes to be integrated into a more holistic view of the biological pathways, processes and tissues involved in eye alignment conditions. The Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities associated with 7000+ diseases and is used as an important resource for gene curation in Chapter 2 66. Similarly, I explored the relationship between phenotypes in the analysis of a set of individuals with intellectual disability and strabismus (Chapter 3). The research culminates with a deep exploration of the family I previously studied, using more modern technologies and data from additional family members to narrow the candidate region and to identify a specific likely causal variant that appears to regulate the FOXG1 gene (Chapter 4).


Table 1.1 Strabismus genes with defined molecular mechanism 47

Gene Name | Gene Identifier | Gene Description | Associated Disorder(s) | Inheritance | Clinical Presentation | Orbital and Intracranial Findings
CHN1 | ENSG00000128656 | chimaerin 1 | DURS2 | AD | Type 1 or Type 3 Duane or supraduction deficit | Abnormalities in the abducens and oculomotor nerve with or without superior oblique muscle hypoplasia. Small optic nerves.
HOXA1 | ENSG00000105991 | homeobox A1 | Bosley-Salih-Alorainy syndrome | AR | Bilateral type 3 DURS, sensorineural hearing loss, malformations of the cerebral vasculature, cardiac malformations, autism | Normal extraocular muscles. Hypoplastic abducens nerve. Hypoplastic or absent internal carotid arteries. Occasional duplication of the vertebral artery.
HOXA1 | ENSG00000105991 | homeobox A1 | Athabascan Brain Dysgenesis syndrome | AR | Horizontal gaze restriction, intellectual disabilities, sensorineural hearing loss, cardiac malformations, facial weakness, central hypoventilation, cerebral vasculature malformation |
HOXB1 | ENSG00000120094 | homeobox B1 | Hereditary congenital facial paresis 3 | AR | Esotropia, bilateral facial palsy, deafness | Bilateral absence of the facial nerve
KIF21A | ENSG00000139116 | kinesin family member 21A | CFEOM1, CFEOM3B | AD | Bilateral non-progressive restrictive ophthalmoplegia with blepharoptosis. Eyes are infraducted in resting position with limitation of vertical movements | Hypoplasia of oculomotor nerve or abducens nerve (less often). Hypoplasia of levator palpebrae and superior rectus muscles. Misinnervation of lateral rectus muscle by oculomotor nerve. Reduction of mean optic nerve size compared with normal subjects.
MAFB | ENSG00000204103 | MAF bZIP transcription factor B | DURS3 | AD | Type 1 or Type 3 Duane with or without deafness | Abnormalities in the abducens and oculomotor nerve.
PHOX2A | ENSG00000165462 | paired like homeobox 2A | CFEOM2 | AR | Profound ptosis, restrictive ophthalmoplegia with exotropia and poorly reactive pupils | Enlarged lateral rectus muscles with all other extraocular muscles comparatively hypoplastic. Absence of oculomotor and trochlear nerves.
ROBO3 | ENSG00000154134 | roundabout guidance receptor 3 | Gaze palsy, horizontal, with progressive scoliosis | AR | Horizontal gaze limitation, scoliosis | Flattened pons, hindbrain midline cleft. Butterfly configuration of brainstem on axial scans.
SALL4 | ENSG00000101115 | spalt like transcription factor 4 | Okihiro syndrome (Duane radial ray syndrome) | AD | Duane, upper limb anomalies, with or without the following: renal anomalies, sensorineural deafness, gastrointestinal anomalies | Hypoplastic to absent abducens nerves with normal intracranial and optic nerves. Evidence of aberrant innervation of the lateral rectus muscle.
TUBB3 | ENSG00000258947 | tubulin beta 3 class III | CFEOM1, CFEOM3A | AD | Variable phenotype with unilateral or bilateral blepharoptosis and ophthalmoplegia with limited vertical movements | Enlarged lateral rectus muscles with all other extraocular muscles comparatively hypoplastic. Absence of oculomotor and trochlear nerves.


[Figure 1.1 schematic. Panel a labels: no deviation, esotropia, exotropia, hypertropia, hypotropia. Panel b labels: no strabismus, comitant strabismus, incomitant strabismus, each shown in right gaze, primary gaze, and left gaze.]

Figure 1.1 Common strabismus classifications. The left eye is affected in all the schematic representations. Classification is based on a) the direction of deviation; b) concomitant vs. incomitant


Chapter 2: Curation and bioinformatic analysis of strabismus genes supports functional heterogeneity and proposes candidate genes with connections to RASopathies

2.1 Synopsis

Purpose: The low-resolution label “strabismus” covers a range of heterogeneous defects, which makes it challenging to unravel this condition. We aim for a coherent understanding of the causes of strabismus by understanding properties enriched in genes linked to strabismus.

Methods: We attempt to gain a better understanding of the underlying genetics by combining phenotype-based gene curation, diverse bioinformatic analyses (including gene ontology, pathway mapping, expression and network-based methods) and literature review.

Results: We identify high-confidence and permissive sets of 54 and 233 genes potentially involved in strabismus. These genes can be grouped into 10 modules that together span a heterogeneous set of biological and molecular functions, and can be linked to clinical sub-phenotypes. Multiple lines of evidence associate retina and cerebellum biology with the strabismus genes. We further highlight a potential role of the Ras-MAPK pathway.

Independently, sets of 11 genes and 15 loci tied to strabismus with definitive genetic basis have been compiled from the literature. We identify strabismus candidate genes for 5 of the 15 reported loci (CHD7; SLC9A6; COL18A1, COL6A2; FRY, BRCA2, SPG20; PARK2). Finally, we synthesize a Strabismus Candidate Gene Collection, which together with our curated gene sets will serve as a resource for future research.


Conclusion: The results of this informatics study support the heterogeneity and complexity of strabismus and point to specific biological pathways, the Ras-MAPK pathway, and anatomical structures (retina, cerebellum, and amygdala) for future focus.

2.2 Introduction

Strabismus is one of the most common pediatric chronic disorders. It is observed in 2-4% of children, a rate ten times more frequent than epilepsy or type 1 diabetes 67. Broadly defined, strabismus is a misalignment of the eyes, with clinical diagnostic criteria and treatment options well described in the literature 68. It ranges in severity and in some cases can lead to loss of stereovision, amblyopia, or blindness. Eye misalignment can impair social interactions and consequently mental health 69,70. The disorder can arise due to genetics, injury, or in response to other pathophysiological processes. While strabismus can occur in isolation, it is often observed concurrent with other phenotypes 71. The complexity of causes is difficult to unravel, in part because there are diverse phenotypes grouped under the label of strabismus. Eyes can be misaligned in multiple directions, problems can be intermittent or constant, and there is a great range of severity. Thus, the low-resolution label ‘strabismus’ likely covers a collection of disorders that arise from multiple genetic/molecular mechanisms.

Genetic forms of strabismus may provide insights into mechanisms, as the causal genes may suggest molecular and pathophysiological processes that, when disrupted, lead to eye misalignment. While there are cases of simple Mendelian genetic transmission of strabismus, such cases are rare and have proven difficult to map to causal genes 71. A few loci have been reported based on familial studies, but replication has proven challenging: the only locus that has been found in more than one family is STBMS1, but the reported inheritance patterns were different 37,38. A definitive genetic basis has been elucidated for some strabismus-related eye alignment disorders 1,47,54. These loci and genes are both compiled within this report.

Strabismus is commonly observed as one phenotype amongst many in complex genetic disorders, and thus it is unclear in such situations whether strabismus is a primary or secondary consequence of the disrupted genes. Based on the diversity of patient characteristics across these disorders, however, it seems probable that strabismus more frequently arises as a secondary response. Different anatomical sites have been associated with the diverse forms of strabismus, spanning the orbits, muscles, cranial nerves and corresponding nuclei, and regions within the brain, including the visual cortex and other cortical areas involved in visual processing 1,21,71.

Strabismus arising from injury provides an alternative approach for understanding the underlying mechanisms, particularly within the brain. Conventionally, the function of a brain region can be derived by associating clinically-defined brain damage to the loss of certain functions. However, limited success has been achieved in ascribing regional alterations in the brain to the onset and progression of strabismus 72. Studies of acute onset of strabismus related to brain tumors suggest damage to fusion centres situated in the rostral-dorsal midbrain and the hindbrain; such centres are key regions for visual processing 73–75.

In order to gain greater insight into the molecular mechanisms involved in strabismus and the potential delineation of subclasses, in this report we take a gene systems-based approach to identify properties of genes linked to strabismus and to make evidence-based predictions (Figure 2.1). We compile high-confidence and permissive sets of strabismus genes based on curation of genes extracted from genetic information databases. We apply diverse forms of gene set analysis to the groups in order to identify enriched properties, including the analysis of pathways, patterns of gene expression and gene annotation. Based on the results, we propose prenatal brain region-specific alterations for subgroups of strabismus and suggest sets of genes for inclusion in future genetic studies. An integrated exploration of the candidate genes arising from the diverse analyses highlights the involvement of Ras-MAPK pathways.

2.3 Methods

2.3.1 Curation of strabismus gene lists

Two strabismus gene sets were created using different approaches and holding different levels of confidence. A high-confidence list was compiled by carefully querying the reference genetics resource Online Mendelian Inheritance in Man (OMIM) (July 2017) and supplementing the resulting initial gene list with genes described in the SysID database (November 2017) 76,77. Manual inspection of the corresponding literature was performed, following the illustrated curation process (Supplementary Figure A.1a). We refer to this high-confidence gene set as the Stringent Strabismus set, or S-Strab. A second set, larger but lower in confidence, is based on semi-automated gene annotations. Using the Human Phenotype Ontology (HPO) database (Build 130) 78, genes linked to strabismus were selected as illustrated (Supplementary Figure A.1b, HPO-based strabismus gene set). A union set was generated by combining the two gene lists and termed the Permissive Strabismus set, or P-Strab.

In addition, we compiled two lists based on recognized genetics of strabismus from the literature: (i) a list of genes identified as the defined genetic basis of strabismus disorders based on overall phenotype similarities between model animals and human patients (Established Strabismus set, or E-Strab); (ii) a list of strabismus-associated genetic loci identified through linkage analysis. The genes within each locus were then obtained using the Table Browser function of the UCSC Genome Browser (hg19).
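Retrieving the genes within a locus amounts to an interval overlap query. The following is a minimal sketch of that idea rather than the Table Browser itself; the function name, gene labels, and coordinates are hypothetical placeholders, and half-open coordinates are assumed.

```python
def genes_in_locus(genes, chrom, start, end):
    """Return names of genes overlapping the half-open interval [start, end) on chrom.

    genes: iterable of (name, chrom, tx_start, tx_end) tuples (hypothetical layout).
    """
    return sorted(
        name
        for name, gene_chrom, tx_start, tx_end in genes
        if gene_chrom == chrom and tx_start < end and tx_end > start
    )


# Hypothetical annotations; real input would be exported from a genome annotation table.
example_genes = [
    ("GENE_A", "chr7", 100, 500),
    ("GENE_B", "chr7", 450, 900),
    ("GENE_C", "chr7", 1200, 1500),
    ("GENE_D", "chr1", 100, 500),
]

locus_genes = genes_in_locus(example_genes, "chr7", 400, 1000)
print(locus_genes)
```

A gene is kept when any part of its transcribed interval falls inside the locus, so genes that straddle a locus boundary are included.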

2.3.2 Curation of gene lists for strabismus risk factors

Maconachie and coworkers summarized significant strabismus risk factors 16. These factors were manually matched to HPO terms, and genes annotated with these HPO terms comprised the risk factor gene list (RF set).
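The term-to-gene step can be sketched as a simple filter over gene-term annotation pairs. This is an illustration under stated assumptions, not the curation pipeline itself: the function name, gene labels, and term labels below are hypothetical placeholders rather than real HPO identifiers.

```python
def genes_annotated_with(gene_term_pairs, selected_terms):
    """Return the sorted set of genes annotated with at least one selected term."""
    selected = set(selected_terms)
    return sorted({gene for gene, term in gene_term_pairs if term in selected})


# Hypothetical gene-term annotation pairs (placeholders, not real HPO identifiers).
example_pairs = [
    ("GENE_A", "HPO_TERM_1"),
    ("GENE_A", "HPO_TERM_9"),
    ("GENE_B", "HPO_TERM_2"),
    ("GENE_C", "HPO_TERM_9"),
]

# Terms manually matched to risk factors (hypothetical).
risk_factor_terms = {"HPO_TERM_1", "HPO_TERM_2"}

rf_set = genes_annotated_with(example_pairs, risk_factor_terms)
print(rf_set)
```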

2.3.3 Gene set analysis

No single bioinformatic tool can capture and inform all of the relationships within a gene set. Each tool and approach has distinct benefits and limitations. Here, we utilized a series of platforms to explore different aspects:

Gene ontology enrichment analysis: The ClueGO app (Version 2.5.0) for Cytoscape (Version 3.5) was used to identify and visualize the gene ontology (GO) term clusters associated with the strabismus genes 79. ErmineJ (Version 3.0.3) with default settings was used to reduce the gene multi-annotation bias in GO term enrichment analysis 80. (Details for both are listed in Supplemental Information.) The top-scoring enriched modules of strabismus genes were reviewed and reported.

KEGG pathway mapping: Using the online KEGG pathway analysis tools (KEGG Mapper Version 3.1), P-Strab was submitted for pathway analysis in human using default settings. The top pathways were reported, ranked by the number of P-Strab genes contained. A set of “pathways” involved in specific forms of cancer was excluded from consideration (e.g. “Renal Cell Carcinoma”), as cancer processes may include aspects that are not applicable to normal physiology.
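The ranking and exclusion logic can be sketched as follows. This is a minimal illustration, not the KEGG Mapper implementation; the function name, the gene labels, and the pathway memberships are hypothetical, and only the excluded pathway name is taken from the text.

```python
def rank_pathways(pathway_genes, query_genes, excluded=frozenset()):
    """Rank pathways by the number of query genes they contain.

    Pathways named in `excluded` and pathways without any query gene are dropped.
    Ties are broken alphabetically by pathway name.
    """
    query = set(query_genes)
    counts = []
    for pathway, members in pathway_genes.items():
        if pathway in excluded:
            continue
        n_hits = len(query & set(members))
        if n_hits > 0:
            counts.append((pathway, n_hits))
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Hypothetical pathway memberships and query genes.
example_pathways = {
    "Pathway X": ["G1", "G2", "G3"],
    "Pathway Y": ["G2", "G4"],
    "Renal cell carcinoma": ["G1", "G2", "G3", "G4"],
    "Pathway Z": ["G9"],
}
query = ["G1", "G2", "G3", "G5"]

ranking = rank_pathways(example_pathways, query, excluded={"Renal cell carcinoma"})
print(ranking)
```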

Expression enrichment: We examined the gene expression pattern of the strabismus genes based on human expression datasets with the default settings of the following software: Enrichr, SEA (Specific Expression Analysis), and ABAenrichment (accessed in January 2018). Enrichr 81 queries enriched expression across the body, using data from 53 cell types or tissues within the ProteomicsDB database. As the gene ontology enrichment analyses described above highlighted brain-related terms/pathways and the ProteomicsDB data lacks brain expression data, SEA 82 and ABAenrichment 83 were used to provide high-resolution enrichment analysis across brain regions and over developmental stages based on the BrainSpan Atlas of the Developing Human Brain dataset 84. Top enriched anatomical regions were reported from all three tools.

Interaction network analysis: The GeneMania app (Version 3.4.1) for Cytoscape (Version 3.5) was used to visualize potential relationships between strabismus genes based on different types of interactions. Distinct clusters within the overall network were explored based on the annotations of physical interactions and pathways within the GeneMania database 85. In addition to visual inspection, MCODE (Version 1.5.1) for Cytoscape (Version 3.5), with the “Fluff” option in addition to the default settings, was used to find clusters in the network. “Fluff” allows nodes to belong to more than one cluster.

2.3.4 Phenotype annotation for GO modules

We used HPOSim (Version 1.2) with default settings to analyze Human Phenotype Ontology (HPO) term enrichment 86 in the above-derived functional modules for strabismus genes.

2.3.5 Candidate gene identification

To propose candidate genes for future inclusion in the strabismus gene set, we considered enrichment based on (i) all classes of interactions within GeneMania (automatically selected weighting), (ii) GO annotation interactions via GeneMania (GO-biological process weighting), and (iii) differential gene expression for selected brain regions based on early developmental transcriptome data (8 post conception weeks to 18 months) of BrainSpan 84. The last criterion introduces a specific focus on the brain consistent with the enrichment analysis results, and genes that meet any two of the three criteria are permitted in the final candidate set. The top fifty candidate genes based on P-Strab were reported for each of the weighting approaches, and the top 100 expression-based candidate genes for each of the selected brain regions were also reported. The overlapping genes were subjected to additional manual inspection. To provide additional candidate genes for established strabismus loci, we considered the top 10 candidate genes based on S-Strab generated by GeneMania (automatically selected weighting).
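The "any two of the three criteria" rule can be sketched by counting how many candidate lists each gene appears in. The function name, criterion labels, and gene labels here are hypothetical placeholders; the sketch illustrates the selection rule only, not the ranking performed by GeneMania or the expression analysis.

```python
from collections import Counter


def genes_meeting_criteria(criteria, min_criteria=2):
    """Return genes present in at least `min_criteria` of the candidate lists."""
    counts = Counter()
    for gene_list in criteria.values():
        counts.update(set(gene_list))  # a gene counts once per criterion
    return sorted(gene for gene, n in counts.items() if n >= min_criteria)


# Hypothetical top-ranked candidates under each of the three criteria.
example_criteria = {
    "all_interactions": ["G1", "G2", "G3"],
    "go_biological_process": ["G2", "G3", "G4"],
    "brain_expression": ["G3", "G5"],
}

candidates = genes_meeting_criteria(example_criteria)
print(candidates)
```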

2.4 Results

2.4.1 Curation and evaluation of strabismus gene sets

In order to systematically study the molecular underpinnings of strabismus, and to classify the disease into potential molecular subcategories, we sought to construct a “gold standard” set of genes in which mutations cause disorders with strabismus as a described phenotype. To seed the gene list, we extracted a total of 95 unique genes with genotype-phenotype information from the reference genetics resource OMIM and from SysID, a published set of genes associated with intellectual disability with detailed phenotypic descriptions including ocular conditions. The use of SysID was motivated by the strong ties between strabismus and intellectual disability; the prevalence of strabismus in intellectual disability ranges from 27% to 63% 87,88. We then manually inspected all compiled genes by examining reported cases in the literature, requiring multiple independent reports of patient cases in which disruption of the gene was linked to a strabismus phenotype (at least three independent reports for autosomal dominant; at least two for autosomal recessive). This resulted in a total of 54 unique genes (56.8% of 95 unique genes), which formed the Stringent Strabismus Set (S-Strab; Table 2.1, Supplementary Figure A.1a, Supplementary Table A.1a).
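The report-count requirement can be expressed as a small threshold check. This is a sketch under stated assumptions: the function name and the gene records are hypothetical, and because the text states thresholds only for autosomal dominant (AD) and autosomal recessive (AR) inheritance, other modes are rejected rather than guessed.

```python
# Minimum number of independent patient reports, as stated in the text.
MIN_INDEPENDENT_REPORTS = {"AD": 3, "AR": 2}


def passes_stringent_filter(inheritance, n_reports):
    """Check whether a gene meets the report-count threshold for its inheritance mode."""
    if inheritance not in MIN_INDEPENDENT_REPORTS:
        raise ValueError(f"no threshold stated for inheritance mode {inheritance!r}")
    return n_reports >= MIN_INDEPENDENT_REPORTS[inheritance]


# Hypothetical records: (gene label, inheritance mode, number of independent reports).
example_records = [
    ("GENE_A", "AD", 3),
    ("GENE_B", "AD", 2),
    ("GENE_C", "AR", 2),
    ("GENE_D", "AR", 1),
]

retained = [name for name, mode, n in example_records if passes_stringent_filter(mode, n)]
print(retained)
```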

To obtain a larger gene list potentially providing higher statistical power in enrichment analyses, we collected genes from the Human Phenotype Ontology (HPO) database. 411 unique genes were obtained by intersecting a table of gene-phenotype pairs with a list of strabismus-related phenotype terms curated from a disease-phenotype table (Supplementary Figure A.1b). We screened the primary literature for the 411 genes to identify those genes with at least one published report in which a strabismus phenotype was reported for a patient with a disrupted gene (a lower standard than applied in the construction of S-Strab), resulting in a set of 204 unique genes (49.6% of 411 genes) (Supplementary Table A.1b). An individual gene from large genomic regions tied to syndromes, such as Down syndrome, would only be included in our collection if a patient with a specific mutation in that gene was reported to have strabismus. Twenty-five genes are present in the intersection of these 204 genes with S-Strab. The resulting union contains 233 unique genes and forms the Permissive Strabismus Set (P-Strab Set; Table 2.1, Supplementary Table A.1c), which receives the focus of subsequent analyses. Of the 233 P-Strab genes, 19 (8.2%) are only from the OMIM set, 10 (4.3%) are only from the SysID set, 179 (76.8%) are only from the HPO set, and 25 (10.7%) are from two sets.
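The set sizes reported above are mutually consistent, which can be checked with inclusion-exclusion. The short sketch below uses only the counts given in the text; the variable and function names are our own.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


s_strab = 54       # Stringent Strabismus Set
hpo_curated = 204  # curated HPO-based set
shared = 25        # genes present in both sets

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
p_strab = s_strab + hpo_curated - shared

breakdown = {"OMIM only": 19, "SysID only": 10, "HPO only": 179, "two sets": shared}
assert sum(breakdown.values()) == p_strab

shares = {source: percent(n, p_strab) for source, n in breakdown.items()}
print(p_strab, shares)
```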

As an alternative approach to compile strabismus genes, manual literature review provided 11 genes (Established-Strab, E-Strab, Supplementary Table A.2a) and 15 loci (Loci-Strab, L-Strab, Supplementary Table A.2b). To evaluate how well P-Strab and S-Strab capture the E-Strab genes, we overlapped the above sets. Seven out of 11 E-Strab genes are present in S-Strab, while nine out of 11 are present in P-Strab. Only two genes (ROBO3, HOXB1) were not detected in P-Strab. Examination of the HPO file revealed that ROBO3 is associated with “Horizontal supranuclear gaze palsy” and not the strabismus-related terms used in the curation process. HOXB1 is associated with “Esotropia” in HPO, but the OMIM:614744 “Facial Paresis, hereditary congenital, 3” entry is labeled with IEA (inferred from electronic annotation) and thus was filtered out during the curation process. In summary, the overlap between the literature-based E-Strab and the curated P-Strab and S-Strab both confirms the quality of the curation process and indicates the continuous need to improve the annotations in the valuable HPO and OMIM resources.

We also sought to identify genes annotated with HPO terms representing known risk factors for strabismus 16. After curation, a set of 119 risk factor-annotated genes was generated, which we term the RF set (Supplementary Table A.3b). The RF set overlaps P-Strab for 33 genes (28% of RF) (Appendix A - Supplementary Results).

2.4.2 Identifying strabismus functional modules through gene set analyses

The strabismus genes are subjected to four types of gene set analyses: gene function enrichment, KEGG pathway mapping, gene expression enrichment, and interaction networks. The functional enrichment and pathway mapping analyses identify over-representation of gene modules as defined by gene ontology (GO) annotations or pathway annotations. The goal of the expression enrichment analysis is to identify subgroups of genes that are co-expressed in a tissue-specific or developmental stage-specific manner (details in the subsections below). Interaction networks are examined to explore the molecular interactions between the genes, as well as to identify additional genes that may provide mechanistic insights into the pathophysiological processes of strabismus.

Function enrichment analysis

To provide insight into the biological functions of strabismus genes, we analyzed the 233 P-Strab genes for GO biological process terms. Information for 229 genes was available for the analysis, of which 162 were represented in 29 enriched GO clusters (Bonferroni step-down corrected p < 0.05; Figure 2.2). We focused on clusters with more than 10 genes and further combined clusters with similar semantic meaning. For example, we combined “cell morphogenesis involved in neuron differentiation” and “central nervous system neuron differentiation” into a “neuron differentiation” group. In this manner, we obtained a final set of 10 GO modules: “camera-type eye development”, “inner ear”, “cranial nerve formation”, “neuron differentiation”, “hindbrain development”, “diencephalon development”, “cardiac muscle”, “embryonic skeletal system”, “histone acetylation”, and “kidney”. (The corresponding member genes are listed in Supplementary Table A.4a.) An extended analysis using ErmineJ to reduce annotation bias was performed, which additionally highlighted “photoreceptor cell” (an important component of the camera-type eye development category) and “ciliary transition zone” associated genes (Appendix A - Supplementary Methods and Supplementary Table A.4b).
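To illustrate what a Bonferroni step-down (Holm) correction does to over-representation p-values, the following is a minimal sketch and not the procedure implemented by the tools used in this chapter. It assumes a one-sided hypergeometric test for each term, which is a common choice for GO over-representation; the background size, study-set size, term names, and counts are all invented for illustration.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N genes are drawn from M, of which n carry the term."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def holm_step_down(pvalues: list[float]) -> list[float]:
    """Holm (Bonferroni step-down) adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[idx]))
        adjusted[idx] = running_max
    return adjusted


# Hypothetical example: 1000 background genes, 50 genes in the study set.
background_size, study_size = 1000, 50
# term -> (genes annotated in the background, annotated genes in the study set)
terms = {"term_a": (40, 10), "term_b": (100, 6), "term_c": (200, 10)}

raw = [hypergeom_sf(hits, background_size, annotated, study_size)
       for annotated, hits in terms.values()]
adjusted = holm_step_down(raw)
enriched = [name for name, p in zip(terms, adjusted) if p < 0.05]
print(enriched)
```

Only terms whose adjusted p-value falls below the threshold are retained, which mirrors the p < 0.05 cut-off quoted above.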

Pathway mapping

The P-Strab Set was subjected to pathway analysis using the KEGG Mapper, excluding the broad “metabolic pathways” category (Supplementary Table A.5a). P-Strab genes were mapped to each of the top eight pathways: MAPK, PI3K-Akt, Ras, FoxO, Rap1, and neurotrophin signaling, regulation of actin cytoskeleton, and focal adhesion (Supplementary Table A.5b). All of these pathways overlap extensively with the broader MAPK signaling pathway (highlighted in Supplementary Table A.5b).

Expression enrichment analysis

We analyzed P-Strab genes with a series of tools that perform expression analysis, incorporating different underlying data resources. Enrichment analyses using Enrichr gene expression data of human tissues revealed retina as the only enriched tissue for RNA expression of strabismus genes (p < 0.05; up-regulated genes from BioGPS). Human adult retina is the most enriched tissue for protein expression (adjusted p = 0.08; up-regulated proteins from ProteomicsDB with 53 cell types or tissues 81,89).

As the preceding GO analyses included several neural features, and ProteomicsDB lacks brain expression data, we sought expression resources that could provide high-resolution neural data. BrainSpan (accessible through the Allen Brain Atlas data portal) provides developmental expression data for human brains. We assessed whether P-Strab genes were enriched in any of the pre-defined structures. First, we used the Specific Expression Analysis (SEA) tool, finding that 219 of the 233 genes are expressed in the brain region and developmental expression dataset. Gene expression enrichment was only observed in fetal developmental stages, specifically for both amygdala and cerebellum (p < 0.05; Figure 2.3). The SEA system divides the data into 6 broad brain structures, resulting in relatively low spatial resolution. To investigate expression patterns in more specific brain regions, we used the ABAenrichment tool (which uses the Allen Brain Atlas expression data), which provides analysis based on 16 brain regions 83. In light of the SEA results, we restricted our focus to the analysis of early developmental stages, reporting five enriched regions in prenatal stages and five partially overlapping regions in the data for 0-2 years of age (family-wise error rate (FWER) < 0.05; Table 2.2). One region is enriched in both the prenatal and 0-2 year stages: the posteroventral (inferior) parietal cortex (IPC).

Interaction network analysis

Network analysis provides insights into the molecular relationships between strabismus genes and may provide gene clusters for comparison with other analyses. The 233 genes of P-Strab form a single large network when all available types of interactions within the GeneMania database are considered (Figure 2.4a). To untangle the network and identify clusters of connected genes with interpretable biological meanings, a second network restricted to physical and pathway interactions was generated (Figure 2.4b). This second network consists of one large set of connected genes (116 genes), seven gene pairs (each with two members), and 102 genes without any connection. A highly connected cluster within the 116-gene sub-network corresponds to the “neurotrophin TRK receptor signaling pathway” (GO annotation). Further systematic analysis using MCODE revealed 6 clusters (Supplementary Table A.6). GO-annotated genes from the “neurotrophin TRK receptor signaling pathway” sub-network constitute the core of Clusters 1 and 2, which largely overlap. These same clusters represent the above-defined strabismus GO Module for Neuron Differentiation (44 genes, Supplementary Table A.4a), with 13 of 23 genes (56.5%) in Cluster 1 and five of seven genes (71.4%) in Cluster 2 overlapping the module. Furthermore, >35% of genes from Cluster 3 are present among genes that show high expression in amygdala and cerebellum compared to other brain regions (BrainSpan). Two of the six genes that make up Cluster 6 are present in GO Module H (Histone acetylation).
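The decomposition of a network into one large connected set, gene pairs, and unconnected genes can be sketched with a breadth-first search over an edge list. This is an illustrative sketch only: the node names `G1` to `G7` and their edges are placeholders rather than real interactions, the code assumes that every edge refers to a listed node, and it does not reproduce GeneMania or MCODE. The helper `percent_overlap` is likewise hypothetical and simply recomputes the cluster/module percentages quoted above.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def percent_overlap(n_shared: int, n_cluster: int) -> float:
    """Percentage of cluster genes that also belong to a module."""
    return round(100 * n_shared / n_cluster, 1)


# Placeholder network: one connected set of three, one pair, two singletons.
nodes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7"]
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]

components = connected_components(nodes, edges)
print([len(component) for component in components])
print(percent_overlap(13, 23), percent_overlap(5, 7))
```

Sorting components by size makes the largest connected set, the pairs, and the singletons easy to read off in that order.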

2.4.3 Linking strabismus functional modules to clinical phenotypes

Neither functional enrichment nor expression enrichment provides a link to clinical observations. To examine whether there are additional clinical characteristics associated with strabismus genes and to facilitate clinical classification of the strabismus phenotypes in the future, we examined clinical phenotypes (HPO terms) associated with each of the 10 strabismus GO modules, listing the top 10 HPO terms for each module in Table 2.3. As expected, the top two terms are “strabismus” and “abnormal conjugate eye movement” for each of the 10 modules. However, we also observed terms that are specific to individual modules, indicating that modules have distinct characteristics. Strabismus in combination with specific clinical phenotypes may suggest the involvement of genes in a particular module and could facilitate the understanding of the pathogenesis of strabismus. For example, for strabismus patients with issues limited to the eye, particularly aplasia and/or hypoplasia, genetic analyses might prioritize genes from module A. For strabismus patients with intellectual disability, analyses may prioritize candidate genes from module B.


2.4.4 Strabismus candidate gene collection

GeneMania can propose candidates for gene sets based on weighted connections to member genes. Two lists of 50 candidate genes were generated: one using default GeneMania weightings (hereafter called the GM list), and one using the GeneMania option for GO biological process weighting (GM-GO list). Both sets of genes were deposited into a Strabismus Candidate Gene Collection (Supplementary Table A.7a). Sixteen genes overlapped between the two lists.

As a complementary approach, we pursued candidates exhibiting differential expression within the three brain regions highlighted above as important for strabismus in early development. The top 100 differentially expressed genes for these regions (cerebellum/cerebellar cortex (CB/CBC), amygdala (AMY), and posteroventral (inferior) parietal cortex (IPC)) were determined based on comparison to the rest of the brain structures represented in BrainSpan (Supplementary Table A.7b).

In comparing the two GeneMania candidate gene sets with the three differential BrainSpan expression-based candidate sets, we observed the following: no gene from the CB/CBC list is present in the GM or GM-GO lists; two genes (TFAP2D and GLI3) from the AMY list are present in both the GM and GM-GO lists; and two genes (NFIX and DAB1) from the IPC list are present in the GM list but not the GM-GO list. A total of 18 candidate genes are supported by at least two lines of evidence (Table 2.4), of which GMPPA, GLI3, and NFIX have reported strabismus phenotypes for patients in which the gene is disrupted 90–92.
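Selecting candidates supported by at least two lines of evidence amounts to counting, for each gene, the candidate lists in which it appears. The sketch below is a hedged illustration: the memberships of TFAP2D, GLI3, NFIX, and DAB1 follow the statements above, while the `PLACEHOLDER_*` entries are invented stand-ins for the remaining list members, and the function names are hypothetical.

```python
from collections import defaultdict


def evidence_table(candidate_lists):
    """Map each gene to the set of candidate lists that contain it."""
    support = defaultdict(set)
    for list_name, genes in candidate_lists.items():
        for gene in genes:
            support[gene].add(list_name)
    return dict(support)


def supported_by_at_least(candidate_lists, minimum=2):
    """Genes present in at least `minimum` lists, with their supporting lists."""
    return {
        gene: sorted(lists)
        for gene, lists in evidence_table(candidate_lists).items()
        if len(lists) >= minimum
    }


# Truncated, partly hypothetical lists; only the four named genes come from the text.
candidate_lists = {
    "GM": {"TFAP2D", "GLI3", "NFIX", "DAB1", "PLACEHOLDER_1"},
    "GM-GO": {"TFAP2D", "GLI3", "PLACEHOLDER_2"},
    "AMY": {"TFAP2D", "GLI3", "PLACEHOLDER_3"},
    "IPC": {"NFIX", "DAB1"},
    "CB/CBC": {"PLACEHOLDER_4"},
}

supported = supported_by_at_least(candidate_lists, minimum=2)
for gene in sorted(supported):
    print(gene, supported[gene])
```

Under this counting, the 16 genes shared by the GM and GM-GO lists together with NFIX and DAB1 would give 18 genes, which matches the reported total if TFAP2D and GLI3 are already among the 16 shared genes.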


2.4.5 Prioritizing genes from strabismus-associated genetic loci

We compiled a set of 15 genetic loci that have previously been associated with strabismus through linkage analysis (L-Strab, Supplementary Table A.2b), containing 869 genes. Four L-Strab loci contain a total of seven genes that were included in P-Strab (8q12: CHD7; Xq26.3: SLC9A6; 21q22: COL18A1, COL6A2; 13q12.2-13: FRY, BRCA2, SPG20). Of the remaining 862 genes, none overlap the 18 genes from our Strabismus Candidate Gene Collection. Focusing on the 440 genes in the remaining 11 loci, we used GeneMania (default settings) to identify the gene in closest proximity to the S-Strab gene network: PARK2 (S-Strab proximity rank nine out of all human genes), which is located on 6q24.2-6q25.1, a locus identified in a family with infantile esotropia. PARK2 is known to be responsible for juvenile Parkinson’s disease, and Parkinson’s subjects show a higher prevalence of strabismus than controls 93,94.

2.4.6 Associations between Ras-MAPK signaling and strabismus

Genes with a role in the Ras-MAPK signaling pathway were observed in both the P-Strab and the Strabismus Candidate Gene Collection. A total of 23 P-Strab genes have a known role in the Ras-MAPK pathway (Figure 2.5). We noticed that nine of these 23 genes cause RASopathies: PTPN11, SOS1, RAF1, BRAF, KRAS, NRAS, HRAS, MAP2K1, MAP2K2 95,96. None overlaps with the 11 E-Strab genes. Of the 18 strabismus candidate genes, five genes (KSR1, MAPK3, ARAF, MAPK1, and RGL2) are also connected to the Ras-MAPK pathway.

2.5 Discussion

In this report we have identified sets of strabismus genes in order to assess molecular and biological mechanisms by which the diverse strabismus phenotype may arise, and to propose additional candidate genes for future study. We compiled high-confidence and permissive collections of strabismus genes from reference databases, and curated each based on the published literature. Analysis of the resulting 233 permissive strabismus genes was performed using gene set analysis methods, including tools based on GO-term annotation, pathway assignment, protein interaction, and expression patterns. The analyses indicate that the compiled set of strabismus genes consists of multiple subsets, consistent with the heterogeneity of strabismus phenotypes. The analyses, including literature review, collectively highlight three anatomical regions (retina, cerebellum, and amygdala) and one signaling pathway (Ras-MAPK) associated with the strabismus genes. Candidate genes predicted by network analysis of our stringent curated set of strabismus genes (S-Strab) included PARK2, a gene situated within a previously published strabismus locus. Expanding the curated strabismus gene sets, we generated a Strabismus Candidate Gene Collection composed of 18 genes (Table 2.4) that arose using two or more analysis methods.

In performing the literature curation for this study, we observed a challenge related to patient descriptions. Multiple phenotypes are grouped under a single disorder name, but individual patients commonly present with only subsets of these characteristics. Greater resolution in reporting the observed phenotypes for patients would facilitate future studies. In many cases, the observation of strabismus in a patient case report was not clearly specified. Moreover, strabismus may not be reported in subjects with other more prominent phenotypes, such as brain malformation, limb defect, and other organ malfunctions. While annotation will improve as controlled vocabularies are more widely adopted, the term “strabismus” is lacking in resolution, and the field would benefit from establishing systematic ways to describe the phenotype more richly.

Bioinformatic analyses, including network and functional enrichment approaches, are commonly applied for gaining insights about the biological foundations of gene sets. Their success largely depends on the molecular coherence of the system under study. Tightly connected and directly interacting systems, such as protein complexes and cellular organelles, allow for high-specificity predictions in contrast to the more molecularly heterogeneous gene sets arising from phenotypic associations 77,97. Our analyses of strabismus genes underscore the relatively broad phenotypic characteristics of strabismus, highlighting heterogeneity at the molecular level. Despite this constraint, we do generate specific observations, including potential ties between strabismus and the Ras-MAPK pathway and specific tissues (retina, cerebellum and amygdala).

The retina and cerebellum have published ties to strabismus, while the amygdala has not previously been associated with eye movement. Several observations are in line with the observed enrichment of strabismus genes in retina functions. For example, Gursoy and coworkers reported an overall strabismus incidence of 34% among 66 infants with retinopathy of prematurity (ROP) from Turkey. To focus on the effect of ROP, they excluded cases with neurologic abnormalities 98. Similarly, two reports support the association between strabismus and the cerebellum: Mathews and coworkers reported that 9 out of 13 (69%) cerebellar hypoplasia cases exhibited strabismus 99, and Hufner and colleagues reported a 4.5 times increased risk of ocular alignment abnormalities in subjects with cerebellar dysfunction 100. Functional MRI provides further support for a role of the cerebellum in strabismus. Tan and colleagues compared congenital concomitant strabismus and control groups, finding significantly lower amplitude of low-frequency fluctuation in the bilateral medial frontal gyrus, and higher values in the bilateral cerebellum posterior lobe and left angular gyrus 101. On the other hand, the relationship between the amygdala and strabismus is less clear and could be controversial. An indirect link is provided by a functional MRI study that shows altered spontaneous brain activity in the bilateral cingulate gyrus in patients with concomitant strabismus, associating strabismus with disruptions of the limbic system, of which the amygdala is a key component 102. However, excitotoxin-induced amygdala lesions in adult monkeys did not lead to strabismus 103. The effect of amygdala defects during fetal development on eye alignment is unknown. Considering that the onset of most strabismus is in infants and children, the influence of amygdala defects during early development on eye alignment should be investigated. Overall, the role of individual tissues in strabismus will require continuing exploration.

Recognizing the breadth of tissues that have physical relevance to strabismus (e.g. extraocular muscle), the absence of observable P-Strab gene enrichment from such tissues raises the potential for a bias in the original gene collection process. While most (95.7%) of the P-Strab set was compiled without regard to tissue, a small set of 10 genes (4.3%) was identified based on their presence in the brain-related SysID database (although all genes had to meet the same curation criteria for final inclusion). The small portion of genes arising from the SysID screen does not account for the enrichment for brain-related and neural features in the ontology enrichment results, and only two of the 18 Strabismus Candidate Gene Collection genes were dependent on brain expression for their inclusion. While not over-represented in the P-Strab set, genes active in various other anatomical regions of clinical relevance to strabismus (e.g. extraocular muscles) may emerge in the future and are worthy of exploration as new data resources emerge.

Our informatics-based analyses are exploratory and hypothesis-generating in nature. Experimental follow-up will ultimately be required to advance our understanding of the molecular and genetic underpinnings of strabismus. The output of this study can be used to inform future studies in diverse ways, such as: 1) specialist labs focused on individual members of the 18 Strabismus Candidate Gene Collection genes may explore model organisms for strabismus-related phenotypes; 2) strabismus cohort-based association studies may seek genetic variation in these gene sets (allowing the use of smaller cohorts than required for genome-wide studies); and 3) applied interpretation of genome sequences for patients with strabismus may detect alterations in these genes.

One approach for validating network-based predictions is the study of animal models, but such research is difficult for strabismus. Recapitulating phenotypes equivalent to strabismus is challenging in animal models, as common models feature lateral eyes with little binocular overlap. The alignment of forward-oriented eyes in primates or felines is crucial for stereoscopic or binocular vision, giving depth perception 27. There have been indications of a binocular zone of vision in mice, but phenotyping methods are not well established 29. One alternative is to study observable phenotypes associated with strabismus. The study of MAFB, a gene present in both P-Strab and E-Strab, illustrates this strategy: MAFB mutations cause Duane Retraction Syndrome, in which MRI has shown absence or hypoplasia of the abducens nerve (cranial nerve VI) and aberrant lateral rectus (LR) muscle innervation by the oculomotor nerve (cranial nerve III) 54. Mafb knockout mice show an embryonic developmental abnormality of cranial nerve VI, and a secondary innervation of the LR by cranial nerve III 54. Given the limitations of animal models, deep clinical characterization of patients may offer the most important approach to understand the mechanisms by which gene disruptions contribute to strabismus.

A striking consistency across our analyses of strabismus genes was recurrent ties to the Ras-MAPK pathway. One potential mechanism by which the MAPK signaling pathway could relate to strabismus is via its role in neurotrophin signaling; neurotrophins are expressed along the visual pathway from retina to the visual cortex during visual system development 104. Treatment of extraocular muscles in infant monkeys with BDNF, one of the neurotrophin pathway stimulators, was found to enlarge neuromuscular junctions in slow myofibers 105, suggesting a potential role for the neurotrophin pathway in eye muscle development. However, high doses of BDNF delivered to one lateral rectus muscle failed to elicit strabismus in infant monkeys 105, and expression of BDNF was also not altered in strabismic human eye muscles 106. In parallel, the neurotrophin family of proteins has roles in neuron development as well as adult brain plasticity and neurodegeneration 104,107. BDNF has been shown to inhibit ocular dominance column formation in the cat visual cortex. This effect may act through control of axonal branching during development and likely affects binocular sensory input and processing 108. The potential effects of BDNF on strabismus could be located in the extraocular muscles, the brain, or both. The investigation of the spatiotemporal roles of BDNF may reveal the mechanism(s) for a subgroup of strabismus cases.

The link to the Ras-MAPK pathway is most prominently observed for RASopathy genes. Of the 24 genes linked to RASopathies 96,109, nine are present in our curated P-Strab gene list. For an additional five genes (NF1, LZTR1, SYNGAP1, CBL, and SHOC2), we found literature support for the presence of strabismus as a phenotype of the associated RASopathy. This brings the total of RASopathy genes with a link to strabismus to 14 genes, 58% of all 24 RASopathy genes. For instance, studies of Noonan Syndrome, one of the RASopathies, indicate that 48% of patients exhibit strabismus 110; the syndrome can be caused by variants in multiple genes 109. In addition, the ablation of NF1 in a mouse model of Neurofibromatosis type 1 led to abnormal development of the cerebellum 111. Within our strabismus candidate gene set, five of the 18 genes link to Ras-MAPK pathways. These findings suggest that mechanisms underlying RASopathies may also underlie a subgroup of strabismus. Thus, studies of RASopathies can shed light on strabismus pathology, and may help connect genetic and anatomical abnormalities to strabismus development. Future work could include the assessment of strabismus prevalence across the spectrum of RASopathies. The informatics-driven strabismus gene analysis presented in this report highlights gene collections for the strabismus research field, including a set of candidate genes for future consideration, for which detailed characterization of human patients with disruptions in these genes will be an important step in further assessing their functional roles.


Figure 2.1 Gene set analysis overview. The analyses in this paper can be grouped into three major steps, which are represented by three different colours: compilation and curation, bioinformatic analyses, and interpretation. Circles represent bioinformatic tools used.


[Figure 2.2 network image. Cluster labels: Kidney; Heart development; Diencephalon development; Embryonic skeletal system; Camera-type eye development; Inner ear development; Histone acetylation; Neuron differentiation; Hindbrain development.]

Figure 2.2 ClueGO enrichment analysis of Gene Ontology Biological Process terms for P-Strab genes. Nodes represent significantly enriched terms (Bonferroni step-down corrected p <0.05) and are grouped (denoted by the same colour) based on overlapping gene lists.


Figure 2.3 Specific Expression Analysis (SEA) across brain regions and developmental stages for P-Strab genes. Columns represent brain structures; rows represent developmental stages. Hexagon rings represent brain region gene lists with varying levels of confidence for a role in that specific brain region and developmental stage, with the central hexagons representing the most confident gene lists. Hexagon size represents the size of the gene lists. Colours indicate the statistical support for enrichment of the P-Strab genes in the brain region gene list (see the bar at the bottom for corresponding p-values). Gray (no colour) indicates that the p-value is above 0.1.


[Figure 2.4 network images, panels a and b; node labels are P-Strab gene symbols. Legend for panel b: red, physical interaction; blue, pathways; highlighted genes are those in the “Neurotrophin TRK receptor signaling pathway”.]

Figure 2.4 GeneMania interaction networks for the P-Strab genes. (a) All types of interactions. (b) Physical interactions and pathways. Nodes correspond to genes, edges to interactions. Edge width represents confidence. Different edge colours represent different types of data; details can be found on the GeneMania website. In (b), red indicates physical interactions, and blue indicates pathways. Genes that do not connect to any other gene in the P-Strab set are not shown.


[Figure 2.5 pathway diagram. Legend: components of the Ras-MAPK pathway; components of the Ras-MAPK pathway that are present in either the P-Strab Set or the Candidate Gene Collection.]

Figure 2.5 Strabismus genes mapped to the human Ras-MAPK pathway. If the box represents a group of genes, the specific strabismus genes are indicated next to the yellow box. Major connecting pathways are indicated by blue boxes. This figure is based on the intersection of KEGG pathways hsa04014 (Ras signaling) and hsa04010 (MAPK signaling).


Table 2.1 Permissive strabismus gene set (P-Strab). Gene symbols and stable IDs are listed. The 54 genes present in the S-Strab set are bolded (and listed first). (See Supplementary Tables A.1a-A.1c for additional detail.)

Gene  Gene stable ID

S-Strab genes
ADAT3 ENSG00000213638
ALG11 ENSG00000253710
ALG6 ENSG00000088035
AP4M1 ENSG00000221838
ARID1B ENSG00000049618
AUTS2 ENSG00000158321
C12ORF65 ENSG00000130921
CAMTA1 ENSG00000171735
CASK ENSG00000147044
CHN1 ENSG00000128656
CLIP1 ENSG00000130779
COL25A1 ENSG00000188517
COL4A1 ENSG00000187498
CPA6 ENSG00000165078
DPAGT1 ENSG00000172269
ECEL1 ENSG00000171551
FGFR2 ENSG00000066468
FGFR3 ENSG00000068078
FOXC2 ENSG00000176692
FOXL2 ENSG00000183770
FRY ENSG00000073910
GYG2 ENSG00000056998
HIST1H4B ENSG00000278705
HOXA1 ENSG00000105991
IQSEC2 ENSG00000124313
KANSL1 ENSG00000120071
KIF21A ENSG00000139116
KRAS ENSG00000133703
L1CAM ENSG00000198910
MAB21L2 ENSG00000181541
MAFB ENSG00000204103
MECP2 ENSG00000169057
MEF2C ENSG00000081189
MID2 ENSG00000080561
NEXMIF ENSG00000050030
NRXN1 ENSG00000179915
OCA2 ENSG00000104044
PHOX2A ENSG00000165462
PHOX2B ENSG00000109132
PMM2 ENSG00000140650
POMT2 ENSG00000009830
PUS3 ENSG00000110060
SALL4 ENSG00000101115
SLC39A6 ENSG00000141424
SLC39A8 ENSG00000138821
SOBP ENSG00000112320
SPG20 ENSG00000133104
SURF1 ENSG00000148290
TCF12 ENSG00000140262
TMEM237 ENSG00000155755
TUBB3 ENSG00000258947
UBE3A ENSG00000114062
VLDLR ENSG00000147852
ZEB2 ENSG00000169554

Remaining P-Strab genes
ABCB7 ENSG00000131269
ACOX1 ENSG00000161533
ADSL ENSG00000239900
AGK ENSG00000006530
AHI1 ENSG00000135541
ALDH7A1 ENSG00000164904
ALG9 ENSG00000086848
ALX4 ENSG00000052850
AMMECR1 ENSG00000101935
ANKRD11 ENSG00000167522
AP1S2 ENSG00000182287
ARID1A ENSG00000117713
ARID2 ENSG00000189079
ARX ENSG00000004848
ASXL1 ENSG00000171456
ATM ENSG00000149311
ATP6V0A2 ENSG00000185344
ATP6V1E1 ENSG00000131100
BRAF ENSG00000157764
BRCA2 ENSG00000139618
CABP4 ENSG00000175544
CACNA1F ENSG00000102001
CC2D2A ENSG00000048342
CCDC174 ENSG00000154781
CD96 ENSG00000153283
CDC42 ENSG00000070831
CDCA7 ENSG00000144354
CDH3 ENSG00000062038
CEP290 ENSG00000198707
CHD7 ENSG00000171316
CHRNG ENSG00000196811
CNGA3 ENSG00000144191
COG8 ENSG00000272617
COL11A1 ENSG00000060718
COL11A2 ENSG00000204248
COL18A1 ENSG00000182871
COL6A2 ENSG00000142173
CPLANE1 ENSG00000197603
CRB1 ENSG00000134376
CREBBP ENSG00000005339
CSPP1 ENSG00000104218
CTDP1 ENSG00000060069
DCN ENSG00000011465
DDOST ENSG00000244038
DHCR24 ENSG00000116133
DHCR7 ENSG00000172893
DMPK ENSG00000104936
ELOVL4 ENSG00000118402
EMC1 ENSG00000127463
EP300 ENSG00000100393
ERCC2 ENSG00000104884
FA2H ENSG00000103089
FANCA ENSG00000187741
FANCG ENSG00000221829
FBN1 ENSG00000166147
FGD1 ENSG00000102302
FGF14 ENSG00000102466
FGF3 ENSG00000186895
FKRP ENSG00000181027
FKTN ENSG00000106692
FMR1 ENSG00000102081
FRMD4A ENSG00000151474
GATAD2B ENSG00000143614
GBA ENSG00000177628
GJA1 ENSG00000152661
GLRB ENSG00000109738
GLRX5 ENSG00000182512
GMPPB ENSG00000173540
GNAT2 ENSG00000134183
GPR143 ENSG00000101850
GPR179 ENSG00000277399
GRM6 ENSG00000113262
GTF2E2 ENSG00000197265
HACE1 ENSG00000085382
HDAC8 ENSG00000147099
HECW2 ENSG00000138411
HESX1 ENSG00000163666
HNRNPU ENSG00000153187
HRAS ENSG00000174775
HYLS1 ENSG00000198331
IKBKG ENSG00000269335
INPP5K ENSG00000132376
KAT6B ENSG00000156650
KDM1A ENSG00000004487
KDM5C ENSG00000126012
KDM6A ENSG00000147050
KIF7 ENSG00000166813
KMT2A ENSG00000118058
KMT2D ENSG00000167548
LAMA1 ENSG00000101680
LAMB2 ENSG00000172037
LONP1 ENSG00000196365
LRIT3 ENSG00000183423
MAP2K1 ENSG00000169032
MAP2K2 ENSG00000126934
MCOLN1 ENSG00000090674
MFRP ENSG00000235718
MKS1 ENSG00000011143
NALCN ENSG00000102452
NDP ENSG00000124479
NGLY1 ENSG00000151092
NHS ENSG00000188158
NOG ENSG00000183691
NPHP1 ENSG00000144061
NRAS ENSG00000213281
NSD1 ENSG00000165671
NSUN2 ENSG00000037474
NYX ENSG00000188937
OCRL ENSG00000122126
OPA1 ENSG00000198836
OPHN1 ENSG00000079482
OTX2 ENSG00000165588
PAX2 ENSG00000075891
PAX3 ENSG00000135903
PAX6 ENSG00000007372
PEX1 ENSG00000127980
PEX11B ENSG00000131779
PEX2 ENSG00000164751
PEX6 ENSG00000124587
PITX2 ENSG00000164093
PLK4 ENSG00000142731
POMGNT1 ENSG00000085998
PORCN ENSG00000102312
PROKR2 ENSG00000101292
PTEN ENSG00000171862
PTPN11 ENSG00000179295
RAD21 ENSG00000164754
RAF1 ENSG00000132155
RAI1 ENSG00000108557
RAP1A ENSG00000116473
RAP1B ENSG00000127314
RET ENSG00000165731
RMRP ENSG00000269900
RNF135 ENSG00000181481
RPGR ENSG00000156313
RPGRIP1L ENSG00000103494
RPS6KA3 ENSG00000177189
SALL1 ENSG00000103449
SATB2 ENSG00000119042
SETD5 ENSG00000168137
SETX ENSG00000107290
SHH ENSG00000164690
SIL1 ENSG00000120725
SKI ENSG00000157933
SLC12A6 ENSG00000140199
SLC13A5 ENSG00000141485
SLC2A1 ENSG00000117394
SLC6A19 ENSG00000174358
SLC9A6 ENSG00000198689
SLX4 ENSG00000188827
SMAD4 ENSG00000141646
SMC1A ENSG00000072501
SMC3 ENSG00000108055
SOS1 ENSG00000115904
SOX2 ENSG00000181449
SOX3 ENSG00000134595
SOX5 ENSG00000134532
SRCAP ENSG00000080603
SSR4 ENSG00000180879
TAF1 ENSG00000147133
TBX15 ENSG00000092607
TCTN1 ENSG00000204852
TFAP2A ENSG00000137203
TFAP2B ENSG00000008196
TMCO1 ENSG00000143183
TMEM216 ENSG00000187049
TMEM67 ENSG00000164953
TRPM1 ENSG00000134160
TUBB2B ENSG00000137285
TWIST1 ENSG00000122691
TYR ENSG00000077498
TYRP1 ENSG00000107165
UBE3B ENSG00000151148
UNC80 ENSG00000144406
VPS13B ENSG00000132549
WDR73 ENSG00000177082
XPA ENSG00000136936
ZC4H2 ENSG00000126970
ZIC1 ENSG00000152977

Table 2.2 Early developmental brain regions enriched for P-Strab genes, based on expression enrichment analysis with ABAenrichment.

Developmental Stage | Age Category 1 (Prenatal) | Age Category 2 (0-2 yrs)
Structure 1 | anterior (rostral) cingulate (medial prefrontal) cortex | mediodorsal nucleus of thalamus
Structure 2 | amygdaloid complex | posteroventral (inferior) parietal cortex
Structure 3 | cerebral nuclei | parietal neocortex
Structure 4 | orbital frontal cortex | cerebellar cortex
Structure 5 | posteroventral (inferior) parietal cortex | striatum


Table 2.3 Top 10 HPO terms for GO modules based on P-Strab Set. For each module, the distinctive HPO term that is specific for that module is highlighted. Module Name A B C D E F G H I J Embryonic GO Module Neuron Cranial Nerve Hindbrain Inner Ear Skeletal System Histone Diencephalon characteristic Camera-Type Eye Differentiation Formation Development Cardiac Muscle Development Morphogenesis Acetylation Development Kidney 1 Strabismus Strabismus Strabismus Strabismus Strabismus Strabismus Strabismus Strabismus Strabismus Strabismus Abnormal Abnormal Abnormal Abnormal Abnormal Abnormal conjugate Abnormal conjugate Abnormal Abnormal Abnormal conjugate conjugate eye conjugate eye conjugate eye eye conjugate eye eye conjugate eye conjugate eye conjugate eye eye 2 movement movement movement movement movement movement movement movement movement movement Abnormality Deviation of the Abnormal Abnormality of eye Abnormality of Cerebellar of eye Abnormality of eye hand or of fingers Aplasia/Hypoplasia location of 3 movement eye movement malformation movement movement Oral cleft of the hand Narrow palate affecting the eye ears Radial deviation of Abnormal eye Intellectual Abnormal Aplasia/Hypoplasia the hand or of Abnormality of 4 physiology disability Cleft palate facial shape affecting the eye Cleft palate fingers of the hand the testis Short stature Low-set ears Abnormal Abnormality Aplasia/hypoplasia Abnormality of Abnormality Aplasia/Hypoplasia Abnormal eye Abnormality of pattern of Abnormality of the of the hard involving the male internal of eye 5 affecting the eye physiology the hard palate respiration eyelid palate skeleton genitalia Iris coloboma movement Abnormality Abnormality of the Abnormality of of midbrain Abnormality of the Abnormality of Abnormality 6 uvea the lip Oral cleft morphology palpebral fissures Iris coloboma Hypertelorism the palate Vertebral fusion of upper lip Abnormal Abnormal eye Abnormality of Abnormality of Molar tooth Abnormality of the 
Abnormality of the internal Abnormality of 7 morphology the mouth eye movement sign on MRI outer ear Coloboma eyelid genitalia body height Coloboma Abnormality Abnormality Abnormality Abnormality of the Abnormality of of the Abnormality of the of eye Abnormality of the of the outer 8 globe the outer ear Short digit midface uvea movement Microtia Cryptorchidism penis ear obsolete Displacement of Abnormality Abnormality of the Abnormality Aplasia/Hypoplasia Abnormality of the external Cleft upper 9 Nystagmus Low-set ears Ptosis of the mouth ocular region of the mouth of the ear the outer ear urethral meatus lip obsolete Abnormality of the ocular region Abnormal Abnormal Abnormality of (Abnormality involuntary eye location of Renal Narrow Abnormality of the Cerebellar Aplasia/Hypoplasia the male of the orbital 10 movements ears hypoplasia/aplasia forehead periorbital region malformation of the external ear genitalia Hypospadias region)


Table 2.4 Strabismus Candidate Gene Collection. The 18 strabismus candidate genes that are present in at least 2 out of 3 sets of gene predictions (reported in Supplementary Tables A.7a and A.7b).

Gene Name | Ensembl gene stable ID | Gene description | Associated Diseases | Notes | Association with strabismus upon gene disruption (PubMed ID)
AP1G1 | ENSG00000166747 | adaptor related protein complex 1 subunit gamma 1 | Hypoactive Sexual Desire Disorder | | N/A
ARAF | ENSG00000078061 | A-Raf proto-oncogene, serine/threonine kinase | | Raf (ARAF, BRAF, CRAF), the first MAPK pathway kinase; RASopathies | N/A
DAB1 | ENSG00000173406 | DAB1, reelin adaptor protein | Spinocerebellar Ataxia 37 and Lissencephaly With Cerebellar Hypoplasia | | N/A
DCT | ENSG00000080166 | dopachrome tautomerase | Microphthalmia and Hermansky-Pudlak Syndrome 3 | | N/A
EMC3 | ENSG00000125037 | ER membrane protein complex subunit 3 | Primary Cutaneous Amyloidosis | | N/A
GLI3 | ENSG00000106571 | GLI family zinc finger 3 | Greig Cephalopolysyndactyly Syndrome and Polydactyly, Postaxial, Types A1 And B | | 12414818
GMPPA | ENSG00000144591 | GDP-mannose pyrophosphorylase A | Alacrima, Achalasia, And Mental Retardation Syndrome and Achalasia. | | 24035193
GTF2E1 | ENSG00000153767 | general transcription factor IIE subunit 1 | Trichothiodystrophy 1, Photosensitive | | N/A
HDAC2 | ENSG00000196591 | histone deacetylase 2 | Cerebellar Ataxia, Deafness, And Narcolepsy, Autosomal Dominant and Pulmonary Disease, Chronic Obstructive | | N/A
KSR1 | ENSG00000141068 | kinase suppressor of ras 1 | | RAS/MAPK signalling pathway | N/A
MAPK1 | ENSG00000100030 | mitogen-activated protein kinase 1 | Chromosome 22q11.2 Deletion Syndrome, Distal and Pertussis | MAPK | N/A
MAPK3 | ENSG00000102882 | mitogen-activated protein kinase 3 | Small Intestine Neuroendocrine Neoplasm and Pilocytic Astrocytoma Of Cerebellum | MAPK | N/A
NFIX | ENSG00000008441 | nuclear factor I X | Sotos Syndrome 2 and Marshall-Smith Syndrome. | | 22982744
RGL2 | ENSG00000237441 | ral guanine nucleotide dissociation stimulator like 2 | Tumoral Calcinosis, Familial, Normophosphatemic and Familial Tumoral Calcinosis | RAS signalling pathway | N/A
STAG1 | ENSG00000118007 | stromal antigen 1 | Mental Retardation, Autosomal Dominant 47 and Ovarian Mucinous Cystadenocarcinoma | | N/A
STAG2 | ENSG00000101972 | stromal antigen 2 | Laryngotracheitis and Tracheitis | | N/A
TFAP2D | ENSG00000008197 | transcription factor AP-2 delta | | | N/A
YWHAB | ENSG00000166913 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein beta | Gerstmann-Straussler Disease | | N/A


Chapter 3: Strabismus in children with intellectual disability: part of a broader motor control phenotype?

3.1 Synopsis

Purpose: Intellectual disability (ID) frequently occurs in association with other clinical features such as seizures, hypotonia, or malformations. We suspected that strabismus might also be unusually frequent in this population and that it might be associated with sub-phenotypes affecting motor control.

Methods: We reviewed phenotypic descriptors, extracted from medical records, for a heterogeneous series of 222 children with ID who had been enrolled in a study of clinical application of exome sequencing. We estimated the frequency of strabismus and other common clinical features, and explored statistical associations between them. Data from Population Data British Columbia and Online Mendelian Inheritance in Man (OMIM) were also examined for confirmation of our observations.

Results: Strabismus had a higher prevalence among children with ID than in the general population (odds ratio=5.46). Moreover, children with both ID and strabismus were more likely to have problems affecting motor control than those with ID and no strabismus (odds ratio=2.84). Hypotonia was one of the most common motor control sub-phenotypes affecting the children with ID, and a frequent co-occurrence between strabismus and hypotonia was also observed (odds ratio=2.51). There was no evidence for associations between strabismus and other frequent clinical features among these children with ID.


Conclusion: Strabismus is a frequent feature in children with ID. The frequent co-occurrence of strabismus and motor control phenotypes, in particular hypotonia, suggests that a common central nervous system mechanism or pathway may underlie these phenotypes.

3.2 Introduction

Intellectual disability (ID) results from a very large and heterogeneous group of disorders or exposures, and is diagnosed on the basis of limitations affecting both cognitive and adaptive domains 112. The U.S. Centers for Disease Control and Prevention currently estimate that 1.14% of children aged 3-17 years have been diagnosed with ID 113.

Strabismus is a broad phenotype of ocular misalignment affecting 2% - 4% of children 9. Multiple mechanisms can lead to strabismus, including abnormalities of the brain, the cranial nerves, or the extraocular muscles, although in most children the underlying pathology remains obscure 71,114. Motor problems often occur in children with ID, and we suspected that strabismus, which may reflect a functional abnormality of ocular movement, might also be more frequent than expected in this population. Here we undertook an analysis of phenotypes in a heterogeneous series of children with ID to (i) document the prevalence of strabismus and (ii) test the hypothesis that strabismus may be part of a broader motor control phenotype, as evidenced by correlation with motor sub-phenotypes such as hypotonia.

3.3 Methods

3.3.1 Clinical characterization of CAUSES subjects

Clinical phenotypic information was extracted from electronic and hard copy health records (dating up to the end of 2017) for probands enrolled in a study of exome sequencing: the CAUSES (Clinical Assessment of the Utility of Sequencing and Evaluation as a Service) project at BC Children’s & Women’s Hospital in Vancouver, British Columbia, Canada. Not all probands had ID; only cases labelled with intellectual disability or cognitive disability were included in our analysis. Within this CAUSES-ID series, those with at least one of the following were considered to have a ‘motor control’ phenotype: hypotonia, hypertonia, dystonia, cerebral palsy, spasm, tremor, any movement defect, or any coordination defect.

Clinical features were obtained from close inspection of phenotypic descriptors compiled by CAUSES study investigators after review of free-text medical records. In order to identify the most common phenotypes, appropriate for inclusion in our study, we first used a text mining and visualization approach with R (v3.4) packages tm (text mining), wordcloud, and RColorBrewer, followed by a manual curation step. We manually inspected the results, focussing on the most frequently presented words (≥25 subjects). During the manual inspection process, synonymous clinical features were identified, and the ones associated with the most common words were combined to obtain more accurate frequencies of the phenotypes. Contingency analysis with two-sided Fisher’s exact tests (R package “exact2x2”) was then applied to detect associations between strabismus and other phenotypes.
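The counting step described above can be illustrated with a minimal Python sketch. It is not the R workflow (tm, wordcloud) used in this study; the synonym map below is a small hypothetical subset drawn from the descriptors listed in Table 3.1, and the subject records are invented toy data. The key idea is that synonymous descriptors are merged before counting, and each feature is counted at most once per subject.

```python
from collections import Counter

# Hypothetical synonym map (illustrative subset): free-text variant -> canonical feature.
SYNONYMS = {
    "low muscle tone": "hypotonia",
    "low tone": "hypotonia",
    "epilepsy": "seizure",
    "esotropia": "strabismus",
    "exotropia": "strabismus",
    "small head": "microcephaly",
}


def canonical_features(descriptors):
    """Return the set of canonical features named in one subject's descriptors."""
    features = set()
    for text in descriptors:
        term = text.strip().lower()
        features.add(SYNONYMS.get(term, term))
    return features


def feature_counts(subjects):
    """Count how many subjects show each canonical feature (once per subject)."""
    counts = Counter()
    for descriptors in subjects.values():
        counts.update(canonical_features(descriptors))
    return counts


def frequent_features(subjects, min_subjects):
    """Keep only features present in at least `min_subjects` subjects."""
    return {f: n for f, n in feature_counts(subjects).items() if n >= min_subjects}


# Invented toy records; real input would be the curated phenotypic descriptors.
toy_subjects = {
    "S1": ["Low tone", "esotropia", "exotropia"],
    "S2": ["hypotonia", "epilepsy"],
    "S3": ["strabismus", "small head"],
}
toy_frequent = frequent_features(toy_subjects, min_subjects=2)
```

With the study data, the threshold would be `min_subjects=25`, matching the cut-off stated above; manual curation of the synonym map remains necessary.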

3.3.2 Analyses based on Population Data BC and OMIM

Additional data were obtained from two independent sources to determine whether our observations replicated in different datasets. BC Data Scout™ is a service of Population Data British Columbia that provides aggregate numbers of people with specific International Classification of Diseases (ICD) diagnoses recorded in billings submitted to the Medical Service Plan (MSP) in British Columbia (BC), Canada 115.

An ICD code for ‘specific motor retardation’ was used to select subjects with motor control phenotypes, despite the possibility that motor delays related to peripheral neuromuscular disorders could be included. Hypotonia was not significantly represented in billing records and could not be reliably analyzed. To further assess the relationship between strabismus and hypotonia, the Online Mendelian Inheritance in Man (OMIM) database was used to compile entities featuring ID, strabismus, and hypotonia. The search was limited to the OMIM Clinical Synopsis, which provides systematic phenotypic descriptions of monogenic disorders and some chromosomal disorders such as Down syndrome. Detailed search strategies for both sources are outlined in the Supplementary Methods (Appendix B).

3.4 Results

3.4.1 Strabismus frequently affects children with ID

To study the prevalence of strabismus and its association with other phenotypes, we reviewed clinical information for 479 subjects enrolled in the CAUSES study, 222 (46.3%) of whom were affected by ID. An additional 106 subjects had global developmental delay, which frequently precedes a diagnosis of ID once children are old enough for intellectual assessment; these subjects were not included in the current analysis. Analysis of phenotypic records in the CAUSES-ID series revealed 6 phenotypes occurring in at least 25 subjects (Table 3.1, Figure B.1). Among these, strabismus occurred in 14.4% of subjects, and hypotonia occurred in 17.6%.

To compare the prevalence of strabismus in ID to that of the rest of the population, we generated a report through the BC Data Scout™ service 116. We identified a total of 754,125 children aged 4-17, 3,540 of whom (0.47%) had been reported as having ID via billing codes. Strabismus occurred more often in the ID group (30.4%) compared to the rest of the population (7.4%; odds ratio (OR)=5.46, 95% CI [5.08, 5.86]; Figure 3.1B).

3.4.2 Motor control phenotypes are associated with strabismus in ID

To explore an association between motor control phenotypes and strabismus, we further sub-classified the CAUSES-ID series into an ID-motor group (77 subjects) and ID-non-motor group (145 subjects). Strabismus was reported more often in the ID-motor group (18/77=23.4%) compared to the ID-non-motor group (14/145=9.7%; OR=2.84, 95% CI [1.24, 6.63]; Figure 3.1A). As a comparison, we also tested for associations between strabismus and other frequent phenotypes in ID. The occurrence of strabismus was not significantly higher in groups with dysmorphisms, autism, seizures, or microcephaly compared to the corresponding non-affected groups (Figure 3.1A, Table 3.2).

Although hypotonia was included within the ID-motor group analysis, it is, by itself, among the most frequent phenotypes in ID. Therefore, we also examined strabismus prevalence in the ID-hypotonia group. Strabismus was more common in this group than in the ID group without hypotonia (OR=2.51, 95% CI [1.07, 6.04]; Figure 3.1A).
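To make the contingency analysis concrete, the following is a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 table. It is an illustrative stand-in and not the R package exact2x2 used in this study: it defines the two-sided p-value as the summed probability of all tables no more likely than the observed one, which may differ from that package's settings. The counts are the ID-motor and ID-non-motor figures quoted above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


# Strabismus present / absent in the ID-motor group (18 of 77)
# and in the ID-non-motor group (14 of 145), as reported in the text.
p_value = fisher_exact_two_sided(18, 77 - 18, 14, 145 - 14)
```

The function uses exact integer binomial coefficients, so it stays accurate for cohort sizes such as those analysed here.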

To replicate the association between strabismus and motor control within ID, we analyzed 3,540 ID subjects from Population Data BC. Strabismus occurred more often in the ID-motor group (50/110, 45.5%) compared to the ID-non-motor group (1025/3430, 29.9%; OR=1.95, 95% CI [1.33, 2.87]) (Figure 3.1B). Since the Population Data BC records do not report on hypotonia, we could not use this dataset to replicate the strabismus-hypotonia association observed in the CAUSES-ID series. Instead, we examined this association using phenotype descriptions from OMIM records. A total of 1,481 entities featuring ID and 433 entities featuring strabismus were identified. Strabismus occurred in 174 records together with hypotonia and showed a significant co-occurrence compared to records without hypotonia (OR=3.40, 95% CI [2.60, 4.46]; Figure 3.1C).
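As a worked illustration of how an odds ratio and its confidence interval follow from group counts, the sketch below computes the sample odds ratio with a Wald (Woolf) interval on the log scale. This is an assumption made for illustration: the study's own estimates came from the R package exact2x2, whose conditional estimates and exact intervals can differ, particularly for small samples. Applied to the Population Data BC counts quoted above, the result should land close to the reported OR=1.95, 95% CI [1.33, 2.87].

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Sample odds ratio and Wald (Woolf) confidence interval.

    The table is [[a, b], [c, d]] with rows = groups and
    columns = (feature present, feature absent). All cells must be > 0.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Population Data BC: strabismus in 50 of 110 ID-motor subjects
# and in 1025 of 3430 ID-non-motor subjects.
bc_or, bc_lower, bc_upper = odds_ratio_wald_ci(50, 110 - 50, 1025, 3430 - 1025)
```

For the smaller CAUSES-ID series, an exact method is the more appropriate choice, which is consistent with the use of Fisher's exact test in the Methods.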

3.5 Discussion

While independent datasets and literature reviews support our finding of an independent association between hypotonia and strabismus in children with ID, we report and interpret our observations with two challenges in mind. First, free text was used to record subject phenotypes upon assessment in the CAUSES study, and the use of uncontrolled vocabularies poses a challenge for deep and precise analysis of all the phenotypes since different terminology can be used to describe the same feature 117. Future improvement will come from the systematic use of controlled vocabularies, such as the Human Phenotype Ontology (HPO), during data collection 117. Second, a more significant limitation of the current study is that the original data were not recorded for the purpose of studying the associations explored here. Misclassification and over- or under-ascertainment of ID, strabismus, hypotonia, or motor control phenotypes are potential confounding factors.

In a study specifically designed to describe ocular findings and refractive errors in a cohort with ID, Akinci and coworkers prospectively performed ophthalmological examinations and reported a similar strabismus prevalence of 14.0% among subjects with idiopathic or syndromic intellectual disability, compared to 1.3% in the control group 118. We used population-based health data (British Columbia (BC) billing information), which offer a larger sample size, to show that strabismus occurs more frequently in children with ID than in controls. Both rates are, however, approximately twice as high as in previous reports: 30% vs. 14% and 7% vs. 4%, respectively 9,118. We also noticed that the reported ID prevalence in BC is approximately half of the expected frequency of 1.14% 113. These observed discrepancies are likely non-random: the inclusion of referrals to ophthalmologists for ruling out strabismus likely increases the strabismus rates, and the under-representation of mild ID cases in the billing data may explain the reduced ID prevalence. Replicating this association in a dataset that ascertains and classifies ID, strabismus, and various motor problems more rigorously and consistently would be an important next step.

Our replicated observation of an association between strabismus and hypotonia in intellectual disability across different data sources could suggest a common underlying mechanism or pathway contributing to both strabismus and hypotonia. The neuro-anatomical or physiological underpinnings of both entities remain poorly understood 71,119. The absence of specific peripheral neuromuscular signs and the presence of cognitive disability in almost all the probands in the CAUSES-ID series suggest that hypotonia is centrally mediated 120. Diverse and numerous genes and central nervous system structures are therefore implicated in hypotonia and strabismus 71,72,121. We propose that the cerebellum in particular is implicated in both phenotypes because it is known to be critical in regulating motor output. How the cerebellum controls movement is still poorly understood, but it is known to contribute to control of both the oculomotor systems and tone 122,123.


[Figure 3.1: plot of odds ratios with 95% confidence intervals. Panel A: Motor, Hypotonia, Dysmorphism, Seizure, Autism, Microcephaly. Panel B: BC ID, BC Motor. Panel C: OMIM Hypotonia. x-axis: Odds Ratio (0 to 8).]

Figure 3.1 Odds ratios and 95% confidence intervals (x-axis) for the association of different clinical features (y-axis) with strabismus vs. non-strabismus ID groups. Odds ratio represents the odds that a clinical feature co-occurs with strabismus, compared to the odds that the clinical feature occurs in the absence of strabismus. (A) Data from the CAUSES series with ID. (B) Data from the British Columbia population. BC ID represents odds ratio between odds of having strabismus in the presence of ID & odds of having strabismus in the absence of ID in British Columbia (BC), as determined by ICD-codes used in billing. (C) Data from OMIM records.


Table 3.1 Prevalence of clinical features in CAUSES ID subjects.

Clinical Feature | Number (%) (total 280) | Descriptors included
Dysmorphism | 102 (45.9%) | “dysmorphic,” “unusual morphological features,” “morphological difference”, “hypertelorism”, “down slanting palpebral fissures”, abnormally formed external ears
Autism | 63 (28.4%) | “autism spectrum disorder,” “Asperger”
Seizure | 47 (21.2%) | “epilepsy”
Hypotonia | 39 (17.6%) | “low muscle tone” and “low tone”
Strabismus | 32 (14.4%) | “esotropia”, “exotropia”, “hypertropia”, “hypotropia”
Microcephaly | 30 (13.5%) | “small head”

Table 3.2 Co-occurrence of strabismus with other clinical features in CAUSES-ID series.

Clinical Feature | Cases with Strabismus | Odds Ratio | 95% Confidence Interval (Lower) | 95% Confidence Interval (Higher)
Dysmorphism | 16 | 1.21 | 0.57 | 2.57
Autism | 7 | 0.67 | 0.26 | 1.62
Seizure | 7 | 1.05 | 0.39 | 2.63
Hypotonia | 10 | 2.51 | 1.07 | 6.04
Microcephaly | 3 | 0.63 | 0.15 | 2.23
Motor control phenotypes | 18 | 2.84 | 1.30 | 6.47


Chapter 4: Linkage analysis identifies an isolated strabismus locus at 14q12 overlapping the FOXG1 syndrome region

4.1 Synopsis

Purpose: Strabismus has long been studied with genetic linkage analysis, and multiple loci have been identified. However, the identified loci have relatively low LOD scores and no likely causal gene has been identified from these studies. This study aims to identify a likely causal variant in a large pedigree displaying a Mendelian dominant pattern of transmission.

Methods: This genetic study is based on an extensive pedigree, describing a seven-generation family with isolated strabismus inherited in an autosomal dominant manner. A total of 13 offspring of a common ancestor have been used for linkage analysis, with the results refined by haplotype analysis. Whole exome sequencing and whole genome sequencing were applied to selected individuals followed by a series of advanced bioinformatic analyses to identify and prioritize all observed variants in the linkage locus.

Results: A single peak was identified at chromosome 14q12 with a LOD score of 4.77 through linkage analysis. With the incorporation of next generation sequencing and in-depth bioinformatic analyses, we identified a 4 bp deletion within a reported cis-regulatory region as the variant most likely to cause the phenotype.

Conclusion: Our analyses suggest a role of this deletion in FOXG1 auto-regulation, as it overlaps an experimentally supported FOXG1 binding site. FOXG1 syndrome has a high prevalence of strabismus, and future study of this specific deletion may shed light on the spatiotemporal regulation of FOXG1 expression and enhance our understanding of the mechanisms contributing to strabismus.

4.2 Introduction

Diagnosis and treatment of strabismus are well-established, but the pathophysiology of most isolated strabismus remains largely unknown. Disturbances anywhere along the visual input and oculomotor output pathways can be postulated to lead to eye deviation 21. During the last century, twin studies and family studies have demonstrated a substantial genetic contribution to strabismus, and both autosomal dominant and autosomal recessive transmission patterns have been reported 71. Strabismus occurs commonly in syndromes, such as congenital Rett syndrome (FOXG1 syndrome), in which 84% of individuals display a strabismus phenotype 124. On the other hand, families displaying isolated strabismus transmitting in simple Mendelian patterns are uncommon. Few cases have shown a clear genetic locus in isolated strabismus in affected families 37,71. Eleven genes (PHOX2A, ROBO3, KIF21A, SALL1, TUBB3, HOXB1, SALL4, CHN1, HOXA1, TUBB2B, MAFB), of which five encode transcription factors (underlined), have been identified for a subgroup of strabismus, known as congenital cranial dysinnervation disorders 46, but the genetic etiology of other strabismus subtypes remains elusive. Identification of a locus with high confidence and detailed examination of the landscape of the locus at single-nucleotide resolution can provide new insights into the molecular mechanisms of strabismus.

We report a study of a large, seven-generation, non-consanguineous pedigree with 21 individuals affected by isolated strabismus with an autosomal dominant inheritance pattern. Through linkage analysis, we mapped this familial strabismus to chromosome 14q12, which overlaps with the FOXG1 syndrome locus. Next generation sequencing and in-depth analysis identified a likely causal deletion within this locus. Integrated data indicate that the deletion lies in a transcription factor-bound region and suggest a regulatory role in FOXG1 expression.

4.3 Materials and methods

4.3.1 Patient ascertainment

Working from a substantial family-provided genealogy, we traced back seven generations of an index case based on family records (including photos displaying eye alignment) and built the pedigree accordingly. During the project period, we expanded the pedigree to include multiple branches and invited individuals from across the pedigree to participate in the study. The study was approved by the University of British Columbia Children’s & Women’s Research Ethics Board (approval number CW10-0317/H10-03215), and written consent was obtained from all participating family members. Thirteen individuals were descendants of a common ancestor. Among the thirteen descendants, nine participants reported early onset isolated strabismus, and the other four reported no strabismus based on past medical records.

Eight affected descendants and four self-reported unaffected descendants were examined by one or more of three ophthalmologists (Drs. J. Horton, V. Pegado, and C. Lyons). One of the clinicians assessed 7 affected individuals (all except 006 and 014, the latter of whom joined the study subsequently) on a single day in a common setting. All participants were asked about the age of onset (if applicable), ocular history, and medical history. Examination included visual acuity, pupils, eye movements, ocular alignment, stereopsis, slit lamp examination, fundus examination, and intraocular pressure. Individuals 009, 011, and 013 (seen by Dr. C. Lyons) did not have a history of extraocular muscle surgery and therefore underwent full orthoptic exams. In addition, 013 underwent an MRI study for cranial nerve IV abnormality due to the presentation of superior oblique palsy.

4.3.2 DNA isolation

Genomic DNA of participants was isolated from either saliva or blood. At least 4 ml of blood or 6 ml of saliva was collected for one round of next generation sequencing, and at least 2 ml of saliva was collected from participants for genotyping.

Blood samples were collected in a clinical setting while saliva samples were collected using Oragene-DNA (OG-500) saliva kits. DNA was extracted from blood samples using the Qiagen QIAsymphony SP instrument and the QIAsymphony DNA Midi Kit and from saliva samples with the DNA Genotek prepIT-L2P sample preparation kit following protocol # PD-PR-015. Approximately 7-10 µg of DNA per sample at a concentration no less than 70 ng/µl was sent for NGS. For genotyping, 500 ng of DNA per sample at a concentration of at least 50 ng/µl was sent.

4.3.3 Genotyping: statistical linkage analysis and haplotype analysis

Genotyping was performed on the HumanOmni2.5 array, using the Infinium LCG assay. Multiple analyses were performed for quality control purposes. Simulations were performed to determine the maximum possible LOD (logarithm of the odds) score for different model parameters under the alternative hypothesis (linkage). SLINK 3.02 was used to simulate pedigrees under dominant and recessive models with a range of disease allele frequencies and penetrances 125. For a particular model, the maximum LOD score from the analysis of 1000 simulated pedigrees was declared the maximum LOD score.
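For intuition about what a LOD score measures, the following Python sketch evaluates the textbook two-point LOD for phase-known, fully informative meioses. This is a deliberately simplified assumption: it is not the SLINK simulation or the multipoint analysis used in this study, which operate on the full pedigree with a disease model. The function name and arguments are illustrative.

```python
from math import log10


def two_point_lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants is fine.
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    likelihood_null = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # A recombinant is impossible at theta = 0, so linkage is excluded.
        return float("-inf")
    return log10(likelihood_linked / likelihood_null)
```

Under these simplifying assumptions, each fully informative non-recombinant meiosis contributes log10(2), about 0.3, at theta = 0, which is why the attainable maximum LOD score depends on the number of informative meioses in the pedigree.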

Multiple filters were applied to select a set of markers suitable for linkage analysis. Only markers on the autosomes and the X chromosome with alleles unambiguous for strand information were kept, and a minor allele frequency >0.45 and pairwise r² < 0.1 were required. Merlin 1.1.2 was used to perform multipoint linkage analysis under the same model as in the SLINK simulation 126.
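The marker filters above can be sketched in Python as follows. Several points are assumptions made for illustration rather than details of the actual pipeline: "strand-unambiguous" is taken to mean excluding A/T and C/G markers, the marker records are hypothetical dictionaries, the pairwise r² values are supplied by a caller-provided function, and the pruning is a simple greedy pass in input order.

```python
# Allele pairs that read the same on both strands (assumed definition of "ambiguous").
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}
ALLOWED_CHROMS = {str(i) for i in range(1, 23)} | {"X"}


def is_strand_unambiguous(allele1, allele2):
    """True unless the two alleles are complementary (A/T or C/G)."""
    return frozenset((allele1.upper(), allele2.upper())) not in AMBIGUOUS_PAIRS


def passes_basic_filters(marker, min_maf=0.45):
    """Chromosome, strand, and minor-allele-frequency checks for one marker."""
    return (
        marker["chrom"] in ALLOWED_CHROMS
        and is_strand_unambiguous(*marker["alleles"])
        and marker["maf"] > min_maf
    )


def select_markers(markers, r2, min_maf=0.45, max_r2=0.1):
    """Greedy selection: keep a marker only if r2 with every kept marker is below max_r2.

    `r2(m1, m2)` is a caller-supplied (hypothetical) function returning the
    pairwise r-squared between two markers.
    """
    kept = []
    for marker in markers:
        if not passes_basic_filters(marker, min_maf):
            continue
        if all(r2(marker, other) < max_r2 for other in kept):
            kept.append(marker)
    return kept


# Invented toy markers and LD values for demonstration only.
toy_markers = [
    {"id": "m1", "chrom": "14", "alleles": ("A", "G"), "maf": 0.47},
    {"id": "m2", "chrom": "14", "alleles": ("A", "T"), "maf": 0.49},
    {"id": "m3", "chrom": "X", "alleles": ("C", "T"), "maf": 0.46},
    {"id": "m4", "chrom": "Y", "alleles": ("C", "T"), "maf": 0.48},
    {"id": "m5", "chrom": "2", "alleles": ("G", "T"), "maf": 0.30},
    {"id": "m6", "chrom": "14", "alleles": ("C", "T"), "maf": 0.48},
]
toy_ld = {frozenset(("m1", "m6")): 0.8}


def toy_r2(m1, m2):
    return toy_ld.get(frozenset((m1["id"], m2["id"])), 0.0)


toy_selected = [m["id"] for m in select_markers(toy_markers, toy_r2)]
```

Requiring a high minor allele frequency favours markers that are informative in a small number of meioses, and the low r² threshold limits linkage disequilibrium between the retained markers.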

As individual 014 was recruited at a later time point, we extracted single nucleotide polymorphisms (SNPs) from whole genome sequencing (WGS) data corresponding to SNPs used for linkage analysis for the family and performed genome-wide parametric and non-parametric linkage analyses using Merlin 1.1.2. To refine the boundaries of the linked region, we compared the SNPs near the edge manually between descendants and identified the minimum shared region.

4.3.4 Whole-exome sequencing

We performed whole exome sequencing (WES) on 001 and 011, third-degree cousins. WES was performed with the Agilent SureSelect Human All Exon 38Mb kit on an Illumina HiSeq 2000 platform (Perkin Elmer). The genomic aligners Bowtie (version 0.12.9) and BWA (version 0.6.1) were used to map the reads to the hg19 reference genome 127,128. The Genome Analysis Toolkit (GATK) (version 1.0) was used for local re-alignment, which corrects misalignments at the ends of reads 129. SAMtools (version 0.1.18) was applied to call variants from aligned WES reads 130. In-house scripts were used to filter variants according to the following criteria: consistent with an autosomal dominant model, a frequency not higher than 1% in dbSNP build 135, non-synonymous coding, and predicted by SIFT to be ‘damaging’ or indeterminate.

4.3.5 Whole-genome sequencing

We performed WGS on 001, 013, and 014, who were three distantly related cousins. WGS was performed on an Illumina HiSeq 2000 platform (BGI America). An informatics pipeline (similar to the WES pipeline but with newer versions of software) was applied to this batch of WGS data: Bowtie (version 1.0.0) and BWA (version 0.7.5a) for mapping the reads to the hg19 reference genome 127,128, GATK (version 2.8) for local re-alignment 129, and SAMtools (version 0.1.19) for variant calling 130.

Variants located within the linkage region were selected for further analysis. Allele frequency was assessed using dbSNP build 137 and Exome Variant Server (EVS), and variants with a frequency higher than 1% were excluded. Heterozygous variants shared across the three samples were selected, and SnpEff (with hg19 database) was applied to annotate those variants.
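The selection logic in this paragraph can be sketched in Python as below. The data structures are hypothetical (the in-house scripts are not reproduced here): each sample maps a variant key `(chrom, pos, ref, alt)` to a genotype string, and `allele_freq` is a lookup of population frequencies in which absent variants are treated as frequency 0, which is an assumption. The region bounds are the linkage interval reported in the Results.

```python
# Linkage interval on chromosome 14 (hg19 coordinates reported in the Results).
LINKAGE_REGION = ("chr14", 22_779_843, 32_908_192)
HET_GENOTYPES = {"0/1", "1/0", "0|1", "1|0"}


def shared_rare_het_variants(calls_by_sample, allele_freq,
                             region=LINKAGE_REGION, max_freq=0.01):
    """Variants heterozygous in every sample, inside the region, and not common.

    calls_by_sample: {sample_id: {(chrom, pos, ref, alt): genotype}}
    allele_freq:     {(chrom, pos, ref, alt): population frequency}
    """
    chrom, start, end = region
    het_sets = [
        {key for key, genotype in calls.items() if genotype in HET_GENOTYPES}
        for calls in calls_by_sample.values()
    ]
    shared = set.intersection(*het_sets) if het_sets else set()
    return sorted(
        key for key in shared
        if key[0] == chrom
        and start <= key[1] <= end
        # Variants with a frequency higher than max_freq are excluded.
        and allele_freq.get(key, 0.0) <= max_freq
    )


# Invented toy variants; positions and alleles are for demonstration only.
v_rare = ("chr14", 29_000_000, "ACTGA", "A")
v_common = ("chr14", 30_000_000, "G", "A")
v_outside = ("chr14", 40_000_000, "C", "T")
v_not_shared = ("chr14", 25_000_000, "T", "C")
toy_calls = {
    "001": {v_rare: "0/1", v_common: "0/1", v_outside: "0/1", v_not_shared: "0/1"},
    "013": {v_rare: "0/1", v_common: "0/1", v_outside: "0/1", v_not_shared: "1/1"},
    "014": {v_rare: "0/1", v_common: "0/1", v_outside: "0/1"},
}
toy_candidates = shared_rare_het_variants(toy_calls, {v_common: 0.2})
```

Requiring heterozygosity in all sequenced affected individuals reflects the autosomal dominant model, under which a rare causal allele is expected to be carried in a single copy.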

4.3.6 Non-coding variant annotation and interpretation

SnpEff did not provide sufficient annotation for non-coding variants. To enable analysis of such variants, we annotated them using multiple databases and corresponding bioinformatic tools, including the Functional Annotation of the Mammalian Genome 5 (FANTOM5) database, JASPAR, Segway, RegulomeDB and Combined Annotation Dependent Depletion (CADD) 131–134. Variants that were predicted to be due to a sequencing error or alignment error were assigned to a lower priority category. Candidates were visualized in the UCSC Genome Browser. Based on the qualitative assessment, the top prioritized variant was confirmed to be present by standard PCR in WGS subjects. Selected variants were further compared against the Genome Aggregation Database (gnomAD) 135.


4.4 Results

4.4.1 Pedigree and participant profile

We compiled a seven-generation pedigree with over 170 individuals, including deceased individuals. Three major branches from the same common ancestor were traced (Figure 4.1a). A roughly even distribution of strabismus cases was observed between females (12 individuals) and males (nine individuals). By qualitative observation, an autosomal dominant model with high penetrance best matched the inheritance pattern. Most participants belonged to branch 1 (Figure 4.1b). In branch 1, strabismus was reported across four consecutive generations according to strong family anecdotes and/or medical records.

Each of the nine affected descendants was seen by one of three ophthalmologists specialized in strabismus. The specific characteristics of strabismus were not uniform across the descendants in the family (Table 4.1). The original direction and deviation angle can be difficult to ascertain retrospectively after multiple surgeries and/or development of other ocular conditions. The affected individuals could be grouped into two broad directional categories: esotropia and hypertropia. Both esotropia and hypertropia were noted in 014, but this individual had undergone multiple corrective surgeries.

Among the three individuals with no history of extraocular muscle surgery, one had esotropia (011) while the other two displayed hypertropia (009, 013). Individual 013 presented characteristics consistent with fourth cranial nerve palsy, but a subsequent MRI showed that both fourth cranial nerves were present and symmetric (data not shown).


4.4.2 Linkage analysis and haplotype analysis

Samples from 12 individuals (8 affected; 4 unaffected) were genotyped using a high-density genotyping panel. A set of 17,779 SNPs was obtained after the SNP filtering step for linkage analysis. Simulations under the alternative hypothesis (linkage) generated a maximum simulated LOD score of 3.56, under an autosomal dominant model with minor allele frequency q = 0.005 and 99% penetrance. The LOD score curves did not change significantly with disease allele frequency, and the dominant models had consistently higher LOD scores than recessive models.
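To make the LOD scale concrete, the following minimal sketch computes a two-point LOD score for the simplest textbook case of phase-known, fully informative meioses. This is a simplification and an assumption for illustration only: the analysis reported here was a multipoint parametric analysis under a penetrance model, and the function name and arguments below are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood of the observed recombinants at recombination
    fraction theta with the likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    if theta == 0.0 and recombinants > 0:
        # Any recombinant is impossible under complete linkage.
        return float("-inf")
    non_recombinants = meioses - recombinants
    log_l_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_l_linked += recombinants * math.log10(theta)
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Ten non-recombinant meioses at theta = 0 give 10 * log10(2), just above 3.
lod_ten_meioses = two_point_lod(recombinants=0, meioses=10, theta=0.0)
```

Under this simplified model each additional non-recombinant meiosis adds log10(2), roughly 0.3, to the score, and a LOD score of 3 corresponds to odds of 1000:1 in favour of linkage.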

Based on the genotyping data from the subject family, the largest observed LOD score was 3.55, on chromosome 14, which was close to the theoretical maximum (3.56) obtained from simulations. This was the only region with a LOD score higher than 3, and thus the only region for which the null hypothesis of independent assortment was rejected. The linked region on chromosome 14 spanned approximately 10 Mb on the physical map and was bounded by the markers rs7146411 and rs1951187, corresponding to chr14: 22,779,843 - 32,908,192. This region identified by linkage analysis is a novel locus for isolated strabismus. Displaying the candidate region in the UCSC genome browser revealed, at the centre of the region, a roughly 5 Mb region containing only three protein coding genes (i.e. a gene desert) (Figure 4.3b).

Subsequent to the genotyping analysis, 014 was recruited to the study for WGS, representing a distant branch in the pedigree. An expanded linkage analysis with corresponding SNPs from 014’s WGS data further supported the linkage within chromosome 14. We observed a LOD score of 4.77 for the same linkage region, which remained the sole candidate region (Figure 4.2a). In addition, we performed non-parametric analyses and obtained the same linkage region (Figure 4.2b).

An approximately 8.5 Mb region (chr14:22,779,843 - 31,289,720) was shared among the nine affected descendants. An unaffected descendant (subsequently deceased), who had not been assessed by the ophthalmologists, shared a 5.5 Mb region within the linked region (Figure 4.3a and Figure 4.3b). Thus, an approximately 3 Mb region was shared exclusively by the nine affected descendants, corresponding to chr14: 28,467,136 - 31,289,720. This region lies within the aforementioned gene desert (Figure 4.3b).
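The narrowing of the region can be retraced with simple interval arithmetic on the reported hg19 coordinates. The sketch below is illustrative only; the helper names are hypothetical, and the segment assigned to the unaffected descendant is an assumption inferred from the reported boundaries (start of the shared region to start of the core region), not a coordinate pair stated in the text.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in hg19 base-pair coordinates


def length_mb(interval: Interval) -> float:
    """Length of an interval in megabases."""
    start, end = interval
    return (end - start) / 1_000_000


def subtract(a: Interval, b: Interval) -> List[Interval]:
    """Return the parts of interval a that are not covered by interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if b_end <= a_start or b_start >= a_end:
        return [a]  # no overlap
    pieces = []
    if b_start > a_start:
        pieces.append((a_start, b_start))
    if b_end < a_end:
        pieces.append((b_end, a_end))
    return pieces


linkage_region = (22_779_843, 32_908_192)      # bounded by rs7146411 and rs1951187
shared_by_affected = (22_779_843, 31_289_720)  # haplotype shared by affected descendants
unaffected_segment = (22_779_843, 28_467_136)  # assumed from the reported boundaries

core_region = subtract(shared_by_affected, unaffected_segment)

print(round(length_mb(linkage_region), 2))
print(round(length_mb(shared_by_affected), 2))
print(core_region, round(length_mb(core_region[0]), 2))
```

With these coordinates the core region is about 2.8 Mb, which the text rounds to approximately 3 Mb.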

4.4.3 No impactful coding variant in the 10 Mb region identified through WES and WGS

WES showed that two distantly related individuals (001 and 011; third cousins) shared 119 heterozygous non-synonymous variants across the entire exome. A subset of 60 among the 119 had a frequency lower than 1% in an in-house database. Only one of the variants (chr14:31061628 A > G, rs145527124) was located within the shared region of interest, falling within an exon of G2E3 (G2/M-phase specific E3 ubiquitin protein ligase). Despite its relatively low frequency (0.2% of alleles in gnomAD), this variant was not supported as likely disease-associated by computational analysis (predicted to be “tolerated” with SIFT and “benign” with Polyphen) or by in-depth manual review. We reviewed the remaining 59 coding variants, of which eight variants that were identified by SIFT (damaging variants or variants with unknown effect) were selected for further consideration. Qualitative review of the literature did not suggest a role for any of the SIFT-prioritized variants in a strabismus phenotype.
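The filtering logic described above (shared, heterozygous, non-synonymous, rare, and within the shared region) can be sketched as follows. This is a hypothetical illustration rather than the pipeline used in the study: the `Variant` record, the function names, and all toy records are assumptions, and the allele frequencies are placeholders. Only the position and alleles of the G2E3 variant and the region boundaries are taken from the text.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    nonsynonymous: bool
    inhouse_af: float          # allele frequency in a reference database
    genotypes: Dict[str, str]  # sample ID -> "het", "hom", or "ref"


Region = Tuple[str, int, int]
SHARED_REGION: Region = ("chr14", 22_779_843, 31_289_720)


def prioritise(variants: Sequence[Variant], samples: Sequence[str],
               region: Region, max_af: float = 0.01) -> List[Variant]:
    """Keep rare, non-synonymous variants that are heterozygous in every
    listed sample and fall inside the region."""
    chrom, start, end = region
    kept = []
    for v in variants:
        if not all(v.genotypes.get(s) == "het" for s in samples):
            continue  # not shared as heterozygous by all samples
        if not v.nonsynonymous or v.inhouse_af >= max_af:
            continue  # synonymous or too common
        if v.chrom != chrom or not (start <= v.pos <= end):
            continue  # outside the region of interest
        kept.append(v)
    return kept


both_het = {"001": "het", "011": "het"}
toy_variants = [
    Variant("chr14", 31_061_628, "A", "G", True, 0.002, both_het),   # G2E3 variant; frequency is a placeholder
    Variant("chr1", 1_000_000, "C", "T", True, 0.0005, both_het),    # toy: outside the region
    Variant("chr14", 25_000_000, "G", "A", True, 0.15, both_het),    # toy: too common
    Variant("chr14", 30_000_000, "T", "C", True, 0.001,
            {"001": "het", "011": "ref"}),                           # toy: not shared
    Variant("chr14", 29_000_000, "C", "G", False, 0.001, both_het),  # toy: synonymous
]

candidates = prioritise(toy_variants, ["001", "011"], SHARED_REGION)
```

In this toy input only the G2E3 variant survives all filters, mirroring the single in-region coding variant reported above.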

As the critical region lacked promising causal variants in the WES results, WGS was performed on three individuals (001, 013, and 014), who were selected to represent three distinct branches of the tree. A total of 42 rare, shared, heterozygous variants in the core region remained after applying filters (see Methods). Of these, 41 were non-coding (with the 42nd being the previously noted G2E3 coding variant). SnpEff was applied to annotate the location of each non-coding variant relative to nearby genes.

4.4.4 WGS and bioinformatic analyses highlight a heterozygous non-coding variant in a regulatory region of FOXG1

As 41 of the 42 candidate causal variants identified in the WGS analysis were non-coding, we used diverse methods to annotate non-coding variants with regulatory information. There is no standard practice for annotating non-coding variants, and different types of data provide different information and help to identify different types of regulatory elements. One variant was recurrently highlighted as interesting by a variety of bioinformatic predictions.

Segway identified the variant chr14:29247628 TAAAC > T as being situated within a candidate repressor region, due to the presence of the H3 trimethyl-lysine 27 (H3K27me3) mark in H1 human embryonic stem cell lines. H3K27me3 is associated with inactive genes, and together with H3K4me3 it marks bivalent promoters in embryonic stem cells. This deletion was absent in gnomAD.

Additional annotation tools were used to integrate information to evaluate the variant: chr14:29247628 TAAAC > T had a RegulomeDB score of 2b and a CADD score over 20. While both were amongst the top scores of the 41 variants, these are moderate scores, based on the descriptions of the scoring schemes found in respective references 134,136. This deletion variant was confirmed through Sanger sequencing to be present in all three subjects.
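As a reading aid for the CADD value, scaled CADD scores are PHRED-like: a scaled score of 10 corresponds to the top 10% of ranked variants, 20 to the top 1%, and 30 to the top 0.1%. The sketch below only illustrates that conversion; the function names are hypothetical and nothing here reproduces the CADD model itself.

```python
import math


def scaled_score_to_top_fraction(scaled_score: float) -> float:
    """Fraction of ranked variants at or above a PHRED-like scaled score."""
    return 10 ** (-scaled_score / 10)


def top_fraction_to_scaled_score(top_fraction: float) -> float:
    """PHRED-like scaled score corresponding to a top-ranked fraction."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must lie in (0, 1]")
    return -10 * math.log10(top_fraction)


fraction_at_20 = scaled_score_to_top_fraction(20)  # about 0.01, i.e. top 1%
```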

We examined topologically associating domains (TADs) for the 3 Mb region (Figure 4.4). The likely causal variant was located within the same TAD as FOXG1, and hereafter this TAD will be referred to as the FOXG1-TAD. Both FOXG1 and the sequence surrounding the likely causal mutation are highly conserved across vertebrates, with the affected sequence retained from fish to humans (Figure 4.6).

In annotations, the variant chr14:29247628 TAAAC > T was located within an alternative exon of a long non-coding RNA gene (LINC01551). Within the mouse, chicken, and zebrafish annotation and supporting data, there were no transcripts of the LINC01551 ortholog isoform containing the variant. As the variant position is conserved back to fish, and the transcript evidence is not supportive of transcription of the region in other species, we considered whether the variant might be situated within a cis-regulatory region. We examined predicted TFBS motifs overlapping the deletion and observed a match to a FOXG1 binding pattern (Figure 4.5). Due to two consecutive AAAC repeats, a new FOXG1 binding site is formed after the deletion, but the deletion variant has lost one strong binding motif relative to wildtype. Publicly available mouse ChIP-seq data (GSE96070) showed that Foxg1 binds to this site in tissue from E14 brain. Thus, it appears that the deletion is situated within a Foxg1 binding site, in a highly conserved region with conservation patterns consistent with a functional role in the cis-regulation of the FOXG1 gene.
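The effect of the tandem AAAC repeat can be illustrated on the 21-base reference sequence shown in Figure 4.5. The sketch below applies the VCF-style change TAAAC > T and counts occurrences of the 5-mer AAACA. That 5-mer is a crude stand-in chosen for this illustration and is an assumption: it is not the JASPAR FOXG1 matrix, and the counts are not PWMScan scores. The anchor index within the snippet is likewise assumed to be the leftmost TAAAC.

```python
REFERENCE = "GTCGGGTTAAACAAACAACTT"  # reference sequence from Figure 4.5
CORE_KMER = "AAACA"                  # hypothetical proxy for a forkhead-like core


def apply_variant(seq: str, index: int, ref_allele: str, alt_allele: str) -> str:
    """Apply a VCF-style REF > ALT change at a 0-based index."""
    if seq[index:index + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the sequence")
    return seq[:index] + alt_allele + seq[index + len(ref_allele):]


def count_overlapping(seq: str, kmer: str) -> int:
    """Count occurrences of kmer in seq, allowing overlaps."""
    return sum(seq.startswith(kmer, i) for i in range(len(seq) - len(kmer) + 1))


anchor = REFERENCE.find("TAAAC")  # leftmost match, assumed to be the anchor
altered = apply_variant(REFERENCE, anchor, "TAAAC", "T")

# Deleting either copy of the tandem AAAC repeat yields the same sequence.
delete_first_copy = REFERENCE[:8] + REFERENCE[12:]
delete_second_copy = REFERENCE[:12] + REFERENCE[16:]

sites_reference = count_overlapping(REFERENCE, CORE_KMER)
sites_altered = count_overlapping(altered, CORE_KMER)
```

Under this proxy the reference sequence carries two overlapping core matches and the altered sequence carries one, which is consistent in spirit with the two predicted sites versus one site reported in Figure 4.5.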

4.5 Discussion

We identified a new locus for isolated strabismus in a family with a LOD of 4.77, and this locus overlapped with the FOXG1 syndrome locus (a syndrome with high prevalence of strabismus). In-depth phenotyping of individuals without surgery illustrated clinical heterogeneity of strabismus within the family. The success of this study was rooted in (i) an extensive seven-generation pedigree with a clear autosomal inheritance pattern; (ii) genotyping of nine affected descendants and four unaffected descendants from a common ancestor; and (iii) examination of patient genomes at the single-nucleotide level. We used next generation sequencing and bioinformatic analyses to examine both coding and non-coding variants, which led to the identification of a potential strabismus-causing sequence alteration within a FOXG1 TFBS within the FOXG1-TAD, suggesting disruption of an auto-regulatory loop (where the role of FOXG1 as an activator or a suppressor remains unknown).

To the best of our knowledge, our report contains the largest isolated strabismus pedigree in the literature with the highest LOD score. A single linkage peak was identified on chromosome 14 with a LOD score of 4.77. Moreover, the inheritance pattern suggested by the linkage analysis matched the autosomal dominant inheritance pattern observed in the pedigree.

Through haplotype analysis, a 3 Mb region was identified that was present in all affected participants and absent from all unaffected participants. This chromosome 14q12 region overlaps with microdeletions/microduplications known to cause FOXG1 syndrome, in which a high prevalence of strabismus is observed.

FOXG1 syndrome, which is also known as the congenital variant of Rett syndrome, is a neurological disorder characterized by impaired development and structural brain abnormalities. Strikingly, 84% of affected individuals have displayed strabismus 124. Distal microdeletions that disrupt the topologically associating domain can lead to FOXG1 syndrome even when FOXG1 itself is intact 137. Due to its close proximity to FOXG1 and the shared TAD, this 4 bp deletion is suspected to alter FOXG1 expression in a spatiotemporal manner (Figure 4.4). A spectrum of partially overlapping phenotypes has been reported in patients with FOXG1 syndrome 124. Based on our study, it appears that the strabismus phenotype is separable from intellectual disability and other severe disabling phenotypes observed in FOXG1 syndrome.


Close examination with different types of data provided important insights into the potential regulatory impact of the likely causal deletion. First, the sequences surrounding the deletion were highly conserved in the genomes of different species, suggesting that the region was under evolutionary selection and that a change may have a functional impact. Indeed, the sequence containing the deletion and the coding region of the FOXG1 gene were the only two highly conserved elements in a 180 kb neighbourhood (Figure 4.6). In addition, this conserved sequence was not supported as part of a long non-coding RNA in other species (e.g. mouse, chicken, frog), implying a cis-regulatory effect. Second, the conserved sequence disrupted by the deletion was predicted to be a TFBS for FOXG1 according to the FOXG1 binding site profile from JASPAR 138, suggesting that the reported deletion likely disrupts FOXG1 binding. Third, Foxg1 ChIP-seq data from E14 mouse brain (GSE96070) showed that Foxg1 binds to this sequence.

The binding of Foxg1 to this sequence in mouse provides the basis for the hypothesis of disrupted FOXG1 auto-regulation leading to strabismus in the subject family. The proposed auto-regulatory model is illustrated in Figure 4.7. FOXG1 is transcribed and translated, and the transcription factor then binds to the sequence, helping to maintain the appropriate expression of FOXG1 during critical developmental periods. The disrupted FOXG1 binding site leads to dysregulation of FOXG1 expression.

Auto-regulation of critical transcription factors in vision is not new to the field. The SIMO regulatory sequence controlling expression of the PAX6 transcription factor gene is such a distal auto-regulatory element 139. While PAX6 is a crucial transcription factor for delineating the dorsal forebrain in mouse at E10.0, Foxg1 is a critical transcription factor for delineating the ventral forebrain in mouse at E9.0 140. Thus they may share similar sensitivity to regulatory disruption.

FOXG1 expression is specific to fetal brain and its dysregulation leads to unbalanced development of excitatory and inhibitory synapses in iPSC-derived neurons and mice 141. In combination with other transcription factors, Foxg1 in pyramidal neurons is crucial for establishing cortical layers and the axon trajectory of callosal projection neurons. Moreover, some Foxg1-directed processes are more vulnerable to dosage changes than others 142. These observations suggest that Foxg1 has a dosage- and time-sensitive role in different brain structures. This implies that an alteration in the Foxg1 expression pattern can have a very specific impact, and that the specific phenotype can be separable from the rest.

In summary, we identified a 3 Mb region on chromosome 14 that is associated with autosomal dominant transmission of isolated strabismus. The region contains the FOXG1 gene, for which 84% of syndromic disruptions present with strabismus. Within the 3 Mb region, the likely causal variant is situated within a FOXG1 transcription factor binding motif, suggesting disrupted auto-regulation as a mechanism underlying the observed strabismus phenotype. As the causal functional alteration remains to be proven, additional studies will be required to identify other families with genetic forms of strabismus mapping to the locus and to conclusively prove the causal sequence alteration and its pathophysiological mechanism.


[Figure 4.1 image: pedigree diagram. a) Full pedigree with Branch 2, Branch 1, and Branch 3 labelled. b) Simplified Branch 1 showing study IDs 001 to 015.]

Figure 4.1 Pedigree for the subject family with isolated strabismus. a) The pedigree represents a seven-generation family with 170 individuals, including deceased individuals. Three major branches are identified; 12 participants came from Branch 1 and one (014) from Branch 2. b) Simplified Branch 1 of the subject family showing the genotyped individuals (with study ID) and the ancestors required to link them. Individual 014 represents Branch 2, and all other individuals come from Branch 1.


[Figure 4.2 image. a) Genome-wide parametric linkage analysis results; y-axis: parametric LOD scores; x-axis: chromosome position (cM), Chr1 to Chr23. b) Genome-wide non-parametric linkage analysis results; y-axis: non-parametric LOD scores; x-axis: chromosome position (cM), Chr1 to Chr23.]

Figure 4.2 Linkage analysis for subject family. a) Parametric analysis. An expanded linkage analysis was performed in all 13 individuals who shared the common ancestor. We observed a LOD score of 4.77 for the linkage region in chromosome 14. b) Non-parametric analysis. We performed non-parametric analyses and obtained the same linkage region on chromosome 14.


[Figure 4.3 image. a) Haplotype diagram listing individuals 014, 011, 013, 009, 007, 006, 004, 002, and 001 (affected, +) and 005 (unaffected, -), with the 8.5 Mb and 3 Mb shared segments and the markers rs8008403 and rs2319682 indicated. b) UCSC genome browser view (hg19, chr14; UCSC Genes track) with positions 22,779,843, 28,467,136, 31,289,720, and 32,908,192 marked, showing the 8.5 Mb region, the 3 Mb core region, and a gene desert of ~5 Mb; labelled genes include NOVA1, FOXG1, PRKD1, and G2E3.]

Figure 4.3 Linkage region. a) Haplotype analysis for the subject family. An approximately 8.5 Mb region (chr14:22,779,843 – 31,289,720) was shared among the nine affected descendants. A self-reported unaffected descendant (subsequently deceased), who had not been assessed by the ophthalmologists, shared a 5.5 Mb region within the linked region. Thus, an approximately 3 Mb region was shared exclusively by the nine affected descendants, corresponding to chr14: 28,467,136 – 31,289,720. Yellow indicates the haplotype inherited from the common ancestor. The 8.5 Mb region and the core shared 3 Mb region are indicated. b) Key regions in relation to the 10 Mb linkage region. The linkage region, the 8.5 Mb region, and the core 3 Mb region are indicated in relation to UCSC genes. The core region lies within a gene desert of approximately 5 Mb.

Figure 4.4 Topologically associating domains within the 3 Mb core region. The FOXG1-TAD is indicated by the black triangle. The three blue highlights, from left to right, correspond to putative regulatory regions within the FOXG1-TAD: 1) chr14:29247628 TAAAC > T; 2) the DGV regulatory region affecting FOXG1 expression 137; 3) the SRO regulatory region affecting FOXG1 expression 137.


Reference sequence: GTCGGGTTAAACAAACAACTT (FOXG1 TFBS scores: 503 and 360)

Altered sequence in subject family: GTCGGGTTAAAC____AACTT (FOXG1 TFBS score: 503)

Figure 4.5 FOXG1 transcription factor binding site matches in the reference and altered sequences. Two FOXG1 TFBS are identified in the reference sequence, with scores of 503 and 360 respectively. Only one FOXG1 TFBS is identified in the sequence with the 4 bp deletion. Scores are based on PWMScan with the “JASPAR CORE 2018 vertebrates” library (Ambrosini G., PWMTools, http://ccg.vital-it.ch/pwmtools).


a) Ultra-conservation of FOXG1 (left conserved column) and the 4bp deleted region (right conserved column).


[Figure 4.6b image: UCSC genome browser view (hg19, chr14:29,247,620 to 29,247,640) of the human sequence GAGCTGTCGGGTTAAACAAACAACTTGTAC. Tracks shown: UCSC Genes (C14orf23), Conserved Transcription Factor Binding Sites (FOXG1, FOXI1, FOXC1), 100 vertebrates Basewise Conservation by PhyloP, Multiz Alignments of 100 Vertebrates, Common SNPs (dbSNP 151), and RepeatMasker. In the alignment, the sequence is identical or nearly identical to human in most listed species, from primates through birds, reptiles, X. tropicalis, coelacanth, zebrafish, and spotted gar.]

b) Forkhead protein binding motif sequences (FOXG1, FOXI1, and FOXC1) and the actual sequence of a variety of vertebrates. Figure 4.6 Ultra-conservation of the likely causal variant region.


[Figure 4.7 image: schematic of the FOXG1-TAD with the labels FOXG1, 4bp deletion, DGV, and SRO.]

Figure 4.7 Cis-regulatory mechanism within the FOXG1-TAD. 1) The 4 bp deletion chr14:29247628 TAAAC > T; 2) the DGV regulatory region affecting FOXG1 expression 137; 3) the SRO regulatory region affecting FOXG1 expression 137.


Table 4.1 Ophthalmological characterization of the subject family.

Identifier | Affected? | Type of reported strabismus | Eye movement full? | Concomitant? | Stereopsis | Other features
001 | Y | Left esotropia | Y | N/A | Absent | Trace amblyopia
002 | Y | Left exotropia and hypotropia (originally likely esotropia) | Y | N/A | Absent | No double image, likely due to suppression
003 | N | N/A | N/A | N/A | N/A | N/A
004 | Y | Right esotropia | N/A | Y | N/A | Right cataract, right dense amblyopia
005 | N | N/A | N/A | N/A | N/A | N/A
006 | Y | N/A | N/A | N/A | N/A | N/A
007 | Y | N/A | N/A | N/A | N/A | Left eye macular degeneration
008 | N | N/A | N/A | N/A | N/A | N/A
009 | Y | Right hypertropia | Y | N | Intact | N/A
010 | N | N/A | N/A | N/A | N/A | N/A
011 | Y | Right esotropia | Y | | Gross fusion & stereopsis | N/A
012 | N | N/A | N/A | N/A | N/A | N/A
013 | Y | Right hypertropia, likely 4th nerve palsy | N | | Have the ability to use the 2 eyes together with stereopsis potential when the 2 images are artificially aligned | N/A
014 | Y | Esotropia, right hypertropia | Y | N/A | N/A | Cataract, double vision, latent nystagmus, left eye suppression


Chapter 5: Conclusion

The genetic architecture of strabismus is poorly defined. Despite extensive efforts with familial and population genetic studies, the genes and molecular mechanisms contributing to eye misalignment have remained elusive. This thesis has confronted this challenge at three levels: trying to reveal and understand the relevant gene networks; clarifying the relationship between strabismus and other phenotypes; and seeking the identification of the first Mendelian autosomal dominant, isolated strabismus gene.

In this chapter, I will first explore the concept of genetic architecture and how insights from the more advanced studies of Parkinson’s disease and age-related macular degeneration (AMD) can inform our expectations for strabismus research. Then, I will summarize the findings of this thesis, emphasizing how these findings contribute to a richer understanding of strabismus genetics. Lastly, limitations and new insights will be bridged to highlight future research directions that can help reveal the genetic architecture of strabismus.

5.1 Complex genetic architecture – Parkinson’s disease and age-related macular degeneration as models for strabismus research

Modern genetics increasingly recognizes that mechanisms range from simple Mendelian transmission to complex, modifier-influenced disruption of biological systems 143. In this age, the boundary between Mendelian diseases and common complex diseases has blurred, as the power of larger populations assessed with new technologies has been broadly explored 144. Studies with well-defined patient cohorts and increased resolution of genetic alterations have started to reveal mechanistic layers contributing to human diseases and phenotypes. While progress in understanding the genetic architecture of strabismus has been slow, progress in the study of other, more debilitating disorders has been impressive and can inform our expectations for strabismus research.

To illustrate progress in revealing the complexity of the genetic architecture of human diseases, I will briefly examine the genetics of Parkinson’s disease and AMD. Parkinson’s disease is highlighted due to the advanced state of its genetics research, while AMD presents an example more relevant to vision research and illustrates environmental modification of genetic effects.

Many genes have been identified as causal for forms of Parkinson’s disease. Homozygous or compound heterozygous variants in PRKN and PINK1 have been found to cause early onset parkinsonism with high penetrance, while heterozygosity for these same variants may confer susceptibility to the idiopathic form of the disease 145. Heterozygous variants in LRRK2, VPS35, and EIF4G1 have been observed to cause Parkinson’s with moderate penetrance 146. A diverse collection of additional genes and loci has been identified as causing sub-classes of Parkinson’s, as reviewed by Hernandez and colleagues 147. In addition to the multiple forms of Parkinson’s disease and a range of genetic factors underlying each form, there is evidence that the insights arising from the genetic forms are highly relevant to the idiopathic forms, as rare variants of these genes are enriched in sporadic cases 148.

Underlying much of the exploration of the genetic architecture of Parkinson’s disease is the concept of penetrance and the mechanisms that modify it. The penetrance of LRRK2 p.G2019S parkinsonism is incomplete, varying across ethnicities and subject to environment. Examining a large Arab-Berber patient population in which the disease variant is common, genetic variants in the DNM3 gene have been linked to earlier onset of LRRK2 p.G2019S parkinsonism 149. With the advanced state of understanding of the genes contributing to Parkinson’s disease and the higher resolution genotyping tools now available, researchers can develop a nuanced understanding of the genetic architecture of the disorder.

While Parkinson’s disease has demonstrated an evolved view spanning from a simple Mendelian form to more complex forms, AMD, a complex ocular trait, has advanced our understanding regarding both genetic and non-genetic (environmental) modifiers. Over 20 genes/loci have been associated with late-stage AMD through GWAS, but only CFH and ARMS2 have been identified for early-stage AMD 150. Among all the recognized genetic factors, CFH and ARMS2 have the strongest effects on increasing the risk of developing AMD: the risk increased by ~2.5 fold and ~6 fold for individuals with heterozygous and homozygous CFH p.Y402H, respectively 151, and by ~2.5 fold and ~8.5 fold for individuals with heterozygous and homozygous ARMS2 p.A69S, respectively 152. While CFH is equally associated with both early and late-stage/advanced AMD, the ARMS2 locus is more closely associated with the advanced form 153.

A risk model for developing AMD based on 19 common risk variants (minor allele frequency ≥ 1%), including those in CFH and ARMS2, distinguishes cases from controls well (area under the curve (AUC) = 0.8). Inclusion of eight additional risk variants increases the predictive power only slightly (AUC = 0.83). On the other hand, the predictive power based on non-genetic factors, including age, sex, AMD baseline grade, smoking, and body mass index, is similar (AUC = 0.78) to that of the model based on 19 variants 150. Furthermore, the absence of AMD in the lifetime of many individuals carrying risk alleles and the late onset of AMD in the population suggest that AMD development is modified by risk alleles as well as environmental factors through pathway interactions and synergistic mechanisms 150. The example of AMD illustrates an additional layer, lifetime exposure, acting upon the already complex genetic architecture, and it emphasizes how non-genetic modifiers can contribute to the disruption of biological systems and thus to the disease state. Such modifications further complicate the discovery of the genetic architecture of a human disease.
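For readers less familiar with the AUC metric quoted above, the following sketch computes it as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, counting ties as one half. The scores are invented toy values used only for illustration, and the function name is hypothetical; this is not the risk model of the cited study.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


toy_cases = [0.9, 0.8, 0.4]     # hypothetical risk scores for cases
toy_controls = [0.7, 0.3, 0.2]  # hypothetical risk scores for controls
toy_auc = auc(toy_cases, toy_controls)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so values of 0.78 to 0.83 indicate good but imperfect discrimination.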

While the exploration of Parkinson’s disease and AMD may appear disconnected from the thesis, they represent examples of where strabismus genetics research can reach with regard to pathophysiological complexity. A complete picture of the genetic architecture of a disorder requires a breadth of research approaches to reveal the complete mechanism(s).

5.2 Summary of the thesis

Prior genetic studies of strabismus have had limited success in finding causal genes for isolated strabismus. Recently, a few genes (e.g. PAX3, AHI1) have been proposed based on whole exome sequencing in small families, but the evidence is inconclusive 42,43. Over the past 15 years, several genetic loci have been mapped in familial studies, but no two families have been reported that share the same locus with the same inheritance pattern. A recently reported large-scale genome-wide association study (GWAS) suggested potential links to one SNP each for two types of strabismus (nonaccommodative concomitant esotropia and accommodative esotropia) 41. These observations imply a potentially complex genetic architecture and demonstrate the challenge in identifying specific genetic mutations for strabismus.

To advance the understanding of strabismus genetics, this thesis studied strabismus at three levels: analysis of the overall genetic network; examination of associated phenotypes in a subgroup (strabismus co-occurring with intellectual disability); and identification of a genetic mutation in a family (Figure 5.1). These three studies consistently present ties to the brain and support the expectation of a complex genetic architecture for strabismus.

At the beginning of this project, there was no dataset of causal genes for disorders associated with strabismus, let alone a comprehensive analysis of the biological processes involved. To explore the genetic heterogeneity and the common underlying features potentially leading to strabismus, the network project presented in Chapter 2 was established to curate genes and explore their common attributes. This study compiled comprehensive lists of genes that are involved in disorders that include strabismus. The utility of the analysis results includes the identification of likely causal genes within linkage and GWAS loci, and is demonstrated through the identification of nine genes for five linkage loci using the compiled gene lists. In addition, this study highlighted new aspects to consider in strabismus gene discovery, particularly the role of the Ras-MAPK pathway, for which the network analysis found 9 of 23 of the compiled and candidate genes to be associated with RASopathies. Overall, the network analysis reveals a broad genetic connection between strabismus and disruptions affecting the central nervous system.

Strabismus is often a relatively minor phenotype and not commonly discussed in the context of neurological disorders, so its co-occurrence with other phenotypes is not well documented. Thus, Chapter 3 focused on the prevalence of strabismus and its association with other phenotypes in the context of neurological disorders, using databases with comprehensive clinical characterization. Our study showed that strabismus was more prevalent in children with intellectual disability than in the general population. Furthermore, the study illustrated co-occurrence between strabismus and motor defects, suggesting an underlying cerebellar defect in this subgroup of strabismus. In the findings of both Chapters 2 and 3, connections between the cerebellum and strabismus arise, implying an underlying movement coordination defect. Investigation based on cases with relevant features may help identify the genes and mechanisms for cerebellar defect-related strabismus.

Mendelian inheritance patterns have been qualitatively observed in individual families with strabismus since ancient times. Until recently, the confidence and resolution of linkage analysis have been limited by family size and genotyping technology. Chapter 4 was an effort to trace and present a large pedigree and to use WGS to improve the resolution within the linkage locus. In this study, which builds upon initial work presented in my master’s thesis, we identified a single linkage region with a LOD score of 4.77, fitting the observed autosomal dominant pattern in the subject family. The identification of a single linkage locus of significance supports the core hypothesis: isolated strabismus arises from a single locus with a Mendelian dominant inheritance pattern in the subject family. The linkage region overlaps with the locus of FOXG1 syndrome, also known as congenital Rett syndrome. More than 80% of individuals with congenital Rett syndrome display a strabismus phenotype 124. Within the critical region, we identified a small deletion that potentially disrupts an auto-regulatory element for FOXG1, a critical transcription factor for cortex development. The potential regulatory role of this deletion is supported by different lines of evidence, including high evolutionary conservation of the sequence, the disruption of a canonical FoxG1 transcription factor binding sequence, and ChIP-seq data from published studies showing binding of the FoxG1 transcription factor.

This study has identified the largest isolated strabismus family with the highest LOD score, highlighting that the strabismus phenotype can be isolated from the other phenotypes observed in FOXG1 syndrome.
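For readers less familiar with linkage statistics, a LOD score is the base-10 logarithm of the ratio between the likelihood of the data under linkage and the likelihood under no linkage, so it converts directly into odds. The short Python sketch below performs that conversion; the function name `lod_to_odds` is hypothetical, and the snippet is an illustration rather than part of the analysis pipeline used in Chapter 4.

```python
def lod_to_odds(lod):
    """Convert a LOD score (log10 likelihood ratio) to odds in favour of linkage."""
    return 10.0 ** lod


# The conventional significance threshold of LOD = 3 corresponds to 1000:1 odds.
print(round(lod_to_odds(3.0)))

# The LOD score of 4.77 reported for the family studied in Chapter 4.
print(round(lod_to_odds(4.77)))
```

Read this way, a LOD score of 4.77 corresponds to odds of roughly 59,000 to 1 in favour of linkage, well above the conventional threshold of 1000 to 1.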

FOXG1 dysregulation can lead to an imbalance between the numbers of excitatory and inhibitory synapses 141. The effect of disrupted synaptic signaling could impact multiple brain regions, potentially providing ties to the findings from Chapter 2. Synapse-related defects could have a widespread influence on strabismus development. Global chemical synaptic dysregulation is observed in schizophrenia, for which exotropia has been found to occur at higher rates than in the general population 154. An investigation of the genetics of individuals with both schizophrenia and exotropia may reveal a specific synaptic pathway for functional studies.

This thesis investigated strabismus from three perspectives: a gene network analysis; phenotype correlations in strabismic individuals with intellectual disability; and a familial genetics study of a seven-generation pedigree (Figure 5.1). Taken together, the findings of this thesis recurrently highlight a theme of central nervous system involvement. This thesis also highlights a complex genetic architecture underlying strabismus, suggesting that new research strategies that consider and address this complexity are needed to unravel the causes of strabismus.

5.3 The importance of phenotyping

To identify and stratify a patient population properly and so enable successful genetic studies, the importance of phenotyping should not be underestimated. If a set of mixed diseases is grouped together, the effect of a genetic factor can be diluted and become undetectable.

Two major challenges I faced during this thesis research related to the documentation and classification of strabismus.

Firstly, most existing studies and databases have very brief descriptions of strabismus. For example, when “strabismus” was documented, its details (such as direction of deviation, concomitant vs. incomitant, amblyopia, stereopsis, etc.) were usually not included. In syndromes with major organ or performance defects, strabismus was sometimes not reported at all even though the photographs of patients indicated its presence. Therefore, it is difficult to estimate the strabismus prevalence in specific syndromes and to investigate strabismus mechanisms in the context of those syndromes. A proper form of documentation for patient phenotypes has been frequently discussed in the medical field due to the clinical time trade-off to obtain a full record and the debatable need to report negative clinical findings. An interesting direction for future pursuit is to work with patients as partners to more comprehensively capture phenotype descriptions. Recently, progress has been made in linking lay descriptions to Human Phenotype Ontology terms, potentially allowing non-clinicians to contribute to enriched phenotype descriptions 78.

Secondly, the existing strabismus classification systems can inappropriately separate individuals sharing a common underlying genetic cause. Some of the most common classification pairs are incomitant (e.g. congenital cranial dysinnervation disorders) vs. concomitant (e.g. accommodative esotropia, non-accommodative esotropia), and esotropia vs. exotropia. However, examination of the literature 10, personal communication with other research groups, and our observation of subject families challenge the current classification scheme. In the family investigated in Chapter 4, we confirmed that isolated strabismus was the only shared phenotype amongst family members and that the strabismus types were not uniform. The various types of strabismus, however, are attributed to the same genetic factor, since the linked haplotype is shared by all strabismic individuals.

As a spectrum of strabismus forms appears to arise from the same genetic variant, a more reliable or nuanced classification system may be required. In phenotype ontologies, hierarchical classification systems allow for groupings to be examined at multiple levels of resolution, which appears to be an important capacity missing in our current approach to strabismus phenotyping. Such an advance could lead to an improved clinical understanding of strabismus and more efficient strategies to study the genetics of strabismus.
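As a rough illustration of what examining groupings "at multiple levels of resolution" could mean in practice, the Python sketch below encodes a small term hierarchy and finds the most specific level at which a set of differently classified individuals still falls into one group. The hierarchy is a hypothetical simplification built from the classification terms mentioned above; the labels are not actual Human Phenotype Ontology identifiers, and the function names are assumptions made for this example.

```python
# Hypothetical, simplified hierarchy: child term -> parent term.
# Labels are illustrative only and are not real ontology identifiers.
PARENT = {
    "accommodative esotropia": "esotropia",
    "non-accommodative esotropia": "esotropia",
    "esotropia": "concomitant strabismus",
    "exotropia": "concomitant strabismus",
    "concomitant strabismus": "strabismus",
    "incomitant strabismus": "strabismus",
}


def ancestors(term):
    """Return the term followed by all of its ancestors, most specific first."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def shared_level(terms):
    """Return the most specific term that subsumes every term in `terms`."""
    paths = [ancestors(t) for t in terms]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking one path from specific to general, the first common term wins.
    return next(t for t in paths[0] if t in common)


# Hypothetical family in which relatives carry different strabismus labels.
family = ["accommodative esotropia", "exotropia", "non-accommodative esotropia"]
print(shared_level(family))
```

In this toy hierarchy the relatives would be split by a flat esotropia vs. exotropia scheme, but they group together one level up, which is the kind of flexibility a hierarchical scheme offers when a single variant produces several strabismus forms.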

5.4 The importance of non-coding genetic variants

Recent studies have begun to reveal the impact of non-coding variants on human diseases. For example, a large-scale analysis of 8,000 patients investigated the impact of non-coding variants in neurodevelopmental disorders 155. While this study found that 42% of patients with developmental disorders carried pathogenic de novo mutations in coding sequences, 1-3% of patients without a diagnostic coding variant were found to carry pathogenic de novo mutations in fetal brain-active regulatory elements 155. These findings imply a critical role of regulatory elements in human diseases. In particular, the highly conserved elements active in fetal brain may be essential for neurological development and should be examined closely in related diseases. Indeed, the FOXG1-related deletion identified in Chapter 4 affects a highly conserved fetal brain-active regulatory element, as do the causal regulatory alterations underlying a subset of aniridia cases 139.

The identification and interpretation of non-coding variants is inefficient compared to that of coding variants. Many cases without a satisfying causal coding variant remain unsolved, in part due to the difficulty in assessing non-coding variants. In addition to the analysis of intronic variants affecting splice sites, new tools to interpret non-coding variants are being developed based on deepening knowledge of cis-regulatory regions (promoter, enhancer, suppressor), regulatory RNA, and epigenetic and other chromatin properties 156. The application of these approaches to large collections of individuals with strabismus may be valuable. As WGS becomes widespread, the genetic variations within previously reported strabismus loci can be compiled, with the potential to reveal specific DNA positions associated with strabismus risk.

As discussed in Chapter 2, experimental validation of the non-coding candidate variants will require appropriate animal models, which are lacking. Continued attempts to identify and annotate non-coding variants in strabismus loci may lead to new discoveries and highlight genes in the same mechanistic pathway.

5.5 Conclusion

An understanding of the genetic architecture of strabismus is an elusive goal of the strabismus research community. New attempts to find causal variants for strabismus are ongoing, but to date both familial and population genetic studies have had limited success. This thesis identifies potential roles for genes participating in the Ras-MAPK pathway, emphasizes the role of the central nervous system, and reveals FOXG1 as a causal gene for isolated strabismus. The work will require further exploration, but reveals portions of the long-sought genetic architecture.

Figure 5.1 Schematic representation of the thesis. [Figure panels: Chapter 2 (genotyping, population): overall genetic network; Chapter 3 (phenotyping, intellectual disability subgroup): associated phenotypes; Chapter 4 (genotyping and phenotyping, a family): specific genetic variant.]

Bibliography

1. Lorenz B. Genetics of isolated and syndromic strabismus: facts and perspectives. Strabismus. 2002 Jun;10(2):147–56.

2. Mazyn LIN, Lenoir M, Montagne G, Savelsbergh GJP. The contribution of stereo vision to one-handed catching. Exp Brain Res. 2004 Aug;157(3):383–90.

3. Satterfield D, Keltner JL, Morrison TL. Psychosocial aspects of strabismus study. Arch Ophthalmol. 1993 Aug;111(8):1100–5.

4. Grosvenor T, Grosvenor TP. The Binocular Vision Examination. In: Primary Care Optometry. Elsevier Health Sciences; 2007. p. 244.

5. Engle EC. The genetic basis of complex strabismus. Pediatr Res. 2006 Mar;59(3):343–8.

6. Multi-ethnic Pediatric Eye Disease Study Group. Prevalence of Amblyopia and Strabismus in African American and Hispanic Children Ages 6 to 72 Months: The Multi-ethnic Pediatric Eye Disease Study. Ophthalmology. 2008 Jul;115(7):1229-1236.e1.

7. Chia A, Dirani M, Chan Y-H, Gazzard G, Eong K-GA, Selvaraj P, et al. Prevalence of Amblyopia and Strabismus in Young Singaporean Chinese Children. IOVS. 2010 Jul 1;51(7):3411–7.

8. Matsuo T, Matsuo C. The Prevalence of Strabismus and Amblyopia in Japanese Elementary School Children. Ophthalmic Epidemiology. 2005 Jan;12(1):31–6.

9. Engle EC. Genetic basis of congenital strabismus. Arch Ophthalmol. 2007 Feb;125(2):189–95.

10. Ferreira R da C, Oelrich F, Bateman B. Genetic aspects of strabismus. Arquivos Brasileiros de Oftalmologia. 2002 Mar;65(2):171–5.

11. Mash AJ, Spivey BE. Genetic aspects of strabismus. Doc Ophthalmol. 1973 Feb 1;34(1):285–91.

12. Tinley C, Grötte R. Comitant horizontal strabismus in South African black and mixed race children--a clinic-based study. Ophthalmic Epidemiol. 2012 Apr;19(2):89–94.

13. Dufier JL, Briard ML, Bonaiti C, Frezal J, Saraux H. Inheritance in the Etiology of Convergent Squint. Ophthalmologica. 1979;179(4):225–34.

14. Chew E, Remaley NA, Tamboli A, Zhao J, Podgor MJ, Klebanoff M. Risk factors for esotropia and exotropia. Arch Ophthalmol. 1994 Oct;112(10):1349–55.

15. Horta-Santini JM, Vergara C, Colón-Casasnovas JE, Izquierdo NJ. Strabismus surgery at the Puerto Rico Medical Center: a brief report. P R Health Sci J. 2011 Dec;30(4):203–5.

16. Maconachie GDE, Gottlob I, McLean RJ. Risk Factors and Genetics in Common Comitant Strabismus: A Systematic Review of the Literature. JAMA Ophthalmol. 2013 Jul 11;1–8.

17. Graham PA. Epidemiology of strabismus. Br J Ophthalmol. 1974 Mar;58(3):224–31.

18. Torp-Pedersen T, Boyd HA, Skotte L, Haargaard B, Wohlfahrt J, Holmes JM, et al. Strabismus Incidence in a Danish Population-Based Cohort of Children. JAMA Ophthalmol. 2017 Oct 1;135(10):1047–53.

19. Nusz KJ, Mohney BG, Diehl NN. The Course of Intermittent Exotropia in a Population-Based Cohort. Ophthalmology. 2006 Jul 1;113(7):1154–8.

20. Spontaneous resolution of early-onset esotropia: experience of the congenital esotropia observational study. American Journal of Ophthalmology. 2002 Jan 1;133(1):109–18.

21. Demer JL. Neuroanatomical Strabismus. In: Pediatric Ophthalmology, Neuro-Ophthalmology, Genetics [Internet]. Springer Berlin Heidelberg; 2010. p. 59–75. (Essentials in Ophthalmology). Available from: http://link.springer.com.ezproxy.library.ubc.ca/chapter/10.1007/978-3-540-85851-5_6

22. Worth CA. Squint : its causes, pathology and treatment [Internet]. Philadelphia : Blakiston; 1903 [cited 2013 Dec 5]. 260 p. Available from: http://archive.org/details/squintitscausesp00wortrich

23. Von Noorden GK, Campos EC. Chapter 2. Binocular Vision and Space Perception. In: Binocular vision and ocular motility: theory and management of strabismus. 6th ed. St. Louis, Mo: Mosby; 2002. p. 7–35.

24. Tychsen L. Infantile esotropia: current neurophysiologic concepts. In: Clinical strabismus management: principles and surgical techniques. Philadelphia: Saunders; 1999.

25. Tychsen L. Visual Cortex Mechanisms of Strabismus: Development and Maldevelopment. In: Lorenz B, Brodsky MC, editors. Pediatric Ophthalmology, Neuro-Ophthalmology, Genetics [Internet]. Springer Berlin Heidelberg; 2010 [cited 2013 Nov 26]. p. 41–57. (Essentials in Ophthalmology). Available from: http://link.springer.com/chapter/10.1007/978-3-540-85851-5_5

26. Schoeff K, Chaudhuri Z, Demer JL. Functional magnetic resonance imaging of horizontal rectus muscles in esotropia. J AAPOS. 2013 Feb;17(1):16–21.

27. Helveston EM. Understanding, detecting, and managing strabismus. Community Eye Health. 2010 Mar;23(72):12–4.

28. Das VE. Strabismus and the Oculomotor System: Insights from Macaque Models. Annual Review of Vision Science. 2016;2(1):37–59.

29. Heesy CP. On the relationship between orbit orientation and binocular visual field overlap in mammals. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology. 2004 Nov 1;281A(1):1104–10.

30. Schlossman A, Priestley BS. Role of heredity in etiology and treatment of strabismus. AMA Arch Ophthalmol. 1952 Jan;47(1):1–20.

31. Ziakas NG, Woodruff G, Smith LK, Thompson JR. A study of heredity as a risk factor in strabismus. Eye (Lond). 2002 Sep;16(5):519–21.

32. Aurell E, Norrsell K. A longitudinal study of children with a family history of strabismus: factors determining the incidence of strabismus. Br J Ophthalmol. 1990 Oct;74(10):589–94.

33. Matsuo T, Hayashi M, Fujiwara H, Yamane T, Ohtsuki H. Concordance of strabismic phenotypes in monozygotic versus multizygotic twins and other multiple births. Jpn J Ophthalmol. 2002 Feb;46(1):59–64.

34. Wilmer JB, Backus BT. Genetic and environmental contributions to strabismus and phoria: evidence from twins. Vision Res. 2009 Oct;49(20):2485–93.

35. Podgor MJ, Remaley NA, Chew E. Associations between siblings for esotropia and exotropia. Arch Ophthalmol. 1996 Jun;114(6):739–44.

36. Waardenburg PJ. Squint and heredity. Doc Ophthalmol Proc Ser. 1954;7–8:422–94.

37. Parikh V, Shugart YY, Doheny KF, Zhang J, Li L, Williams J, et al. A strabismus susceptibility locus on chromosome 7p. Proc Natl Acad Sci USA. 2003 Oct 14;100(21):12283–8.

38. Rice A, Nsengimana J, Simmons IG, Toomes C, Hoole J, Willoughby CE, et al. Replication of the recessive STBMS1 locus but with dominant inheritance. Invest Ophthalmol Vis Sci. 2009 Jul;50(7):3210–7.

39. Fujiwara H, Matsuo T, Sato M, Yamane T, Kitada M, Hasebe S, et al. Genome-wide search for strabismus susceptibility loci. Acta Med Okayama. 2003 Jun;57(3):109–16.

40. Shaaban S, Matsuo T, Fujiwara H, Itoshima E, Furuse T, Hasebe S, et al. Chromosomes 4q28.3 and 7q31.2 as new susceptibility loci for comitant strabismus. Invest Ophthalmol Vis Sci. 2009 Feb;50(2):654–61.

41. Shaaban S, MacKinnon S, Andrews C, Staffieri SE, Maconachie GDE, Chan W-M, et al. Genome-Wide Association Study Identifies a Susceptibility Locus for Comitant Esotropia and Suggests a Parent-of-Origin Effect. Invest Ophthalmol Vis Sci. 2018 Aug 1;59(10):4054–64.

42. Gong H-M, Wang J, Xu J, Zhou Z-Y, Li J-W, Chen S-F. Identification of rare paired box 3 variant in strabismus by whole exome sequencing. Int J Ophthalmol. 2017 Aug 18;10(8):1223–8.

43. Min X, Fan H, Zhao G, Liu G. Identification of 2 Potentially Relevant Gene Mutations Involved in Strabismus Using Whole-Exome Sequencing. Med Sci Monit. 2017 Apr 9;23:1719–24.

44. Meienberg J, Bruggmann R, Oexle K, Matyas G. Clinical sequencing: is WGS the better WES? Hum Genet. 2016;135:359–62.

45. Zhang F, Lupski JR. Non-coding genetic variants in human disease. Hum Mol Genet. 2015 Oct 15;24(R1):R102-110.

46. Chilton JK, Guthrie S. Axons get ahead: Insights into axon guidance and congenital cranial dysinnervation disorders. Developmental Neurobiology. 2017 Jul 1;77(7):861–75.

47. Graeber CP, Hunter DG, Engle EC. The Genetic Basis of Incomitant Strabismus: Consolidation of the Current Knowledge of the Genetic Foundations of Disease. Seminars in Ophthalmology. 2013 Sep 1;28(5–6):427–37.

48. Appukuttan B, Gillanders E, Juo SH, Freas-Lutz D, Ott S, Sood R, et al. Localization of a gene for Duane retraction syndrome to chromosome 2q31. Am J Hum Genet. 1999 Dec;65(6):1639–46.

49. Andrews CV, Hunter DG, Engle EC. Duane Syndrome. In: Pagon RA, Adam MP, Bird TD, Dolan CR, Fong C-T, Stephens K, editors. GeneReviewsTM [Internet]. Seattle (WA): University of Washington, Seattle; 1993 [cited 2013 Aug 6]. Available from: http://www.ncbi.nlm.nih.gov/books/NBK1190/

50. Khan AO, Shinwari J, Al Sharif L, Khalil D, Al-Gehedan S, Tassan NAA. Infantile esotropia could be oligogenic and allelic with Duane retraction syndrome. Mol Vis. 2011;17:1997–2002.

51. Connell BJ, Wilkinson RM, Barbour JM, Scotter LW, Poulsen JL, Wirth MG, et al. Are Duane syndrome and infantile esotropia allelic? Ophthalmic Genet. 2004 Sep;25(3):189–98.

52. Miyake N, Chilton J, Psatha M, Cheng L, Andrews C, Chan W-M, et al. Human CHN1 mutations hyperactivate alpha2-chimaerin and cause Duane’s retraction syndrome. Science. 2008 Aug 8;321(5890):839–43.

53. Miyake N, Andrews C, Fan W, He W, Chan W-M, Engle EC. CHN1 mutations are not a common cause of sporadic Duane’s retraction syndrome. Am J Med Genet A. 2010 Jan;152A(1):215–7.

54. Park JG, Tischfield MA, Nugent AA, Cheng L, Di Gioia SA, Chan W-M, et al. Loss of MAFB Function in Humans and Mice Causes Duane Syndrome, Aberrant Extraocular Muscle Innervation, and Inner-Ear Defects. The American Journal of Human Genetics. 2016 Jun 2;98(6):1220–7.

55. Zankl A, Duncan EL, Leo PJ, Clark GR, Glazov EA, Addor M-C, et al. Multicentric carpotarsal osteolysis is caused by mutations clustering in the amino-terminal transcriptional activation domain of MAFB. Am J Hum Genet. 2012 Mar 9;90(3):494–501.

56. Al-Baradie R, Yamada K, St Hilaire C, Chan W-M, Andrews C, McIntosh N, et al. Duane radial ray syndrome (Okihiro syndrome) maps to 20q13 and results from mutations in SALL4, a new member of the SAL family. Am J Hum Genet. 2002 Nov;71(5):1195–9.

57. Kohlhase J, Heinrich M, Schubert L, Liebers M, Kispert A, Laccone F, et al. Okihiro syndrome is caused by SALL4 mutations. Hum Mol Genet. 2002 Nov 1;11(23):2979–87.

58. Kohlhase J, Chitayat D, Kotzot D, Ceylaner S, Froster UG, Fuchs S, et al. SALL4 mutations in Okihiro syndrome (Duane-radial ray syndrome), acro-renal-ocular syndrome, and related disorders. Hum Mutat. 2005 Sep;26(3):176–83.

59. Koshiba-Takeuchi K, Takeuchi JK, Arruda EP, Kathiriya IS, Mo R, Hui C, et al. Cooperative and antagonistic interactions between Sall4 and Tbx5 pattern the mouse limb and heart. Nat Genet. 2006 Feb;38(2):175–83.

60. Sakaki-Yumoto M, Kobayashi C, Sato A, Fujimura S, Matsumoto Y, Takasato M, et al. The murine homolog of SALL4, a causative gene in Okihiro syndrome, is essential for embryonic stem cell proliferation, and cooperates with Sall1 in anorectal, heart, brain and kidney development. Development. 2006 Aug 1;133(15):3005–13.

61. Yamada K, Andrews C, Chan W-M, McKeown CA, Magli A, de Berardinis T, et al. Heterozygous mutations of the kinesin KIF21A in congenital fibrosis of the extraocular muscles type 1 (CFEOM1). Nat Genet. 2003 Dec;35(4):318–21.

62. Nakano M, Yamada K, Fain J, Sener EC, Selleck CJ, Awad AH, et al. Homozygous mutations in ARIX(PHOX2A) result in congenital fibrosis of the extraocular muscles type 2. Nat Genet. 2001 Nov;29(3):315–20.

63. Tischfield MA, Baris HN, Wu C, Rudolph G, Van Maldergem L, He W, et al. Human TUBB3 mutations perturb microtubule dynamics, kinesin interactions, and axon guidance. Cell. 2010 Jan 8;140(1):74–87.

64. Whitman MC, Andrews C, Chan W-M, Tischfield MA, Stasheff SF, Brancati F, et al. Two unique TUBB3 mutations cause both CFEOM3 and malformations of cortical development. American Journal of Medical Genetics Part A. 2016 Feb 1;170(2):297–305.

65. Friocourt F, Chédotal A. The Robo3 receptor, a key player in the development, evolution, and function of commissural systems. Developmental Neurobiology. 2017 Jul 1;77(7):876–90.

66. Köhler S, Carmody L, Vasilevsky N, Jacobsen JOB, Danis D, Gourdine J-P, et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 2019 Jan 8;47(D1):D1018–27.

67. Newacheck PW, Taylor WR. Childhood chronic illness: prevalence, severity, and impact. Am J Public Health. 1992 Mar;82(3):364–71.

68. Motley WW. Chapter 12: Strabismus. In: Riordan-Eva P, Augsburger JJ, editors. Vaughan & Asbury’s General Ophthalmology, 19e [Internet]. New York, NY: McGraw-Hill Education; 2017 [cited 2018 Feb 28]. Available from: accessmedicine.mhmedical.com/content.aspx?aid=1144468906

69. Uretmen O, Egrilmez S, Kose S, Pamukçu K, Akkin C, Palamar M. Negative social bias against children with strabismus. Acta Ophthalmologica Scandinavica. 2003 Apr 1;81(2):138–42.

70. Olson JH, Louwagie CR, Diehl NN, Mohney BG. Congenital esotropia and the risk of mental illness by early adulthood. Ophthalmology. 2012 Jan;119(1):145–9.

71. Ye XC, Pegado V, Patel MS, Wasserman WW. Strabismus genetics across a spectrum of eye misalignment disorders. Clin Genet. 2014 Aug;86(2):103–11.

72. Bui Quoc E, Milleret C. Origins of strabismus and loss of binocular vision. Front Integr Neurosci [Internet]. 2014 Sep 25;8. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4174748/

73. Williams A, Hoyt CS. Acute comitant esotropia in children with brain tumors. Arch Ophthalmol. 1989 Mar 1;107(3):376–8.

74. Lee J-M, Kim S-H, Lee J-I, Ryou J-Y, Kim S-Y. Acute comitant esotropia in a child with a cerebellar tumor. Korean J Ophthalmol. 2009 Sep;23(3):228–31.

75. Wiwatwongwana A, Lyons CJ. Chapter 156 - Eye movement control and its disorders. In: Dulac O, Lassonde M, Sarnat HB, editors. Handbook of Clinical Neurology [Internet]. Elsevier; 2013. p. 1505–13. (Pediatric Neurology Part III; vol. 113). Available from: http://www.sciencedirect.com/science/article/pii/B9780444595652000216

76. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD). Online Mendelian Inheritance in Man, OMIM. [Internet]. [cited 2017 Dec 19]. Available from: https://www.omim.org/

77. Kochinke K, Zweier C, Nijhof B, Fenckova M, Cizek P, Honti F, et al. Systematic Phenomics Analysis Deconvolutes Genes Mutated in Intellectual Disability into Biologically Coherent Modules. Am J Hum Genet. 2016 Jan 7;98(1):149–64.

78. Köhler S, Vasilevsky NA, Engelstad M, Foster E, McMurry J, Aymé S, et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res. 2017 Jan 4;45(Database issue):D865–76.

79. Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009 Apr 15;25(8):1091–3.

80. Ballouz S, Pavlidis P, Gillis J. Using predictive specificity to determine when gene set analysis is biologically meaningful. Nucleic Acids Res. 2017 Feb 28;45(4):e20.

81. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016 Jul 8;44(Web Server issue):W90–7.

82. Dougherty JD, Schmidt EF, Nakajima M, Heintz N. Analytical approaches to RNA profiling data for the identification of genes enriched in specific cells. Nucleic Acids Res. 2010 Jul;38(13):4218–30.

83. Grote S, Prüfer K, Kelso J, Dannemann M. ABAEnrichment: an R package to test for gene set expression enrichment in the adult and developing human brain. Bioinformatics. 2016 Oct 15;32(20):3201–3.

84. Miller JA, Ding S-L, Sunkin SM, Smith KA, Ng L, Szafer A, et al. Transcriptional landscape of the prenatal human brain. Nature. 2014 Apr;508(7495):199–206.

85. Mostafavi S, Ray D, Warde-Farley D, Grouios C, Morris Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008;9(Suppl 1):S4.

86. Deng Y, Gao L, Wang B, Guo X. HPOSim: an R package for phenotypic similarity measure and enrichment analysis based on the human phenotype ontology. PLoS ONE. 2015;10(2):e0115692.

87. Nielsen LS, Skov L, Jensen H. Visual dysfunctions and ocular disorders in children with developmental delay. II. Aspects of refractive errors, strabismus and contrast sensitivity. Acta Ophthalmologica Scandinavica. 2007 Jun 1;85(4):419–26.

88. Bagheri A, Fallahi MR, Tamannaifard S, Vajebmonfared S, Zonozian S. Intelligence Quotient (IQ) in Congenital Strabismus. J Ophthalmic Vis Res. 2013 Apr;8(2):139–46.

89. Dataset - ProteomicsDB Cell Type and Tissue Protein Expression Profiles [Internet]. [cited 2018 Sep 4]. Available from: http://amp.pharm.mssm.edu/Harmonizome/dataset/ProteomicsDB+Cell+Type+and+Tissue+Protein+Expression+Profiles

90. Koehler K, Malik M, Mahmood S, Gießelmann S, Beetz C, Hennings JC, et al. Mutations in GMPPA cause a glycosylation disorder characterized by intellectual disability and autonomic dysfunction. Am J Hum Genet. 2013 Oct 3;93(4):727–34.

91. Elson E, Perveen R, Donnai D, Wall S, Black GCM. De novo GLI3 mutation in acrocallosal syndrome: broadening the phenotypic spectrum of GLI3 defects and overlap with murine models. J Med Genet. 2002 Nov;39(11):804–6.

92. Priolo M, Grosso E, Mammì C, Labate C, Naretto VG, Vacalebre C, et al. A peculiar mutation in the DNA-binding/dimerization domain of NFIX causes Sotos-like overgrowth syndrome: a new case. Gene. 2012 Dec 10;511(1):103–5.

93. Hayashi S, Wakabayashi K, Ishikawa A, Nagai H, Saito M, Maruyama M, et al. An autopsy case of autosomal-recessive juvenile parkinsonism with a homozygous exon 4 deletion in the parkin gene. Movement Disorders. 2000 Sep 1;15(5):884–8.

94. Kang SL, Shaikh AG, Ghasia FF. Vergence and Strabismus in Neurodegenerative Disorders. Front Neurol [Internet]. 2018 May 16 [cited 2018 Sep 11];9. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5964131/

95. Pevec U, Rozman N, Gorsek B, Kunej T. RASopathies: Presentation at the Genome, Interactome, and Phenome Levels. Mol Syndromol. 2016 May;7(2):72–9.

96. RASopathies [Internet]. The RASopathies Network. [cited 2018 Mar 19]. Available from: https://rasopathiesnet.org/rasopathies/

97. van der Lee R, Feng Q, Langereis MA, Ter Horst R, Szklarczyk R, Netea MG, et al. Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response. PLoS Comput Biol. 2015 Oct;11(10):e1004553.

98. Gursoy H, Basmak H, Bilgin B, Erol N, Colak E. The effects of mild-to-severe retinopathy of prematurity on the development of refractive errors and strabismus. Strabismus. 2014 Jun;22(2):68–73.

99. Mathews KD, Afifi AK, Hanson JW. Autosomal Recessive Cerebellar Hypoplasia. J Child Neurol. 1989 Jul 1;4(3):189–94.

100. Hüfner K, Frenzel C, Kremmyda O, Adrion C, Bardins S, Glasauer S, et al. Esophoria or esotropia in adulthood: a sign of cerebellar dysfunction? J Neurol. 2015 Mar 1;262(3):585–92.

101. Tan G, Huang X, Zhang Y, Wu A-H, Zhong Y-L, Wu K, et al. A functional MRI study of altered spontaneous brain activity pattern in patients with congenital comitant strabismus using amplitude of low-frequency fluctuation. Neuropsychiatr Dis Treat. 2016 May 20;12:1243–50.

102. Huang X, Li S-H, Zhou F-Q, Zhang Y, Zhong Y-L, Cai F-Q, et al. Altered intrinsic regional brain spontaneous activity in patients with comitant strabismus: a resting-state functional MRI study. Neuropsychiatr Dis Treat. 2016 Jun 3;12:1303–8.

103. Dal Monte O, Costa VD, Noble PL, Murray EA, Averbeck BB. Amygdala lesions in rhesus macaques decrease attention to threat. Nature Communications. 2015 Dec 14;6:10161.

104. Micera A, Lambiase A, Aloe L, Bonini S, Levi-Schaffer F, Bonini S. Nerve growth factor involvement in the visual system: implications in allergic and neurodegenerative diseases. Cytokine Growth Factor Rev. 2004 Dec;15(6):411–7.

105. Willoughby CL, Fleuriet J, Walton MM, Mustari MJ, McLoon LK. Adaptation of slow myofibers: the effect of sustained BDNF treatment of extraocular muscles in infant nonhuman primates. Invest Ophthalmol Vis Sci. 2015 Jun;56(6):3467–83.

106. Agarwal AB, Feng C-Y, Altick AL, Quilici DR, Wen D, Johnson LA, et al. Altered Protein Composition and Gene Expression in Strabismic Human Extraocular Muscles and Tendons. Invest Ophthalmol Vis Sci. 2016 Oct 1;57(13):5576–85.

107. Hennigan A, O’Callaghan RM, Kelly AM. Neurotrophins and their receptors: roles in plasticity, neurodegeneration and neuroprotection. Biochem Soc Trans. 2007 Apr;35(Pt 2):424–7.

108. Cabelli RJ, Hohn A, Shatz CJ. Inhibition of ocular dominance column formation by infusion of NT-4/5 or BDNF. Science. 1995 Mar 17;267(5204):1662–6.

109. Tidyman WE, Rauen KA. Expansion of the RASopathies. Curr Genet Med Rep. 2016 Sep 1;4(3):57–64.

110. Lee NB, Kelly L, Sharland M. Ocular manifestations of Noonan syndrome. Eye (Lond). 1992;6(Pt 3):328–34.

111. Jindal GA, Goyal Y, Burdine RD, Rauen KA, Shvartsman SY. RASopathies: unraveling mechanisms with animal models. Dis Model Mech. 2015 Aug 1;8(8):769–82.

112. Carulla LS, Reed GM, Vaez-Azizi LM, Cooper S-A, Leal RM, Bertelli M, et al. Intellectual developmental disorders: towards a new name, definition and framework for “mental retardation/intellectual disability” in ICD-11. World Psychiatry. 2011 Oct;10(3):175–80.

113. Zablotsky B, Black LI, Blumberg SJ. Estimated Prevalence of Children With Diagnosed Developmental Disabilities in the United States, 2014-2016. NCHS Data Brief. 2017;(291):1–8.

114. Kim E, Kim JH, Hwang JM, Choi BS, Jung C. MR Imaging of Congenital or Developmental Neuropathic Strabismus: Common and Uncommon Findings. American Journal of Neuroradiology. 2012 Dec 1;33(11):2056–61.

115. BC Data Scout | www.popdata.bc.ca [Internet]. [cited 2018 Jul 16]. Available from: https://www.popdata.bc.ca/resources/BCDataScout

116. Generated from BC Health Data Discovery Service, August 13 2018, ID 597044409.

117. Robinson PN. Deep phenotyping for precision medicine. Human Mutation. 33(5):777–80.

118. Akinci A, Oner O, Bozkurt OH, Guven A, Degerliyurt A, Munir K. Refractive errors and ocular findings in children with intellectual disability: A controlled study. Journal of American Association for Pediatric Ophthalmology and Strabismus. 2008 Oct 1;12(5):477–81.

119. Lisi EC, Cohn RD. Genetic evaluation of the pediatric patient with hypotonia: perspective from a hypotonia specialty clinic and review of the literature. Developmental Medicine & Child Neurology. 2011 Jul 1;53(7):586–99.

120. Laugel V, Cossée M, Matis J, Saint-Martin A de, Echaniz-Laguna A, Mandel J-L, et al. Diagnostic approach to neonatal hypotonia: retrospective study on 144 neonates. Eur J Pediatr. 2008 May 1;167(5):517–23.

121. Bodensteiner JB. The Evaluation of the Hypotonic Infant. Seminars in Pediatric Neurology. 2008 Mar 1;15(1):10–20.

122. Manto M, Bower JM, Conforto AB, Delgado-García JM, Guarda SNF da, Gerwig M, et al. Consensus Paper: Roles of the Cerebellum in Motor Control—The Diversity of Ideas on Cerebellar Involvement in Movement. Cerebellum. 2012 Jun 1;11(2):457–87.

123. Gilman S. The Mechanism Of Cerebellar Hypotonia - an Experimental Study in the Monkey. Brain. 1969 Mar 1;92(3):621–38.

124. Mitter D, Pringsheim M, Kaulisch M, Plümacher KS, Schröder S, Warthemann R, et al. FOXG1 syndrome: genotype–phenotype association in 83 patients with FOXG1 variants. Genetics in Medicine. 2018 Jan;20(1):98–108.

125. Schäffer AA, Lemire M, Ott J, Lathrop GM, Weeks DE. Coordinated conditional simulation with SLINK and SUP of many markers linked or associated to a trait in large pedigrees. Hum Hered. 2011;71(2):126–34.

126. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin--rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002 Jan;30(1):97–101.

127. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.

128. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754–60.

129. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297–303.

130. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078–9.

131. Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Meth. 2012 May;9(5):473–6.

132. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014 Jan;42(Database issue):D142-147.

133. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science. 2013 Oct 4;342(6154):1235587.

134. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014 Mar;46(3):310–5.

135. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug;536(7616):285–91.

136. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012 Sep;22(9):1790–7.

137. Mehrjouy MM, Fonseca ACS, Ehmke N, Paskulin G, Novelli A, Benedicenti F, et al. Regulatory variants of FOXG1 in the context of its topological domain organisation. European Journal of Human Genetics. 2018 Feb;26(2):186–96.

138. Khan A, Fornes O, Stigliani A, Gheorghe M, Castro-Mondragon JA, van der Lee R, et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 2018 Jan 4;46(D1):D260–6.

139. Bhatia S, Bengani H, Fish M, Brown A, Divizia MT, de Marco R, et al. Disruption of Autoregulatory Feedback by a Mutation in a Remote, Ultraconserved PAX6 Enhancer Causes Aniridia. Am J Hum Genet. 2013 Dec 5;93(6):1126–34.

140. Hébert JM, Fishell G. The genetics of early telencephalon patterning: some assembly required. Nature Reviews Neuroscience. 2008 Sep;9(9):678–85.

141. Patriarchi T, Amabile S, Frullanti E, Landucci E, Rizzo CL, Ariani F, et al. Imbalance of excitatory/inhibitory synaptic protein expression in iPSC-derived neurons from FOXG1+/− patients and in foxg1+/− mice. European Journal of Human Genetics. 2016 Jun;24(6):871–80.

142. Cargnin F, Kwon J-S, Katzman S, Chen B, Lee JW, Lee S-K. FOXG1 Orchestrates Neocortical Organization and Cortico-Cortical Connections. Neuron. 2018 Dec 5;100(5):1083-1096.e5.

143. Freund MK, Burch KS, Shi H, Mancuso N, Kichaev G, Garske KM, et al. Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits. The American Journal of Human Genetics. 2018 Oct 4;103(4):535–52.

144. Katsanis N. The continuum of causality in human genetic disorders. Genome Biology. 2016 Nov 17;17(1):233.

145. Wirdefeldt K, Adami H-O, Cole P, Trichopoulos D, Mandel J. Epidemiology and etiology of Parkinson’s disease: a review of the evidence. Eur J Epidemiol. 2011 May 28;26(1):1.

146. Spatola M, Wider C. Genetics of Parkinson’s disease: the yield. Parkinsonism & Related Disorders. 2014 Jan 1;20:S35–8.

147. Hernandez DG, Reed X, Singleton AB. Genetics in Parkinson disease: Mendelian versus non-Mendelian inheritance. Journal of Neurochemistry. 2016;139(S1):59–74.

148. Spataro N, Rodríguez JA, Navarro A, Bosch E. Properties of human disease genes and the role of genes linked to Mendelian disorders in complex disease aetiology. Hum Mol Genet. 2017 Feb 1;26(3):489–500.

149. Trinh J, Gustavsson EK, Vilariño-Güell C, Bortnick S, Latourelle J, McKenzie MB, et al. DNM3 and genetic modifiers of age of onset in LRRK2 Gly2019Ser parkinsonism: a genome-wide linkage and association study. The Lancet Neurology. 2016 Nov 1;15(12):1248–56.

150. Fritsche LG, Fariss RN, Stambolian D, Abecasis GR, Curcio CA, Swaroop A. Age-Related Macular Degeneration: Genetics and Biology Coming Together. Annual Review of Genomics and Human Genetics. 2014;15(1):151–71.

151. Thakkinstian A, Han P, McEvoy M, Smith W, Hoh J, Magnusson K, et al. Systematic review and meta-analysis of the association between complement factor H Y402H polymorphisms and age-related macular degeneration. Hum Mol Genet. 2006 Sep 15;15(18):2784–90.

152. Ross RJ, Bojanowski CM, Wang JJ, Chew EY, Rochtchina E, Ferris FL, et al. The LOC387715 polymorphism and age-related macular degeneration: replication in three case-control samples. Invest Ophthalmol Vis Sci. 2007 Mar;48(3):1128–32.

153. Brión M, Sanchez-Salorio M, Cortón M, Fuente M de la, Pazos B, Othman M, et al. Genetic association study of age-related macular degeneration in the Spanish population. Acta Ophthalmologica. 2011;89(1):e12–22.

154. Agarwal AB, Christensen AJ, Feng C-Y, Wen D, Johnson LA, Bartheld CS von. Expression of schizophrenia biomarkers in extraocular muscles from patients with strabismus: an explanation for the link between exotropia and schizophrenia? PeerJ. 2017 Dec 22;5:e4214.

155. Short PJ, McRae JF, Gallone G, Sifrim A, Won H, Geschwind DH, et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature. 2018 Mar;555(7698):611–6.

156. Mathelier A, Shi W, Wasserman WW. Identification of altered cis-regulatory elements in human disease. Trends in Genetics. 2015 Feb 1;31(2):67–76.

Appendices

Appendix A Supplementary Material for Chapter 2

Methods

ClueGO selection criteria

Statistical Test Used = Enrichment/Depletion (Two-sided hypergeometric test)
Correction Method Used = Bonferroni step down
Min GO Level = 8
Max GO Level = 15
Cluster #1
Sample File Name = File selection:
Number of Genes = 2
Min Percentage = 6.0
GO Fusion = false
GO Group = true
Kappa Score Threshold = 0.4
Over View Term = SmallestPValue
Group By Kappa Statistics = true
Initial Group Size = 1
Sharing Group Percentage = 50.0
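The two statistical settings listed in the ClueGO criteria, a two-sided hypergeometric test and a Bonferroni step-down correction, can be illustrated with a minimal Python sketch. This is not the ClueGO implementation: the two-sided p-value here is assumed to be the total probability of all outcomes no more likely than the observed one, the step-down correction is assumed to correspond to Holm's method, and all function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, total, annotated, drawn):
    """P(X = k): k annotated genes among `drawn` genes sampled from `total`,
    of which `annotated` carry the GO term."""
    return comb(annotated, k) * comb(total - annotated, drawn - k) / comb(total, drawn)


def two_sided_p(k, total, annotated, drawn):
    """Two-sided p-value (assumed definition): sum the probabilities of all
    outcomes that are no more likely than the observed count k."""
    p_obs = hypergeom_pmf(k, total, annotated, drawn)
    lowest = max(0, drawn - (total - annotated))
    highest = min(annotated, drawn)
    p_sum = sum(
        p
        for p in (hypergeom_pmf(i, total, annotated, drawn)
                  for i in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    )
    return min(1.0, p_sum)


def holm_step_down(pvals):
    """Holm (Bonferroni step-down) adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest p-value by the number of tests remaining,
        # then enforce monotonicity with a running maximum.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted
```

A term would then be reported as enriched or depleted when its adjusted p-value falls below the chosen significance threshold.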

Refining function enrichment modules with ErmineJ

A gene can be associated with multiple functions, and heavily annotated genes can bias the results of GO enrichment analysis 80. To reduce this bias, we used ErmineJ for GO term analysis of the 233 genes in the P-Strab set. After correction, only five GO groups remained significant at a false discovery rate of 0.05. These five groups fall into two main categories: photoreceptor-associated and ciliary transition zone-associated. The photoreceptor-associated gene set is a subset of the “camera-type eye development” group, while the ciliary transition zone-associated gene set cannot be linked to any module identified in the ClueGO analysis. While the clarity of the resulting sets provides insight, the exclusion of common embryonic development processes by ErmineJ may be too stringent for the study of strabismus genes, and thus we elected to include all of the genes from the P-Strab set for analysis with ClueGO.

Results

Six strabismus risk factors have been reported16, of which five were successfully matched to HPO terms (Supplementary Table A.3a); “Smoking throughout pregnancy” has no HPO term match. Two terms were not attached to any human genes: “Retinopathy of prematurity” and “Anisometropia”. A set of 119 genes is annotated with the remaining three HPO terms: “Small for gestational age”, “Premature birth”, and “Hypermetropia” (Supplementary Table A.3b). A total of 33 of the 119 genes overlap with P-Strab, of which 20 are annotated with “Premature birth” (Supplementary Table A.3c). These overlapping genes support a link between these three risk factors and strabismus.
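The overlap counts reported above amount to set intersections between the risk-factor gene lists and the P-Strab set. A minimal Python sketch of that bookkeeping follows; the function name and the placeholder gene identifiers in the usage note are hypothetical and do not reproduce the actual gene lists in Supplementary Table A.3.

```python
def overlap_with_reference(genes_by_term, reference):
    """Return the genes shared with `reference` and a per-term overlap count.

    genes_by_term: dict mapping an HPO term name to a set of gene symbols.
    reference: set of gene symbols (for example, a curated strabismus gene set).
    """
    # Pool the genes annotated with any of the terms, then intersect.
    all_genes = set().union(*genes_by_term.values())
    shared = all_genes & reference
    # Count, for each term separately, how many of its genes are in the reference.
    per_term = {term: len(genes & reference) for term, genes in genes_by_term.items()}
    return shared, per_term
```

For instance, with placeholder inputs `{"Premature birth": {"G1", "G2"}, "Hypermetropia": {"G2", "G3"}}` and a reference set `{"G2", "G3", "G9"}`, the shared set is `{"G2", "G3"}` and the per-term counts are 1 and 2. A gene annotated with several terms is counted once in the shared set but once per term in the per-term counts.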

[Figure A.1: workflow diagram; box contents reproduced as text]

A) S-Strab gene set

Sources:
- OMIM, “strabismus”, prefixes: +,* (64 genes)
- OMIM, “Duane (syndrome)”, prefixes: +,* (12 genes)
- OMIM, “Fibrosis of extraocular muscles”, prefixes: +,* (4 genes)
- SysID, “strabismus” (22 genes)

Combined: 95 unique genes

Manual Inspection Criteria:
- Autosomal Dominant: ≥ 3 patients
- Autosomal Recessive, Hemizygous, Compound Heterozygous: ≥ 2 patients

Result: 54 unique genes (56.8%)

B) HPO-based strabismus gene set

- HPO Database, phenotype_annotation.tab: 136,802 entries
  - Evidence Code ICE (individual clinical experience), PCS (published clinical study), TAS (traceable author statement): 74,731 entries
  - Strabismus HPO IDs: 389 entries (diseases)
- HPO Database, build130_ALL_SOURCES_FREQUENCIES_disease_to_genes_to_phenotypes.txt: 135,780 entries
  - Strabismus HPO IDs: 721 entries (genes to phenotypes)
- (565 unique genes)
- 411 unique genes
- Manual Inspection Criteria: ≥ 1 patient
- Result: 204 unique genes (49.6%)

Strabismus HPO IDs:
HP:0000486 - Strabismus
HP:0000487 - Congenital strabismus
HP:0000565 - Esotropia
HP:0000577 - Exotropia
HP:0010877 - Unilateral strabismus
HP:0025068 - Incomitant strabismus
HP:0025069 - Concomitant strabismus
HP:0025312 - Esophoria
HP:0025313 - Exophoria
HP:0008033 - Congenital exotropia
HP:0001137 - Alternating esotropia

Figure A.1 Workflow of compilation and curation of strabismus-associated genes. Based on (a) the OMIM and SysID databases, resulting in the Stringent Strabismus set (S-Strab), and (b) the Human Phenotype Ontology (HPO) database, resulting in the less stringent HPO-based strabismus gene set. The combination of the two gene lists is termed the Permissive Strabismus set (P-Strab).
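The filtering steps in panel (b) of the workflow can be sketched in Python as follows. The HPO IDs and evidence codes are those listed in Figure A.1; the record layouts (simple tuples), the function names, and the identifiers in the usage note are assumptions made for illustration and do not reflect the actual column structure of the HPO annotation files or the scripts used in the thesis.

```python
# Strabismus-related HPO IDs and accepted evidence codes, as listed in Figure A.1.
STRABISMUS_HPO_IDS = {
    "HP:0000486", "HP:0000487", "HP:0000565", "HP:0000577",
    "HP:0010877", "HP:0025068", "HP:0025069", "HP:0025312",
    "HP:0025313", "HP:0008033", "HP:0001137",
}
ACCEPTED_EVIDENCE = {"ICE", "PCS", "TAS"}


def strabismus_diseases(annotations):
    """Diseases with a strabismus term supported by an accepted evidence code.

    annotations: iterable of (disease_id, hpo_id, evidence_code) tuples
    (hypothetical layout).
    """
    return {
        disease
        for disease, hpo_id, evidence in annotations
        if hpo_id in STRABISMUS_HPO_IDS and evidence in ACCEPTED_EVIDENCE
    }


def strabismus_genes(gene_links):
    """Unique genes linked to any strabismus term.

    gene_links: iterable of (gene_symbol, hpo_id) tuples (hypothetical layout).
    """
    return {gene for gene, hpo_id in gene_links if hpo_id in STRABISMUS_HPO_IDS}
```

Given placeholder records such as `("DISEASE:1", "HP:0000565", "PCS")` and `("DISEASE:2", "HP:0000486", "IEA")`, only the first disease is retained, because the second annotation carries an evidence code outside the accepted set. The manual inspection step that follows in the workflow is not automated in this sketch.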

Appendix B Supplementary Material for Chapter 3

Methods

Population Data BC search

We queried for patients born between January 2001 and July 2014 (age range 4-17) who had received a diagnosis, through a physician visit, under one of three International Classification of Diseases (ICD) diagnostic codes: “Strabismus”, “Mental Retardation”, or “Specific Motor Retardation”. The selected ICD codes best correspond to the following CAUSES categories: strabismus, ID, and motor control phenotype, respectively.

The prevalence of strabismus was calculated for the Mental Retardation (ID) group and compared to that of the rest of the BC population. The strabismus prevalence of the ID-motor group was compared to that of the rest of the ID population using the R package “epitools” (default settings of the oddsratio.wald function).
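For readers unfamiliar with the Wald approach, the calculation can be sketched in Python. This is a generic illustration of an odds ratio with a normal-approximation (Wald) confidence interval from a 2×2 table, not a reproduction of the epitools code or of the analysis in this thesis; the function name, the argument layout, and the example counts in the usage note are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (default gives an approximate 95% interval).
    All four cells must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the normal approximation.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With made-up counts `odds_ratio_wald(10, 20, 5, 40)`, the point estimate is 4.0, and the interval is asymmetric around it because it is symmetric on the log scale. An interval that excludes 1 indicates a difference in odds between the two groups at the chosen confidence level.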

OMIM search

The database was accessed on July 10, 2018. The following terms were used to identify the ID entities in the Clinical Synopsis search: “intellectual disability” OR “intellectual impairment” OR “mental retardation” OR “mentally retarded” OR “cognitive delay” OR “cognitive impairment”. Strabismus entities were searched with the following terms: “strabismus” OR “esotropia” OR “exotropia” OR “hypertropia” OR “hypotropia”.

Figure B.1 WordCloud of CAUSES ID group phenotypic descriptors. Descriptions occur in clinical records of the ID subjects from the CAUSES study. Font sizes correlate with frequencies.
