MUTATIONAL AND FUNCTIONAL ANALYSES OF KABUKI SYNDROME GENES

DEMONSTRATE CRITICAL ROLES IN CRANIOFACIAL, HEART AND BRAIN DEVELOPMENT

By

PETER MARCEL VAN LAARHOVEN

B.S., University of California, Los Angeles, 2008

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Human Medical Genetics and Genomics Program

2015

This thesis for the Doctor of Philosophy degree by

Peter Marcel Van Laarhoven

has been approved for the

Human Medical Genetics and Genomics Program

by

Kristin B. Artinger, Chair

Tamim H. Shaikh, Advisor

Bruce H. Appel

Paul C. Megee

Jay R. Hesselberth

Date 01/27/2015


Van Laarhoven, Peter Marcel (Ph.D., Human Medical Genetics and Genomics)

Mutational and Functional Analyses of Kabuki Syndrome Genes Demonstrate Critical Roles

in Craniofacial, Heart, and Brain Development

Thesis directed by Associate Professor Tamim H. Shaikh.

ABSTRACT

Kabuki syndrome (KS) is a rare multiple congenital anomaly syndrome characterized by distinctive facial features, global developmental delay, intellectual disability, and cardiovascular and musculoskeletal abnormalities. Mutations in KMT2D have been identified in a majority of KS patients, and mutations in KDM6A have been identified as a rare cause of KS. Fifty-seven individuals clinically diagnosed with KS were analyzed for mutations in KMT2D and KDM6A, 17 by the group that implicated KMT2D and 40 by our group. Putative pathogenic mutations were detected in KMT2D in 27 subjects and in KDM6A in 4 subjects. Observed mutations included single nucleotide variations and indels leading to frameshift, nonsense, missense or splice site alterations in both genes. Custom target enrichment and whole exome sequencing were used to identify new candidate genes. Whole exome sequencing of five KMT2D and KDM6A mutation-negative subjects identified the KMT2D paralog KMT2C as a possible candidate. To elucidate the functional roles of KMT2D and KDM6A, we knocked down the expression of their orthologs in zebrafish. Following knockdown of kmt2d and the two zebrafish paralogs kdm6a and kdm6al, we analyzed morphants for developmental abnormalities in tissues that are affected in individuals with KS, including craniofacial structures, heart and brain. The kmt2d morphants exhibited severe abnormalities in all tissues examined. Although the kdm6a and kdm6al morphants had similar brain abnormalities, the two paralogs differed in their roles in craniofacial and cardiac development. The kdm6a morphants exhibited craniofacial abnormalities, while kdm6al morphants had prominent defects in heart development. Our results provide further support for the roles of KMT2D and KDM6A in KS by identifying additional cases with KMT2D and KDM6A mutations. Furthermore, we have used a vertebrate model to provide direct evidence for the role of KMT2D and KDM6A in the development of organs and tissues affected in KS patients.

The form and content of this abstract are approved. I recommend its publication.

Approved: Kristin B. Artinger


ACKNOWLEDGEMENTS

First and foremost, I would like to thank my amazing wife, Mary Van Laarhoven. Without your love and support I would never have gotten where I am today. You are the reason I have pushed through when things seemed impossible. You make me strive to be a better, smarter and stronger person. I would also like to thank Leif Neitzel for his contributions to the zebrafish project and for sharing his knowledge in the field of developmental biology. I would also like to thank James Yu for all of his help with the next-generation sequencing project and for his advice. I want to thank Elizabeth Geiger for keeping everything running smoothly and for answering all of the hundreds of questions I have asked over the years. Thank you, Tamim, for pushing me to be a better scientist.


CONTENTS

CHAPTER

I. BACKGROUND

Kabuki Syndrome

History and Phenotype

Genetics of Kabuki Syndrome

Structure and Function of KS Genes

The KMT2 Gene Family of Histone Methyltransferases

KMT2D Structure and Function

The KDM6 Gene Family of Demethylases

KDM6A Structure and Function

The KMT2D Complex

The Role of KS Genes in Development

II. ANALYSIS OF THE GENETIC BASIS OF KABUKI SYNDROME

Introduction

Background

Mutation Screen of KMT2D and KDM6A

Identification of KS Candidate Genes

Materials and Methods

Sanger Sequencing of KMT2D and KDM6A in KS Subjects

Custom Capture of Potential KS Candidate Genes and Next-Generation Sequencing

Whole Exome Sequencing

Results

Sanger Sequencing Analysis of KMT2D and KDM6A

In-silico Analysis of Missense Variants in KMT2D and KDM6A

Analysis Pipeline for Next-generation Sequencing Data

Custom Capture and Next-Generation Sequencing

Exome Sequencing

Discussion

KMT2D and KDM6A Mutations in KS

KMT2D and KDM6A Mutations are Not Identified in All KS Subjects

Custom Capture and Sequencing of Potential KS Candidate Genes

Whole Exome Sequencing

Identification of a Novel KS Candidate Gene

III. THE ROLES OF KMT2D AND KDM6A IN DEVELOPMENT

Introduction

Known Roles of KMT2D and KDM6A Relevant to Development

The Zebrafish as a Model Organism for Studying the Genetics of MCA Disorders

Conservation of KS Genes in Zebrafish

Morpholino Knockdown of Genes Implicated in Kabuki Syndrome

Stable Gene Knockout Strategies

The Role of Retinoic Acid Signaling in Development

Materials and Methods

Zebrafish Transgenic Strains and Husbandry

In-situ Hybridization Probes

Whole Mount In-situ Hybridization

Antisense Morpholino and mRNA Injections

Phenotyping of Morphants

Hematoxylin and Eosin Staining

Immunofluorescence

Validation of Splice-site Morpholino Effects

Imaging

Zinc-finger Nucleases

CRISPR/Cas9 Nucleases

Incubation of Morphants in Exogenous All-trans Retinoic Acid

Results

Whole mount In-situ hybridization of KS Genes

Morpholino Effects on mRNA Transcripts

Analysis of Craniofacial Development in Morphants

Analysis of Neural Crest Cell Specification in Morphants

Analysis of Cranial Neural Crest Cell Differentiation in Morphants

Analysis of Heart Development in Morphants

Analysis of Brain Development in Morphants

Analysis of Neural Progenitor Cell Specification

Mutagenesis of kmt2d, kdm6a and kdm6al

Effects of Exogenous Retinoic Acid on Morphant Development

Discussion

Roles of KS Genes in Development

Malformations in KS Implicate Neural Crest Cell Derivatives

Defects in Neuronal Development

Effects of Exogenous Retinoic Acid on Morphant Development

IV. CONCLUSIONS

The Etiology of Kabuki Syndrome

The Genetic Basis of Kabuki Syndrome

Developmental Roles of KS Genes

Future Directions

Mechanisms of Neural Crest Cell Defects Caused by KS Gene Knockdown

Effects of KS Gene Knockout on Neuronal Outgrowth and Pruning

Significance

REFERENCES

APPENDIX

A. CLINICAL MANIFESTATIONS OF KABUKI SYNDROME FROM 14 STUDIES

B. KDM6A AMPLIFICATION AND SEQUENCING PRIMERS

C. KMT2D AMPLIFICATION AND SEQUENCING PRIMERS

D. ANALYSIS PIPELINE WITH COMMANDS

E. ZINC-FINGER NUCLEASE INSERT SEQUENCES


TABLES

TABLE

1. Clinical Features of Kabuki Syndrome

2. Chromosomal Abnormalities Reported for KS Patients

3. KDM6A Variants Detected by CNV Analysis and Sanger Sequencing

4. Genes Targeted by Custom Capture

5. Spectrum of KMT2D and KDM6A Mutations in our Cohort of 57 KS Subjects

6. Mutations in KMT2C Identified by Exome Sequencing

7. Validated Missense Variants Detected by Exome Sequencing


FIGURES

FIGURE

1.1 Characteristic facies present in subjects with Kabuki syndrome

1.2 Abnormal MRI demonstrates the variable brain malformations identified in subjects with Kabuki syndrome

1.3 Common cardiac defects present in Kabuki syndrome

1.4 The targets of each of the histone H3 lysine methyltransferases and demethylases

1.5 Phylogeny and domains of the KMT2 family of proteins

1.6 Phylogeny and domains of the KDM6 family of proteins

2.1 Deletions on X detected in two patients

2.2 Overlapping RNA probe design for Agilent custom DNA capture

2.3 Bioanalyzer analysis of sheared genomic DNA from subject 56 demonstrating the expected peak near 200bp

2.4 Bioanalyzer analysis of the final capture sequencing library from subject 56 using the High Sensitivity DNA chip

2.5 Determination of Indel sequences

2.6 Spectrum of deleterious mutations in subjects diagnosed with Kabuki syndrome

2.7 Empirical quality score vs. reported quality score before and after base quality score recalibration in subject 24

2.8 Distribution of reported quality scores before and after base quality score recalibration

2.9 Effects of BQSR on quality scores by cycle

2.10 Custom capture and 36 basepair Illumina sequencing failed to identify the known mutation in KMT2D

2.11 An ideogram of KMT2C variants displayed in the UCSC genome browser

3.1 In-situ hybridization demonstrating kmt2d, kdm6a and kdm6al expression

3.2 Validation of splice-site morpholino effects

3.3 Craniofacial defects observed in morphant zebrafish embryos

3.4 Expression of in 48 hpf wild-type and morphant embryos

3.5 Expression of in 48 hpf embryos

3.6 Defects in heart looping observed in morphant zebrafish embryos

3.7 Defects in brain morphology are observed in morphant zebrafish embryos, and partially rescued by coinjection of human KDM6A mRNA

3.8 Knockdown of kmt2d, kdm6a, and kdm6al leads to mild impairments in hindbrain development

3.9 Defects in NPC differentiation are observed in morphant zebrafish embryos, and partially rescued by coinjection of human KDM6A mRNA

3.10 Expression of at 14 hpf in wild-type and morphant embryos

3.11 A typical kdm6al morphant


CHAPTER I

BACKGROUND

Kabuki Syndrome

History and Phenotype

Kabuki syndrome (KS, OMIM #147920) is a rare, autosomal dominant multiple congenital anomaly (MCA) syndrome. The disorder was first described by Norio Niikawa and Yoshikazu Kuroki in separate publications in 1981, leading to an alternative name for KS, Niikawa-Kuroki syndrome (1,2). Prevalence was estimated by the authors at 1 in 32,000 live births (3). The cardinal features of this disorder are a distinctive facies, mild to moderate intellectual disability, skeletal anomalies, dermatoglyphic abnormalities, and postnatal dwarfism (3). Aside from these classic features that aid in diagnosis, a wide variety of additional features may be present in KS (Table 1).

The distinctive facies common to all patients consists of long palpebral fissures with eversion of the lower lateral lids, high arched eyebrows with sparse growth along the distal third, low-set and prominent ears, a shortened nasal septum and a depressed nasal tip (Figure 1.1) (3,4). Ophthalmologic features may include strabismus, coloboma, ptosis and blue sclera (5). Structural defects of the midface include orofacial and dental abnormalities such as cleft lip and palate, high arched palate, hypodontia, malocclusion, cross-bite, maxillary recession and mid-facial hypoplasia (6,7).

Table 1. Clinical Features of Kabuki Syndrome

Adapted with permission (4).

Figure 1.1. Characteristic facies present in subjects with Kabuki syndrome. Figure adapted with permission (4).

Figure 1.2. Abnormal MRI demonstrates the variable brain malformations identified in subjects with Kabuki syndrome. The abnormalities include: atrophy of the cerebellum and brainstem (A), polymicrogyria (B) and Chiari I malformation (C, D). Figures adapted with permission (8–10).

Intellectual capacity is diminished in KS, with clinical intellectual disability (ID) noted in 92% of patients (3). The highest reported IQ is 83, suggesting that the IQ range of KS patients as a whole is shifted downward relative to the general population. Brain abnormalities are often reported, which may indicate that defects in brain development are involved in the etiology of ID. Microcephaly is the most commonly reported abnormality, affecting approximately 25% of patients (4). The cutoff for a diagnosis of microcephaly is a head circumference more than two standard deviations below the mean. In a cohort of 62 patients, the average head circumference was 0.71 SD below the mean. The authors concluded that head size was normal, but this shift is more consistent with a reduction in head circumference across KS patients that usually fails to meet the cutoff for clinical significance. Structural brain abnormalities have also been reported, including polymicrogyria, Chiari I malformation and atrophy of the cerebellum and brainstem (Figure 1.2) (8–11). The lack of commonality between these malformations suggests that there may be a general dysregulation of neuronal development that manifests differently in each individual. Epilepsy and hypotonia are additional findings with a neurological basis (3,12).
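The head-circumference argument above can be illustrated numerically. The following is a minimal sketch, not an analysis from the cited study: it assumes that head circumference z-scores in KS patients are normally distributed around the reported mean of 0.71 SD below the reference mean, with the same spread as the reference population. Both the normality and the unchanged spread are assumptions, and the variable names are illustrative. It computes the fraction of each population expected to fall below the microcephaly cutoff.

```python
from statistics import NormalDist

# Reference population, expressed in standard-deviation (z-score) units.
reference = NormalDist(mu=0.0, sigma=1.0)

# ASSUMPTION: the KS cohort is normal, centered on the reported mean
# (0.71 SD below the reference mean), with the reference spread.
ks_cohort = NormalDist(mu=-0.71, sigma=1.0)

# Microcephaly cutoff: more than two SD below the reference mean.
MICROCEPHALY_CUTOFF = -2.0

# Expected fraction of each population falling below the cutoff.
expected_reference = reference.cdf(MICROCEPHALY_CUTOFF)
expected_ks = ks_cohort.cdf(MICROCEPHALY_CUTOFF)

print(f"reference population below cutoff: {expected_reference:.1%}")
print(f"shifted cohort below cutoff:       {expected_ks:.1%}")
```

Under these assumptions, a shift of this size by itself raises the expected proportion below the cutoff several-fold relative to the reference population, even though the cohort mean stays within the normal range. The observed frequency of microcephaly need not match this idealized figure, since the true distribution in KS patients is unknown.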

Skeletal abnormalities are a cardinal phenotype of KS, yet their presentation may vary considerably. Postnatal dwarfism is common, with the height of most patients falling more than two standard deviations below the mean (4). Both hands and feet may be affected, with short, stubby fingers and metacarpals or clinodactyly commonly reported (13). Spinal and rib abnormalities reported include spina bifida, sagittal clefting, scoliosis and other structural rib and vertebral malformations. Joint laxity is very common, which increases susceptibility to hip and knee dislocations (13).

Visceral abnormalities are present in the majority of subjects. The most frequent are cardiac defects, present in up to 55% of patients. These most commonly consist of atrial or ventricular septal defects and coarctations of the aorta (Figure 1.3) (4). Less frequent malformations include tetralogy of Fallot, single ventricle with common atrium, patent ductus arteriosus, transposition of the great vessels and valve defects (3,12,14,15). These malformations can be life-threatening, and often must be corrected with surgery.


One of the more unusual features is retention of fetal fingertip pads, which is present along with other dermatoglyphic abnormalities such as increased ulnar loops and reduced palmar creases (3). Additional visceral abnormalities include horseshoe kidney and genitourinary malformations including cryptorchidism, micropenis and imperforate anus (3,13,15). Patients suffer from a number of complications including failure to thrive, recurrent otitis media, hearing loss, abnormal hormone levels, obesity, and susceptibility to infection (3,15). An analysis of immunologic function in 14 KS subjects identified reductions in memory CD19+ and CD4+ cells, as well as reduced serum levels of IgA and IgG (16). It is suspected that immunological abnormalities are linked to the increased risk of recurrent otitis media, although structural inner ear defects may also play a role. Autoimmune disorders have been reported, including idiopathic thrombocytopenic purpura, autoimmune hemolytic anemia, hypogammaglobulinemia and vitiligo (12,16,17). A comprehensive table of KS phenotypes can be found in Appendix A.

Figure 1.3. Common cardiac defects present in Kabuki syndrome. RA, Right Atrium; RV, Right Ventricle; LA, Left Atrium; LV, Left Ventricle; SVC, Superior Vena Cava; IVC, Inferior Vena Cava; MPA, Main Pulmonary Artery; Ao, Aorta; TV, Tricuspid Valve; MV, Mitral Valve. Images are in the public domain under the Creative Commons License by the Centers for Disease Control and Prevention.


Genetics of Kabuki Syndrome

Kabuki syndrome was thought to be predominantly caused by sporadic mutations, although anecdotal reports by clinicians have noted KS-like features in the parents of affected children. An examination of photographic evidence from previous studies, which included a father and his affected son and identical twins concordant for the disorder, determined that KS is probably caused by a dominant mutation (15). It is notable that these patients with presumably identical mutations had very different clinical presentations. Based on both the phenotypic variability and the large number of seemingly unrelated anomalies associated with the syndrome, it was suspected that chromosomal abnormalities encompassing multiple genes were responsible for the KS phenotype. Initial genetic studies used cytogenetics and microarrays to identify causal chromosomal abnormalities. In 2003, Milunsky and Huang reported that a microduplication of 8p22–8p23.1 was responsible for KS (18). This report was based on array comparative genomic hybridization (array-CGH) and fluorescence in-situ hybridization of a bacterial artificial chromosome. Validation of the 8p22–8p23.1 microduplication by other clinical laboratories was unsuccessful, prompting a reexamination of the data by the authors in 2008. They identified issues with the methods used, and retracted their data as unsubstantiated (19). A total of six additional autosomal chromosomal abnormalities have been associated with KS, including a complex transversion/duplication/deletion between chromosomes 6 and 12, a paracentric inversion of 4p, a balanced translocation between 15q15 and 17q21, a pseudodicentric chromosome 13, a duplication of 1p13.1-p22.1 and a balanced translocation between 3p25 and 10p15 (Table 2) (20–25). None of these autosomal abnormalities had overlapping loci to suggest a causative gene or region of interest. The sex chromosome abnormalities reported are more intriguing, with three ring-X abnormalities, a ring-Y abnormality, a Y-chromosome inversion, an X-Y balanced translocation and mosaic loss of the X-chromosome (3,26,27). These abnormalities suggested that a pseudoautosomal or homologous region of chromosomes X and Y may be involved in the etiology of KS. Despite years of hints as to the cause of KS, a compelling genetic basis was not identified until 2010.

Table 2. Chromosomal Abnormalities Reported for KS Patients

Adapted with permission (30).

In October of 2010, Ng et al. published a landmark paper identifying KMT2D (previously MLL2, ALR or MLL4) as the gene responsible for the majority of cases of KS, the first use of exome sequencing to identify the gene responsible for a dominant disorder (28). Nine months earlier, they were the first to identify the genetic basis of Miller syndrome, an autosomal recessive disorder (29). In the KS study, exome sequencing of a discovery cohort of 10 subjects was performed, and novel variants shared between subjects were compared. They identified loss-of-function mutations in KMT2D in 7/10 subjects. PCR amplification and Sanger sequencing of KMT2D in the remaining three subjects identified mutations in two of them that had been missed by exome sequencing. They also Sanger sequenced KMT2D in 43 additional KS subjects and identified 26 mutations in these subjects. Of the 35 total KMT2D mutations, one was detected in two different individuals and two were passed from parent to child. Of the distinct variants, 20/32 (62%) were nonsense mutations, 7/32 (22%) were indels leading to a frameshift and 5/32 (16%) were missense mutations. The majority of these variants (88%) can be confidently inferred as loss-of-function mutations, indicating that either a dominant-negative effect or haploinsufficiency is responsible for the KS phenotype.
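The class proportions quoted above follow directly from the reported counts. A minimal sketch of the tally, using only the counts given in the text (the dictionary layout and variable names are illustrative assumptions):

```python
# Distinct KMT2D variants by class, using the counts reported in the text.
variant_counts = {
    "nonsense": 20,
    "frameshift indel": 7,
    "missense": 5,
}

# Total number of distinct variants across the three classes.
total_distinct = sum(variant_counts.values())

# Share of each class among the distinct variants, as a percentage.
variant_shares = {
    name: 100.0 * count / total_distinct
    for name, count in variant_counts.items()
}

for name, share in variant_shares.items():
    print(f"{name}: {variant_counts[name]}/{total_distinct} = {share:.1f}%")
```

Rounded to whole percentages, these shares agree with the values quoted in the text to within rounding.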

A follow-up study was carried out to identify the KMT2D mutation status of each individual in a cohort of 110 KS subjects, and to identify any correlations between mutation status and the many features associated with KS. This cohort consisted of 53 previously reported subjects and 57 unreported subjects (31). We contributed to the study the 17 unreported subjects from our cohort that best fit the classic KS presentation. KMT2D mutations were identified in 81/110 cases (74%), including two cases of direct transmission of the disease from parent to child, supporting the dominant inheritance model. The phenotype-genotype correlation was only mildly informative. The only statistically significant difference between KS subjects with and without KMT2D mutations was that renal abnormalities were present in a higher percentage of subjects with a KMT2D mutation (47% vs. 14%) (31). Additional studies have detected KMT2D mutations in 56-76% of patients (28,31–34).

In 2012, Lederer et al. identified pathogenic deletions of KDM6A (formerly UTX) in three patients (35). The first two subjects had 284kb and 816kb deletions that overlapped at the KDM6A and CXorf36 genes, and the third subject had a 45kb intragenic deletion of exons 5-9 of KDM6A. These subjects had the classic KS phenotype. Subsequent reports have identified single-nucleotide polymorphisms (SNPs) in KDM6A in 6-14% of KS subjects (36–38).

A significant percentage of KS subjects do not have recognizable mutations in KMT2D or KDM6A. There are three likely explanations for the failure to identify causative mutations in these subjects. The first is that the remaining subjects have a disorder with phenotypic overlap with KS and were misdiagnosed. This is always possible because an inherent hazard of working with clinical samples from a complex MCA disorder is that the accuracy of the diagnosis is contingent upon the skill of the clinician and their familiarity with the disorder. Another possibility is that the causative mutations in KMT2D were not identified. Neither the exome nor the Sanger sequencing methods investigated noncoding regions, and it is possible that mutations in gene regulatory elements could be causative in some cases. Mutations in promoter elements are responsible for some diseases such as β-thalassemia, hemophilia and atherosclerosis (39). A third possibility is that the genetic basis of KS has not been entirely determined, and there are additional genes involved in the etiology of KS. Binding partners or cofactors of KMT2D and KDM6A are all potential candidates. Identification of the remaining genes responsible for KS became a priority in a number of clinical laboratories.

Structure and Function of KS Genes

The KMT2 Gene Family of Histone Methyltransferases

KMT2D is a member of the Lysine Methyltransferase 2 (KMT2) family of proteins that catalyze the post-translational addition of methyl groups to lysine 4 on histone H3 (H3K4). KMT2 proteins are part of a large family of proteins that contain a SET domain, named for the three Drosophila melanogaster genes that share this domain: suppressor of variegation 3-9, enhancer of zeste, and trithorax. One, two or three methyl groups can be progressively added to H3K4 by the KMT2 proteins, and each of these states acts as a distinct mark as part of the comprehensive “histone code,” regulating transcription through recruitment of enhancer or repressor complexes that affect chromatin dynamics and recruitment of transcription factors (Figure 1.4). Monomethylation of H3K4 (H3K4me1) is associated with enhancer regions, H3K4me2 is often found in the body of actively transcribed genes and H3K4me3 is often found at promoters (40).

The KMT2 proteins are related to Set1 of Saccharomyces cerevisiae, and to trithorax and trithorax-related of Drosophila melanogaster. There are 7 members of the KMT2 family of methyltransferases in humans, KMT2A-G. Six of these genes are closely related, with the highest sequence identity between three pairs of genes: KMT2A (MLL) and KMT2B (MLL4), KMT2C (MLL3) and KMT2D (MLL2), and KMT2F (SETD1A) and KMT2G (SETD1B) (Figure 1.5). KMT2E is distantly related to the rest of the group (41). KMT2A-D are responsible for regulation of distinct target genes during development, yet knocking down any of the KMT2A-D genes does not change global methylation status. The bulk of H3K4 methylation is carried out by KMT2F and KMT2G.

KMT2D Structure and Function

The KMT2D gene spans 36kb of 12q12-q13 and contains 54 exons. The transcript is 19kb in length and is translated into a huge protein consisting of 5537 amino acids with a molecular weight of 593 kilodaltons (42). KMT2D has a complex domain architecture (Figure 1.5). The C-terminal SET domain, which is conserved among the majority of the lysine methyltransferases, is the enzymatic domain and requires the cofactor S-adenosyl methionine as a methyl donor (43). The tandem PHD fingers 4-6 have been shown to bind specifically to either unmethylated or asymmetrically dimethylated arginine 3 on histone H4 (46). Two LXXLL motifs near the C-terminus of KMT2D allow direct binding of ligand-activated estrogen receptor α (ERα) (48). Activation by ERα and other signaling molecules differentiates KMT2D from other members of the KMT2 family.

Figure 1.4. The targets of each of the histone H3 lysine methyltransferases and demethylases. The effect that each of these methylation marks has on transcription is identified by the colored dots above each modified amino acid. A color coded key is provided at the bottom. Used with permission (47).

Figure 1.5. Phylogeny and domains of the KMT2 family of proteins. Adapted with permission (41).

Knockdown of KMT2D results in reduced expression of its target genes, many of which are implicated in developmental processes (49–51). Short hairpin RNAs were used to knock down KMT2D in HeLa cells, and microarray analysis of total mRNA isolated from these cells was used to determine genes that were regulated by KMT2D (49). The top 20 downregulated genes had functions in extracellular matrix dynamics, cell polarization and cell adhesion, as well as regulation of proliferation, muscle development and immune response. In a separate study, KMT2D was knocked out in HCT116 colorectal cancer cells and both KMT2D binding sites and H3K4me3 patterns were determined by chromatin immunoprecipitation and next-generation sequencing (ChIP-seq) (50). By combining these data with expression arrays, the authors were able to assemble a global picture of KMT2D binding, methyltransferase activity and sites of transcriptional regulation. They determined that loss of KMT2D leads to moderate downregulation of 228 genes, which correlated well with sites of reduced H3K4me3. They used Ingenuity Pathway Analysis to determine the pathways that are regulated by KMT2D. Developmentally relevant pathways highlighted by this analysis included retinoic acid receptor (RAR) activation, WNT/β-catenin signaling and cardiac differentiation. Defects in any of these signaling pathways could potentially lead to developmental defects. In this study there was no significant binding at HOX gene clusters, probably due to use of a cancer cell line. They also found KMT2D poised at 2060 sites, most of which were not differentially transcribed upon loss of KMT2D. They acknowledged that the transcription profile would likely change under different cellular contexts, especially application of exogenous signaling molecules. The authors noted an overlap between the locations of KMT2D binding sites and retinoic acid responsive elements (RAREs). They applied retinoic acid (RA) to normal HCT116 and the KMT2D knockout lines. This resulted in increased transcription of the RA-responsive gene ASB2 only in the cell line with intact KMT2D. In a placental chorionic carcinoma cell line, KMT2D and KMT2C were determined to bind to the promoter of HOXC6 and regulate its transcription in the presence of RA (52). Estrogen receptors α and β were both required for the binding of the KMT2C/D proteins. Thus, reduced transcription of RA-responsive genes due to KMT2D mutations may be the cause of some of the developmental defects present in KS.

The KDM6 Gene Family of Demethylases

The lysine demethylase 6 (KDM6) family of proteins contains three members, KDM6A (UTX), KDM6B (JMJD3) and KDM6C (UTY) (Figure 1.6). These proteins selectively demethylate lysine 27 of histone H3 (H3K27), and removal of this epigenetic mark is associated with increased gene transcription (53). KDM6A is located on the X-chromosome, while KDM6C is the Y-chromosome paralog. KDM6C was long thought to be enzymatically inactive, but it has recently been demonstrated that its demethylase activity is detectable at low levels (54). The KDM6 proteins contain a Jumonji C domain that confers demethylase activity, requiring the cofactors iron (Fe[II]) and α-ketoglutarate (53). The H3K27me3 epigenetic mark is established by EZH2, a subunit of the polycomb repressive complex 2 (PRC2), and is associated with chromatin compaction, leading to gene silencing and X-inactivation. The KDM6 family of demethylases antagonizes PRC2 function, and is thus involved in opening of chromatin and increased transcription.

KDM6A Structure and Function

The KDM6A gene is also quite large, although it is dwarfed by KMT2D. It spans nearly 240kb at Xp11.3 and contains 29 exons. The 5438bp transcript codes for a 154 kilodalton protein (42). The domain architecture is simpler than that of KMT2D, containing tetratricopeptide repeats that are implicated in protein-protein interactions, a treble-clef zinc-finger domain that confers binding specificity to histone H3K27, and the catalytic Jumonji C demethylase domain (55). Aside from its enzymatic activity, KDM6A has also been found to associate with SWI/SNF chromatin remodeling complexes and the H3K27 histone acetyltransferase Creb-binding protein (CBP) (56,57). Acetylation of H3K27 by CBP is associated with actively transcribed genes (57). This suggests that KDM6A may work cooperatively with these remodeling proteins, linking demethylation of H3K27me3 to H3K27 acetylation and chromatin decompaction (58).

Figure 1.6. Phylogeny and domains of the KDM6 family of proteins. Homology between KDM6A and KDM6C, and homology between the Jumonji C domains of KDM6A and KDM6B are shown. Percent similarity is represented first, with percent identity in parentheses. Adapted with permission (53).

KDM6A activity has been implicated in a number of developmental processes. Like KMT2D, it has been shown to be important in regulating expression of Hox genes. Lan et al. knocked down KDM6A in HeLa cells, resulting in reduced transcription of the posterior HoxD cluster genes HOXD10, HOXD11 and HOXD12 (59). They also detected an increase in H3K27me3 at these gene locations. Copur and Müller used whole-mount in-situ hybridization of D. melanogaster larvae to determine that Utx is essential for proper patterning of the Hox genes Ubx and Abd-B (60). They also determined that complete knockout of the Utx gene is lethal by early adulthood in D. melanogaster. The embryonic lethality was delayed due to the presence of maternally deposited Utx protein. When Utx knockout embryos were generated without the maternal contribution, embryos died prior to the third larval stage. Fertilization of Utx knockout eggs with wild-type sperm was able to rescue lethality, but a number of these embryos displayed homeotic transformations. These findings are relevant to KS because they show that a complete knockout is lethal, whereas haploinsufficiency leads to developmental defects.

The KMT2D Complex

Members of the KMT2 family form large multiprotein complexes. KMT2D was initially identified in a protein complex by a coimmunoprecipitation (CoIP) experiment using antibodies directed against NCOA6 (previously ASC-2), a transactivating protein known to associate with ligand-bound nuclear receptors at distinct DNA sequences known as hormone response elements (61). The protein complex was called ASCOM (ASC-2

Complex), although more recent reports refer to it as the KMT2D complex. A later experiment expanded upon the complex constituents using CoIP with antibodies directed against KMT2D. This experiment determined that KMT2D forms a complex with KDM6A,

RBBP5, DPY30, WDR5, MATR3, ASH2L, NSD1, EVI1, PAXIP1, NCOA6, PAGR1 and α- and β-tubulin (49). Interestingly, KMT2C was detected as part of the original ASCOM complex. It was not detected when KMT2D antibodies were used for the CoIP, suggesting that KMT2C and KMT2D form separate, but similar complexes. The other KMT2 proteins also form complexes that share some of these subunits, but they have differences that contribute to their unique roles in gene regulation. One of the unique components of the KMT2A/B complexes is Menin, a tumor suppressor that is required for activation of many HOX genes

(62). KMT2F/G complexes contain WDR82, which is required for recruitment of the complexes to transcription start sites (63). One of the subunits shared between each of these complexes is WDR5, which appears to be a scaffolding protein that is required for the

integrity of the complex (64). KMT2C/D complex proteins that are unique include KDM6A,

NCOA6 and PAXIP1. PAXIP1 links the KMT2C/D complex to WNT/β-Catenin signaling by binding to the transcription factor PITX2 (65). PAXIP1 may also play non-redundant roles in immunoglobulin class-switching. As part of the KMT2C/D complex, it is required for transcription of Igh-γ3 and Igh-γ2b, two immunoglobulin heavy chain components that require transcription to open surrounding chromatin so that activation-induced cytidine deaminase may cleave at switch regions. PAXIP1 is also a constituent of the complex that repairs the double-stranded break after recombination (66). It is unlikely that mutations in those

KMT2D complex constituents that are also present in other functional complexes will be identified in KS subjects. This is because their loss would affect additional cellular processes and would likely cause a more severe phenotype. Therefore the strongest candidates for additional KS genes are those complex constituents that are unique to the

KMT2D complex.

The Role of KS Genes in Development

Although the KS genes KMT2D and KDM6A are well studied in the context of cancers, their roles in developmental processes are largely unexplored. This is an important distinction because cancers are caused by abnormal regulation of cellular processes and are far removed from the original context of the gene’s activity. To understand the nuanced roles of these genes it is important to study them in the context of development and maintenance of the organism as a whole. The defects caused by their loss indicate that they play important roles in craniofacial, heart and brain development, but provide very little mechanistic insight. Although the physical and neuronal abnormalities associated with KS are established during development and for the most part cannot be alleviated by medical intervention, we may find that some aspects of the disorder may be amenable to treatment, which will require a better understanding of the etiology of the disorder.

CHAPTER II

ANALYSIS OF THE GENETIC BASIS OF KABUKI SYNDROME

Introduction

Background

A cohort of fifty-seven KS patients was collected through clinical collaborations at the Children’s Hospital of Philadelphia. It was suspected that KS is caused by microdeletions involving several contiguous genes due to the wide range of phenotypic

manifestations. To identify potential copy number variations (CNVs) in KS subjects, high

resolution copy number analysis was performed. This analysis identified microdeletions on

chromosome X in two female subjects. Subject 1 had a 3.2 megabase (Mb) deletion at

ChrXp11.3 (chrX:43620636-46881568) involving 13 genes, and subject 2 had a 2Mb

deletion (chrX:43620636-45642604) involving 7 genes nested within the larger deletion

present in the other patient (Figure 2.1). The deletions overlapped in a region containing the genes MAOB, NDP, EFHC2, FUNDC1, DUSP21, CXorf36, and KDM6A. No other copy

number variations of clinical significance were detected in any of the remaining subjects.

Figure 2.1. Deletions on Chromosome X detected in two patients. An ideogram of chromosome X is shown, highlighting a region of Xp11.3-p11.23 with a red box. An expanded view of this region is shown below including the genes and chromosomal coordinates. The microdeletions detected in subjects 1 and 2 are shown as black bars.
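The relationship between the two microdeletions can be checked with simple interval arithmetic. The following is a minimal sketch in Python that uses the hg19 coordinates reported above; the function names are illustrative and are not part of any analysis software used in this work. Lengths are computed as end minus start.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) on the same chromosome, hg19 coordinates


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the interval shared by a and b, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def length_mb(interval: Interval) -> float:
    """Length of an interval in megabases (end minus start)."""
    return (interval[1] - interval[0]) / 1e6


# Deletions reported for subjects 1 and 2 on chromosome X
subject1 = (43620636, 46881568)
subject2 = (43620636, 45642604)

shared = overlap(subject1, subject2)
print(shared, length_mb(subject1), length_mb(subject2))
```

Because the deletion in subject 2 is nested within the deletion in subject 1, the shared region is identical to the subject 2 interval.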

Sanger sequencing of the genes within the commonly deleted region in 15 additional patients identified one KDM6A variant in subject 3, a splice-site mutation resulting in the replacement of the consensus +1 “GT” splice donor of exon 25 with a “GC” that is predicted by the MaxEntScan algorithm to establish a significantly weaker splice-site (Table 3) (67).

Splice donor mutations of this type typically result in a retained intron in the processed mRNA transcript (68). KDM6A was an ideal candidate gene based on its known roles in processes related to development, and this additional mutation solidified its status as a candidate KS gene.

Table 3. KDM6A Variants Detected by CNV Analysis and Sanger Sequencing

Proband | M / F | Gene Affected | Chromosomal Location (hg19) | Effect at cDNA Level | Effect at Protein Level
1 | F | KDM6A | chrX:43620636-46881568 | Microdeletion | Haploinsufficiency
2 | F | KDM6A | chrX:43620636-45642604 | Microdeletion | Haploinsufficiency
3 | M | KDM6A | chrX:44949177 | c.3736+2T>C | Splice Site

Only three of the subjects could be explained by KDM6A mutations. To identify other KS candidate genes, one subject that did not have any detectable variants in KDM6A was subjected to exome capture and high-throughput next-generation sequencing. No candidate genes were identified, and it was not until years later that the reason became apparent. After the discovery that mutations in KMT2D are the major cause of KS, the data was reexamined, and it was realized that coverage of the KMT2D gene was very poor in the original exome capture library.

Mutation Screen of KMT2D and KDM6A

Upon joining the lab, my first project was a genetic analysis of the cause of KS in our cohort of KS subjects. The first reports identifying mutations in KMT2D as a major cause of

KS had recently been published (28). Seventeen subjects that had a high-confidence diagnosis of KS were contributed for candidate gene sequencing by Hannibal et al. (2011)

(31). In the remaining subjects we had evidence that mutations in KDM6A are responsible for KS in three of them, a novel discovery at the time. I sequenced the KMT2D gene in the remaining cohort of 40 subjects to determine both the number of KMT2D mutations present and the types of mutations that predominate in our KS cohort. This was done to determine if the majority of subjects would contain loss-of-function variants as was reported by Ng et al. The KDM6A gene had been sequenced in 15/40 of the subjects, so the remaining 25 were subjected to Sanger sequencing as well to strengthen our hypothesis that KDM6A mutations represent a second causative gene in the etiology of KS.

Identification of KS Candidate Genes

Mutations in KMT2D have been shown to be the cause of 44-76% of the cases of KS

(31–33,69). With recent evidence that KDM6A mutations account for 5-10% of cases, there is still a substantial number of KS subjects that have no known genetic cause. To identify new candidate genes for KS, a next-generation sequencing strategy was developed. This strategy had already been effectively used in KS, but a number of factors had to be considered. One of the advantages in the original exome screen was that the main cause of

KS can be considered the “low hanging fruit,” in that it will be mutated in the majority of samples. It is unknown if there are only the two previously identified genes or many related genes involved in KS. Because there may be multiple genes responsible, we developed a method to potentially sequence all of the patients that were negative for KMT2D or KDM6A mutations. The presence of both known KS candidates in a single protein complex suggested that mutations in other complex constituents may contribute to the etiology of KS as well.

Materials and Methods

Sanger Sequencing of KMT2D and KDM6A in KS Subjects

All human subject samples were collected after obtaining informed consent, and subjects were enrolled into an IRB approved research protocol at the Children’s Hospital of

Philadelphia. Inclusion of subjects was contingent on an initial diagnosis of KS by clinicians with expertise in identifying MCA disorders. Blood samples were drawn from each of the patients and processed by members of the Shaikh lab. DNA was extracted from peripheral blood lymphocytes using a commercially available kit (Qiagen, Valencia, CA).

Using subject derived DNA as a template, PCR amplification of the KMT2D and

KDM6A exons was performed. PCR Primers from prior sequencing of KDM6A were utilized to amplify all 29 exons of KDM6A (Appendix B). PCR primers to amplify all 54 exons of

KMT2D were provided by Kati Buckingham of the University of Washington (Appendix C)

(28). PCR amplification of KMT2D required a minimum of 22 PCR amplifications and 98 sequencing reactions for each of the 40 subjects. Amplification of KDM6A required 27 PCR amplifications and 54 sequencing reactions for each of the 25 subjects that had not undergone mutation analysis. PCR reactions were carried out in 96-well PCR plates on the

Veriti 96-well thermal cycler (Applied Biosystems, Foster City, CA). PCR products were quantified on a NanoDrop 2000c spectrophotometer (Thermo-Fisher, Waltham, MA), and normalized to 20ng/µL for sequencing. Amplicons were purified using SPRI beads and sequenced in both directions on an ABI PRISM 3730xl using BigDye Terminator v3.1 chemistry by Beckman-Coulter Genomics (Danvers, Massachusetts). Sequences were analyzed using CodonCode Aligner (CodonCode Corporation, Centerville, MA). To increase accuracy, variants with low coverage or low sequence quality were reamplified by PCR using newly designed primers and resequenced.

Custom Capture of Potential KS Candidate Genes and Next-Generation Sequencing

A custom exome capture was designed using the Agilent eArray online tool to enrich for the coding sequences of the most likely candidate genes for causing KS. The list of potential candidates included all of the members of the KMT2D complex, the related SET-domain containing proteins, and any other potential binding partners of KMT2D (Table 4).

A total of 60 genes was included in the custom capture. BioMart was used to create a .bed file containing the genomic intervals of the exons from each of the candidate genes (42).

This file was used in the Agilent eArray design tool to design overlapping 120mer probes for the capture kit (Figure 2.2). In total, the probes covered 2.8Mb, with repeat regions masked out to avoid difficult to map sequence. The biotinylated RNA probes were synthesized on a solid support and enzymatically sheared in solution to allow a completely solution based capture.
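The idea of overlapping probes can be illustrated with a short sketch. The code below tiles a single exon interval from a .bed file with 120-base probes. It is not the algorithm used by the Agilent eArray tool: the 60-base step (2x tiling), the handling of exons shorter than one probe, and the function name tile_probes are all assumptions made for illustration. Coordinates follow the BED convention (0-based, half-open).

```python
from typing import List, Tuple


def tile_probes(start: int, end: int, probe_len: int = 120,
                step: int = 60) -> List[Tuple[int, int]]:
    """Tile the BED interval [start, end) with overlapping fixed-length probes.

    probe_len matches the 120mer probes described in the text; step is an
    assumed tiling offset, not a value taken from the eArray design.
    """
    if end - start <= probe_len:
        # Short exon: place one probe centred on the target (it may extend
        # into flanking sequence on either side).
        mid = (start + end) // 2
        probe_start = max(0, mid - probe_len // 2)
        return [(probe_start, probe_start + probe_len)]

    probes = []
    pos = start
    while pos + probe_len < end:
        probes.append((pos, pos + probe_len))
        pos += step
    # Final probe sits flush with the end of the exon so no bases are missed.
    probes.append((end - probe_len, end))
    return probes


# Hypothetical 300 bp exon
example = tile_probes(1000, 1300)
print(example)
```

In a real design, probes falling in repeat-masked sequence would be dropped after tiling, as described above.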

Table 4. Genes Targeted by Custom Capture

Members of the KMT2D Complex
KMT2D | KDM6A | WDR5 | PAXIP1 | RBBP5 | PAGR1
EVI1 | NSD1 | ASH2L | DPY30 | NCOA6 | MATR3

KMT Family Genes
KMT2A | EHMT2 | PRDM7 | PRDM16 | SETDB1 | SUV39H1
KMT2B | EZH1 | PRDM8 | SETD2 | SETDB2 | SUV39H2
KMT2C | EZH2 | PRDM9 | SETD3 | SETMAR | SUV420H1
KMT2E | PRDM1 | PRDM11 | SETD4 | SMYD1 | SUV420H2
KMT2F | PRDM2 | PRDM12 | SETD5 | SMYD2 | WHSC1
KMT2G | PRDM4 | PRDM13 | SETD6 | SMYD3 | WHSC1L1
ASH1L | PRDM5 | PRDM14 | SETD7 | SMYD4 |
EHMT1 | PRDM6 | PRDM15 | SETD8 | SMYD5 |

KDM Family Genes
KDM6B | KDM6C | KDM4B

Figure 2.2. Overlapping RNA probe design for Agilent custom DNA capture. Repeat-masked elements were excluded due to difficulty mapping these sequences back to the reference genome.

Genomic DNA from each subject was used for the preparation of Illumina

sequencing libraries. Concentration of DNA was determined by Qubit spectrofluorometer

(Invitrogen, Grand Island, NY) and normalized to 3µg in 120µL. Sonication of DNA was

performed on a Covaris S220 (Covaris, Woburn, MA) with settings: duty cycle= 10%,

intensity= 5, cycles per burst= 200, time= 180 seconds and set mode= frequency sweeping at 4°C. These settings result in a size distribution with a peak at 200bp. Sheared DNA was purified with a Qiaquick PCR purification kit and quality was assessed on the Agilent 2100

Bioanalyzer using a DNA 1000 chip (Figure 2.3). End repair was carried out and blunted fragments were purified as before. A one base A-overhang was added with 3’ to 5’ exonuclease deficient Klenow fragment. Custom barcoded adapters with the sequences 5’

P-(barcode reverse-complement)AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG (modified by a phospho-group at the 5’ end) and 5’ P-ACACTCTTTCCCTACACGACGCTCTTCCGATCT

(barcode)*T (additionally modified by a phosphorothioate bond at the 3’ end) were pre-annealed by briefly heating to 95°C and cooling slowly to room temperature. The barcodes used in the pilot experiment were CGTCGT and GCTGCT. The annealed barcoded adapters were ligated to the A-tailed DNA fragments and purified as before. The product was run on a 3% agarose gel and the 200-300bp region was excised with a clean razor. The size-selected DNA was purified using a Qiagen Gel Extraction Kit and subjected to 8 additional cycles of PCR amplification using the PCR primers

Figure 2.3. Bioanalyzer analysis of sheared genomic DNA from subject 56 demonstrating the expected peak near 200bp.

5’ P-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT and

5’ P-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT and purified as before. Target enrichment of the adapter-ligated genomic library was then performed. The DNA library was incubated in blocking reagent and hybridization buffer, then incubated with the custom capture probes for 24 hours at 65°C. Streptavidin coated magnetic beads were added to the hybridization mixture, causing the DNA bound biotinylated RNA capture probes to bind to them. The supernatant was removed and the beads washed in a series of wash buffers to remove unhybridized DNA. The beads were then incubated in elution buffer and the supernatant containing the captured DNA and RNA probes was removed and desalted using a Qiagen MinElute PCR Purification Kit. This target enriched library was amplified again for

12 cycles using Herculase II Fusion DNA Polymerase (Agilent) and the PCR primers from the previous amplification step. The final prepared library was run on the Bioanalyzer 2100

using a DNA High-sensitivity chip to assess yield and library quality (Figure 2.4). Cluster

amplification and 35bp single-end sequencing was performed by the High Throughput

Sequencing Core in the Department of Biochemistry and Molecular Genetics at the

University of Colorado Anschutz Medical Campus. Sequences were analyzed using software

tools including SAMtools, the Genome Analysis Toolkit (GATK), Picard and SnpEff.
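The construction of the custom barcoded adapters described in the library preparation can be made concrete with a short sketch. The code below assembles the two adapter strands for a given barcode from the sequences listed in the methods. The helper names are illustrative, and the chemical modifications (5’ phosphate, 3’ phosphorothioate bond) are not represented in the plain sequence strings.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Constant portions of the two adapter strands, as given in the methods
ADAPTER_TOP_5P = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
ADAPTER_BOTTOM_3P = "AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def barcoded_adapter_pair(barcode: str) -> Tuple[str, str]:
    """Return (top, bottom) adapter strands, both written 5' to 3'.

    The top strand carries the barcode followed by a single T that pairs
    with the A-overhang on the insert; the bottom strand begins with the
    reverse complement of the barcode.
    """
    top = ADAPTER_TOP_5P + barcode + "T"
    bottom = reverse_complement(barcode) + ADAPTER_BOTTOM_3P
    return top, bottom


# Barcodes used in the pilot experiment
for barcode in ("CGTCGT", "GCTGCT"):
    top, bottom = barcoded_adapter_pair(barcode)
    print(barcode, top, bottom)
```

Placing the barcode immediately adjacent to the insert means that it is read as the first bases of each sequencing read, which is one common reason for this design; that rationale is our interpretation and is not stated in the protocol itself.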

Figure 2.4. Bioanalyzer analysis of the final capture sequencing library from subject 56 using the High Sensitivity DNA chip.

Whole Exome Sequencing

Concurrent with the exome capture pilot, a prepared genomic DNA sample from subject 3 was sent to Otogenetics for whole-exome capture and sequencing for a comparison of data coverage and quality. Exome-capture was performed using the

NimbleGen SeqCap EZ Human Exome Library v2.0 (Roche NimbleGen, Basel, Switzerland).

This capture probe library covers 30,000 genes and 36.5Mb of DNA. The prepared sample

library was sequenced on an Illumina HiSeq2000. The data was analyzed in the same

manner as the exome capture sequence. Clear advantages in third-party exome sequencing

over the custom capture, as well as decreasing costs of exome sequencing led to the

decision to send additional samples to a third-party sequencing service. Exome capture and

sequencing was performed on four additional samples by the Beijing Genomics Institute

(BGI, Hangzhou, China) using the Agilent SureSelect Human All Exon 50Mb Kit (Agilent,

Santa Clara, CA).

Results

Sanger Sequencing Analysis of KMT2D and KDM6A

Sanger sequencing reads were analyzed in chromatogram format, with each nucleotide base represented by a peak in one of four colors; the purines adenine (A, green) and guanine (G, black), or the pyrimidines thymine (T, red) and cytosine (C, blue).

Trimming of low quality sequence, base calling, alignment to gene references and variant identification were performed using CodonCode Aligner, and all variants were visually inspected for quality. Manual inspection of the entire sequence was performed for each subject to identify variants missed by the software. Heterozygous single nucleotide polymorphisms (SNPs), in which a single base is changed from the more common reference base, were detected as dual peaks within the space of a single nucleotide, one for each of the two different alleles. Surrounding bases are represented by single peaks. Heterozygous insertions and deletions (indels) were easily detected, but more difficult to interpret. Indels appear as a run of single base calls that abruptly converts into a run of continuous dual peaks. This is caused by both the wild-type and indel-containing alleles containing the same

sequence up to the point of the indel, at which point the original sequence from the

unaffected allele is combined with a sequence that is shifted by the number of bases

inserted or deleted. By aligning both forward and reverse sequencing reads to the

reference sequence, then identifying the non-reference base at each dual peak, it is possible

to deduce the number of bases inserted or deleted and their identity (Figure 2.5).

Figure 2.5. Determination of Indel sequences. Sequences in chromatogram format were aligned to the reference sequence, represented by black lettering. The secondary non- reference peak sequence beginning at the indel breakpoint in both directions is represented in red. By determining where the derivative sequence matches the reference, the identity of the deleted or inserted bases can be deduced.
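The logic used to resolve heterozygous indels can be sketched in code. The example below is a simplified illustration, not the procedure implemented in CodonCode Aligner. It assumes that the secondary (non-reference) base calls beginning at the breakpoint have already been read off the chromatogram, with positions showing a single peak filled in with the shared base, and it uses a single read direction, whereas both directions were used in practice. The function name and parameters are hypothetical.

```python
from typing import Optional, Tuple


def deduce_indel(reference: str, bp_index: int, secondary: str,
                 max_len: int = 30, min_match: int = 8) -> Optional[Tuple[str, str]]:
    """Explain the secondary peak sequence as a deletion or an insertion.

    reference : reference sequence of the amplicon
    bp_index  : 0-based index where the dual peaks begin
    secondary : non-reference base calls starting at bp_index
    Returns ("del", deleted_bases), ("ins", inserted_bases) or None.
    The smallest event consistent with the data is reported.
    """
    for k in range(1, max_len + 1):
        # Deletion of k bases: the variant allele reads the reference
        # sequence shifted k bases downstream of the breakpoint.
        shifted = reference[bp_index + k: bp_index + k + len(secondary)]
        if len(shifted) >= min_match and secondary[:len(shifted)] == shifted:
            return ("del", reference[bp_index: bp_index + k])
        # Insertion of k bases: after k novel bases the variant allele
        # rejoins the reference at the breakpoint.
        tail = secondary[k:]
        if len(tail) >= min_match and reference[bp_index: bp_index + len(tail)] == tail:
            return ("ins", secondary[:k])
    return None


# Toy reference with a 3-base deletion starting at index 10
ref = "ACGTTGCAAGCTTAGGCTAACCGGTTAACGTAGCTAGGATCC"
print(deduce_indel(ref, 10, ref[13:25]))
```

In repetitive sequence more than one shift can match, so the reported event should be treated as the simplest explanation rather than a unique one.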

PCR amplification and Sanger sequencing of KDM6A identified one new nonsense mutation in subject 4 out of the 25 subjects not previously sequenced. This variant resulted in a premature stop codon at amino acid 251 of 1401 before the catalytic Jumonji C domain

(Figure 2.6). Sanger sequencing of KMT2D resulted in the identification of 21 mutations in

19 subjects. The missense mutation in subject 2 is in addition to the previously identified

KDM6A deletion (Table 5). In 8 out of the 19 subjects, the mutations detected were either nonsense mutations (subjects 5-8) or indels resulting in a frameshift (subjects 9-12), all of which lead to protein truncation prior to the catalytic SET domain. Subject 13 had a splice- site mutation in KMT2D occurring at the -1 site upstream of exon 53 and altering the “AG” consensus splice acceptor site to an “AC”. This mutation is predicted to establish a weak

splice acceptor site by MaxEntScan and is expected to skip exon 53 within the SET-domain,

leading to a catalytically inactive protein product (67). The mutation detected in subject 14 is an in-frame deletion that leads to the loss of six glutamine residues from a polyglutamine tract that was described by Hannibal et al. as a “mutation hotspot” in their cohort (31).

Polyglutamine tracts have been shown to be important for protein-protein interactions, so a change in the size of the polyglutamine tract could potentially interfere with the ability of KMT2D to interact with binding partners and disrupt its ability to bind target genes (70). This variant was therefore considered potentially deleterious. The remainder of the

variants were missense mutations.

Of the 17 subjects from our cohort that were sequenced by Hannibal et al., KMT2D

mutations were detected in 14/17. These mutations consisted of five nonsense mutations,

four indels leading to a frameshift, one splicing defect and five missense mutations. Both

the splice-site mutation and a missense mutation co-occurred in one subject. Subject 57, in

which no mutation was found, was determined to actually be the mother of one of the

subjects.

In-silico Analysis of Missense Variants in KMT2D and KDM6A

Genetic variants that lead to a premature stop codon, a frameshift or an alternative splice product are likely to result in a loss-of-function, and were considered deleterious.

Both KMT2D and KDM6A contain C-terminal enzymatic domains that are essential for their roles, so truncation is likely to abolish protein function. The deleteriousness of missense mutations resulting in amino acid substitutions is much more difficult to categorize. In these cases, it is necessary to use algorithms to categorize the mutation as likely to be benign or deleterious. For this study, a combined in-silico analysis was performed using three prediction algorithms to estimate the effects of missense variants on protein function.

Table 5. Spectrum of KMT2D and KDM6A Mutations in Our Cohort of 57 KS Subjects

Subjects sequenced as part of this study
Subject # | Patient ID | Gene Affected | Chromosomal Location (hg19) | Mutation Type | Effect at Protein Level | In dbSNP or EVS | GERP Score | Grantham Score | PolyPhen-2 Score | Protein Effect
1 | CH01-034 | KDM6A | chrX:43620636-46881568 | Deletion | Gene deleted | - | - | - | - | Deleterious
2 | CH03-151 | KDM6A | chrX:43620636-45642604 | Deletion | Gene deleted | - | - | - | - | Deleterious
2 | CH03-151 | KMT2D | chr12:49423202 | Missense | p.Asn4686Thr | - | 5.27 | 65 | 0.99 | Probably Benign
3 | CH04-099 | KDM6A | chrX:44949177 | Splice Site | Splicing Defect | - | - | - | - | Deleterious
4 | KBK08-006 | KDM6A | chrX:44913077 | Nonsense | p.Trp251STP | - | - | - | - | Deleterious
5 | KBK07-016 | KMT2D | chr12:49425800 | Nonsense | p.Gln4230STP | - | - | - | - | Deleterious
6 | KBK08-001 | KMT2D | chr12:49431178 | Nonsense | p.Arg3321STP | - | - | - | - | Deleterious
7 | KBK08-009 | KMT2D | chr12:49432396 | Nonsense | p.Arg2915STP | - | - | - | - | Deleterious
8 | KBK08-019 | KMT2D | chr12:49421090 | Nonsense | p.Glu4887STP | - | - | - | - | Deleterious
9 | CH02-245 | KMT2D | chr12:49436069-49436079 | Frameshift | Frameshift | - | - | - | - | Deleterious
9 | CH02-245 | KMT2D | chr12:49434924 | Missense | p.Pro2210Leu | Yes | 3.61 | 98 | 0 | Probably Benign
10 | CH04-014 | KMT2D | chr12:49434072-3 | Frameshift | Frameshift | - | - | - | - | Deleterious
11 | CH04-169 | KMT2D | chr12:49434958 | Frameshift | Frameshift | - | - | - | - | Deleterious
12 | KBK09-007 | KMT2D | chr12:49420511-4 | Frameshift | Frameshift | - | - | - | - | Deleterious
13 | KBK06-007 | KMT2D | chr12:49415935 | Splice Site | Splicing Defect | - | - | - | - | Deleterious
14 | KBK07-008 | KMT2D | chr12:49426675-49426692 | In Frame | 6 aa. Deletion | - | - | - | - | Possibly Deleterious
15 | KBK06-013 | KMT2D | chr12:49420607 | Missense | p.Arg5048Cys | - | 4.86 | 180 | 0.99 | Possibly Deleterious
16 | CH02-116 | KMT2D | chr12:49421017 | Missense | p.Pro4911Leu | - | 3.33 | 98 | 0.22 | Possibly Deleterious
17 | KBK06-001 | KMT2D | chr12:49434493 | Missense | p.Pro2353Ser | Yes | 2.99 | 74 | 0 | Probably Benign
18 | KBK06-004 | KMT2D | chr12:49434801 | Missense | p.Ser2251Leu | Yes | 5.07 | 145 | 0.92 | Probably Benign
19 | KBK07-001 | KMT2D | chr12:49431754 | Missense | p.Pro3129Ser | Yes | 4.61 | 74 | 0 | Probably Benign
20 | KBK08-013 | KMT2D | chr12:49433583 | Missense | p.Ala2657Val | Yes | 0.268 | 64 | 0 | Probably Benign
21 | D04-042 | KMT2D | chr12:49445028 | Missense | p.Pro813Leu | Yes | -1.40 | 98 | 0 | Probably Benign
22 | KBK08-005 | KMT2D | chr12:49445028 | Missense | p.Pro813Leu | Yes | -1.40 | 98 | 0 | Probably Benign
23 | KBK06-008 | KMT2D | chr12:49430947 | Missense | p.Met3398Val | Yes | 4.32 | 21 | 0 | Probably Benign
24 | BL4848 | - | - | - | - | - | - | - | - | -
25 | CH00-160 | - | - | - | - | - | - | - | - | -
26 | CH00-233 | - | - | - | - | - | - | - | - | -
27 | CH01-218 | - | - | - | - | - | - | - | - | -
28 | CH03-006 | - | - | - | - | - | - | - | - | -
29 | CH04-011 | - | - | - | - | - | - | - | - | -
30 | CH04-161 | - | - | - | - | - | - | - | - | -
31 | CH04-276 | - | - | - | - | - | - | - | - | -
32 | KBK06-005 | - | - | - | - | - | - | - | - | -
33 | KBK06-010 | - | - | - | - | - | - | - | - | -
34 | KBK07-003 | - | - | - | - | - | - | - | - | -
35 | KBK07-011 | - | - | - | - | - | - | - | - | -
36 | KBK07-015 | - | - | - | - | - | - | - | - | -
37 | KBK08-016 | - | - | - | - | - | - | - | - | -
38 | KBK08-020 | - | - | - | - | - | - | - | - | -
39 | KBK08-023 | - | - | - | - | - | - | - | - | -
40 | KBK09-006 | - | - | - | - | - | - | - | - | -

Subjects sequenced by Hannibal et al. 2011 (31)
Subject # | Patient ID | Gene Affected | Chromosomal Location (hg19) | Mutation Type | Effect at Protein Level | In dbSNP or EVS | GERP Score | Grantham Score | PolyPhen-2 Score | Protein Effect
41 | CH03-188 | KMT2D | chr12:49425287 | Nonsense | p.Q4401X | - | - | - | - | Deleterious
42 | CH04-006 | KMT2D | chr12:49427339 | Nonsense | p.Q3717X | - | - | - | - | Deleterious
43 | CH04-184 | KMT2D | chr12:49420410 | Nonsense | p.Y5113X | - | - | - | - | Deleterious
44 | KBK08-001 | KMT2D | chr12:49431178 | Nonsense | p.R3321X | - | - | - | - | Deleterious
45 | CH01-086 | KMT2D | chr12:49432474 | Nonsense | p.G2889X | - | - | - | - | Deleterious
46 | CH02-066 | KMT2D | chr12:49428410 | Frameshift | p.P3466LfsX36 | - | - | - | - | Deleterious
47 | D04-190 | KMT2D | chr12:49415906-09 | Frameshift | p.N5480VfsX6 | - | - | - | - | Deleterious
48 | KBK07-012 | KMT2D | chr12:49434959 | Frameshift | p.Y2199IfsX65 | - | - | - | - | Deleterious
49 | KBK09-001 | KMT2D | chr12:49432278-80 | Frameshift | p.K2953NfsX5 | - | - | - | - | Deleterious
50 | CH02-152* | KMT2D | chr12:49442441 | Splice Site | Splicing Defect | - | - | - | - | Deleterious
50 | CH02-152* | KMT2D | chr12:49442531 | Missense | p.P1348S | - | 5.92 | 74 | 1 | Possibly Deleterious
51 | CH02-161 | KMT2D | chr12:49420109 | Missense | p.R5214C | - | 5.12 | 180 | 1 | Possibly Deleterious
52 | CH02-025 | KMT2D | chr12:49440522 | Missense | p.C1430R | - | 5.09 | 180 | 1 | Possibly Deleterious
53 | CH03-156 | KMT2D | chr12:49441821 | Missense | p.R1388L | - | 5.83 | 102 | 0.697 | Possibly Deleterious
54 | CH02-160 | KMT2D | chr12:49420213 | Missense | p.R5179H | - | 4.23 | 29 | 0.84 | Benign
55 | BL5441 | - | - | - | - | - | - | - | - | -
56 | CH02-146 | - | - | - | - | - | - | - | - | -
57 | CH05-183 | - | - | - | - | - | - | - | - | -

The second generation Genomic Evolutionary Rate Profiling (GERP++) algorithm uses an alignment of 33 mammalian species to estimate the extent of purifying selection at each nucleotide in the genome, given as a conservation score (71). The conservation score is calculated by using the evolutionary distance between species to estimate the expected number of nucleotide substitutions given a neutral mutation rate, and subtracting the actual number of substitutions. This results in a rejected substitution (RS) score, with higher positive numbers indicating nucleotides are under increasingly purifying selection. This method is based entirely on analysis of genomic sequence and is protein sequence naive.

The second generation polymorphism phenotyper (PolyPhen-2) analyzes protein sequences to estimate the effects of missense mutations on protein function (72). Eight of the 11 predictive parameters considered by the algorithm are sequence based, and three are structure based. Each of these parameters was chosen by an iterative greedy algorithm to maximize identification of deleterious substitutions when trained on the known

Mendelian disease causing alleles in the UniProtKB dataset. To utilize the sequence based algorithms, a multiple sequence alignment between the variant amino acid sequence and human and mammalian sequences is performed and refined. This alignment is used to identify clusters of related sequences which may include both paralogs and orthologs. The physical properties of those amino acid substitutions observed in related sequences are compared to the properties of the missense substituted amino acid to determine the compatibility with the presumably benign substitutions in related sequences. The structural parameters require a crystal structure of the human protein to be included in the protein structure database (PDB) and use parameters including side-chain hydrophobicity, changes in solvent-accessible area, B-factor (characterizes atomic mobility) and contacts with functional sites to estimate effects on protein function. The false positive rate (FPR)

estimates the probability that the substitution is called as deleterious when it is not. The

PolyPhen-2 score is given as the true positive rate of 1-FPR.

The Grantham score uses three variables, side-chain composition, polarity and side-chain volume, to determine the chemical difference between the original and substituted amino acids (73). Side-chain composition is determined by the ratio of the molecular weights of non-carbon constituents over carbon. The polarity variable used in this calculation is closer to the modern measurement of hydrophobicity, and was measured as the mobility of amino acids on chromatography paper in a solvent system (74). Side-chain molecular volume is considered due to its importance for maintaining protein folding. The

Grantham score has an average value of 100, and amino acid substitutions with low

Grantham scores are more likely to be benign.
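As an illustration of how the three properties are combined, the sketch below computes the Grantham distance as a weighted Euclidean distance between two residues. The weighting constants, the scaling factor and the property values for the four residues shown are reproduced as we understand them from the original publication (73) and should be verified against it before use; only a small subset of amino acids is included. Because the published matrix was rounded from more precise values, results may differ from the published scores by a fraction of a unit.

```python
import math

# (composition c, polarity p, volume v); illustrative subset only.
# Values are assumed from Grantham (1974) and should be checked against it.
PROPERTIES = {
    "Arg": (0.65, 10.5, 124.0),
    "Cys": (2.75, 5.5, 55.0),
    "Pro": (0.39, 8.0, 32.5),
    "Leu": (0.00, 4.9, 111.0),
}

# Weights that normalise each property, and the factor that scales the
# mean distance over all amino acid pairs to 100 (assumed values).
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399
SCALE = 50.723


def grantham_distance(aa1: str, aa2: str) -> float:
    """Weighted Euclidean distance between two residues in (c, p, v) space."""
    c1, p1, v1 = PROPERTIES[aa1]
    c2, p2, v2 = PROPERTIES[aa2]
    d = math.sqrt(ALPHA * (c1 - c2) ** 2
                  + BETA * (p1 - p2) ** 2
                  + GAMMA * (v1 - v2) ** 2)
    return SCALE * d


# Substitutions observed in this cohort (Table 5 lists 180 and 98)
print(round(grantham_distance("Arg", "Cys"), 1))
print(round(grantham_distance("Pro", "Leu"), 1))
```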

With variants of unknown pathogenicity, it is important to be conservative in estimating pathogenic potential. In a recent paper on autism exome sequencing, O’Roak et al. (2012) used a similar variant prediction analysis, considering a variant severe if there was a combined Grantham score ≥50 and GERP score ≥3 or a Grantham score ≥85 (75). In order to be conservative in assigning pathogenicity, mutations were required to meet two out of three cutoff scores: a GERP++ score ≥3, a Grantham score ≥85 or a PolyPhen-2 score ≥0.95 (71–73).

The in-silico analysis was performed on all 11 KMT2D missense mutations detected in the cohort of 40 subjects, as well as the five missense mutations in the additional 17 subjects that were genotyped by Hannibal et al. (31). Due to the fact that KS is caused by dominant mutations, any SNPs that were present in the Single Nucleotide Polymorphism

Database (dbSNP) or the Exome Variant Server (EVS) were considered to be benign.

Additionally, because of the rarity of this disorder, it is highly unlikely that two independent pathogenic mutations would occur. Any missense variants that co-occurred with

deleterious mutations were also considered benign. In order to test the accuracy of the in-silico analysis, these mutations were still analyzed to determine if those variants classified as benign due to their presence in the normal population would also be considered benign by the analysis.
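The classification rules described above can be summarized in a short sketch. The cutoffs (GERP++ ≥3, Grantham ≥85, PolyPhen-2 ≥0.95, with two of three required) and the two overriding rules are taken from the text; the class and function names, and the mapping of outcomes onto the labels used in Table 5, are our own illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class MissenseVariant:
    gerp: float
    grantham: float
    polyphen2: float
    in_dbsnp_or_evs: bool = False
    co_occurs_with_deleterious: bool = False


def cutoffs_met(v: MissenseVariant) -> int:
    """Number of the three score cutoffs that the variant meets."""
    return sum([v.gerp >= 3, v.grantham >= 85, v.polyphen2 >= 0.95])


def classify(v: MissenseVariant) -> str:
    """Apply the population and co-occurrence rules, then the score cutoffs."""
    if v.in_dbsnp_or_evs or v.co_occurs_with_deleterious:
        return "Probably Benign"
    return "Possibly Deleterious" if cutoffs_met(v) >= 2 else "Benign"


# Score values from Table 5
subject15 = MissenseVariant(gerp=4.86, grantham=180, polyphen2=0.99)
subject16 = MissenseVariant(gerp=3.33, grantham=98, polyphen2=0.22)
subject54 = MissenseVariant(gerp=4.23, grantham=29, polyphen2=0.84)
subject2 = MissenseVariant(gerp=5.27, grantham=65, polyphen2=0.99,
                           co_occurs_with_deleterious=True)

for name, v in [("15", subject15), ("16", subject16),
                ("54", subject54), ("2", subject2)]:
    print(name, cutoffs_met(v), classify(v))
```

Keeping the score count separate from the final label makes it possible to see when a variant was set aside by the population or co-occurrence rule even though its scores alone would have met the cutoffs.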

Interestingly, the missense mutation in subject 2 that co-occurred with a KDM6A deletion and the missense mutation in subject 9 that co-occurred with an 8bp deletion would both have been predicted to be deleterious by the analysis. The missense mutation in subject 9 was also included in the EVS database at an estimated 0.19% minor allele frequency. On the other hand, the in-silico analysis successfully predicted that the other six

SNPs in KMT2D that matched entries in the dbSNP database were benign. The two novel missense mutations in subjects 15 and 16 were predicted to be deleterious by the analysis.

Of the five missense mutations in the previously sequenced cohort, all of them were novel, and four of them were predicted to be deleterious by the analysis. Subject 50 harbored two mutations in KMT2D, a splice-site mutation and a p.P1348S missense mutation. Hannibal et al. could not determine which variant was pathogenic, and our analysis predicts that both are potentially deleterious. It is important to note that no scoring of missense mutations was performed by the authors, so the variant that was labeled as benign by our in-silico analysis has no independent validation. The purpose of the analysis is to predict which variants are most likely to be pathogenic and is by no means a perfect solution. A more comprehensive analysis could be done by creating the specific missense mutations identified in KS subjects in a model organism to compare their developmental phenotype to both wild-type and knockout embryos, but this would be both cost and time prohibitive.

Figure 2.6. Spectrum of deleterious mutations in subjects diagnosed with Kabuki syndrome. The KMT2D and KDM6A proteins are each represented with functional domains in each protein indicated by colored bars. The location of each variant is indicated.

Analysis Pipeline for Next-generation Sequencing Data

The analysis of next-generation sequencing data is much more complex than the analysis of Sanger sequences. The Illumina sequencing process, and both SNP and indel calling with this data, suffer from biases and systematic errors introduced during early PCR amplification steps, library construction, sequencing chemistry and base calling (76–78). These systematic errors contribute to a high rate of false positives, in which a variant is called when it is not present, and false negatives, in which a variant is present in the sample but not detected. It is important to minimize the effects of these biases on novel variant detection and to remove the vast majority of false positives and negatives. This allows the researcher to focus on those variants that are most likely to be real, and thus on the biologically relevant data. For analysis of the KS sequencing data, an analysis pipeline was developed modeled on the “Best Practices” recommended by the Broad Institute, a world leader in genomic sequence analysis (79).

This methodology has been evolving as the institute analyzes more and more datasets. All of the tools utilized are freely available online, although their use requires an understanding of both Linux command line and of the sequencing strategies and tool settings implemented within the analysis.

The analysis pipeline was developed around the Genome Analysis ToolKit (GATK), which is a software framework published by the Broad Institute and implemented in Java for manipulating sequencing data (79). The GATK software contains many subroutines known as “walkers” that perform each of the data manipulations. This pipeline utilized the standard tools used for variant discovery in human genomes, as well as specialized tools developed to perform tasks related to increasing the accuracy of SNP and indel calls. The entire process encompassed three distinct phases: data preprocessing, variant discovery and analysis of variants. The actual commands implemented are available in Appendix D.

The data preprocessing phase began with analysis of the data type. Most of the raw sequencing data was acquired in the Illumina 1.5+ format and required conversion to the Sanger Fastq format to be read by downstream tools. This was accomplished using a Perl script written by Chris Todd Hittinger, now an assistant professor of genetics at the University of Wisconsin. Correctly formatted raw sequence reads were aligned to the GRCh37 assembly of the human genome by the Burrows-Wheeler Aligner tool (BWA), using either single- or paired-end specific commands (80). The output in SAM (Sequence Alignment Map) format was compressed into the BAM (Binary SAM) format using Picard, a suite of tools for manipulating SAM and BAM files maintained by the Broad Institute at http://broadinstitute.github.io/picard. When working with paired-end data, mate-pair information was validated by Picard. Picard was also used to remove duplicate reads that may bias base-calling, and to collect alignment summary metrics. These metrics were read by the GATK walkers for many analysis steps. The presence of small insertions and deletions (indels) in samples is a major source of false SNPs in many alignments, as the reads cannot match the reference exactly, and reads both upstream and downstream are mapped incorrectly. The GATK walker RealignerTargetCreator uses a map of indels from a human variation study to mark known areas of common indels, and detects clusters of SNP calls to create targets for local realignment by another walker, the IndelRealigner (81). Many of the systematic errors from the sequencer can be detected by analyzing covariates such as read group (e.g. all calls from one sequencing lane), assigned quality score, machine cycle and dinucleotide identity (e.g. current base + previous base) at incorrect calls. The walker BaseRecalibrator records these data, using the assumption that all mismatches from reference not found in dbSNP135 are errors. These data are used by the GATK to determine an empirical error model and perform base quality score recalibration (BQSR) based on differences between sequencing runs (determined by read group), quality estimation errors (the difference between assigned quality score and actual mismatch based score), machine cycle (quality tends to be lower at the ends of reads) and dinucleotide identity (some dinucleotides, such as AC, have inherently lower read quality than others, like TG). The effects of base quality score recalibration are shown for subject 24 (Figures 2.7-2.9).
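The format conversion at the start of the preprocessing phase amounts to re-encoding the quality characters of each read. The following is a minimal Python sketch, not the Perl script used in this work. It assumes that Illumina 1.5+ qualities are Phred scores stored with an ASCII offset of 64 and that the Sanger Fastq format uses an offset of 33; the function name is hypothetical.

```python
def illumina15_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string as a Phred+33 (Sanger) string."""
    out = []
    for ch in qual:
        q = ord(ch) - 64              # Phred score under the +64 encoding
        if q < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        out.append(chr(q + 33))       # same score under the +33 encoding
    return "".join(out)


# 'h' encodes Q40 and 'B' encodes Q2 under the +64 offset.
print(illumina15_to_sanger("hhhB"))   # prints III#
```

Only the quality line of each Fastq record changes; the sequence and header lines would be passed through untouched.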

Figure 2.7. Empirical quality score vs. reported quality score before and after base quality score recalibration in subject 24.

Figure 2.8. Distribution of reported quality scores before and after base quality score recalibration. Note that before recalibration, the most commonly reported score was the highest possible. Recalibration results in increased spread in quality scores, reported as entropy.

Figure 2.9. Effects of BQSR on quality scores by cycle. Interestingly, early read quality scores were lower than reported by the sequencer, while many later cycles were actually higher than reported.
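The empirical quality plotted in these figures can be illustrated with a small Python sketch. It is a simplification under stated assumptions and not the GATK implementation: bases are binned only by read group and reported quality, each observation has already been flagged as a mismatch or not at a site absent from dbSNP, and a pseudocount of one mismatch in two observations is added to avoid taking the logarithm of zero. All names are hypothetical.

```python
import math
from collections import defaultdict


def empirical_quality(observations):
    """observations: iterable of (read_group, reported_q, is_mismatch)."""
    counts = defaultdict(lambda: [0, 0])      # bin -> [n_bases, n_mismatches]
    for read_group, reported_q, is_mismatch in observations:
        counts[(read_group, reported_q)][0] += 1
        counts[(read_group, reported_q)][1] += int(is_mismatch)

    table = {}
    for key, (n_bases, n_mismatches) in counts.items():
        error_rate = (n_mismatches + 1) / (n_bases + 2)   # smoothed estimate
        table[key] = -10 * math.log10(error_rate)         # Phred scale
    return table


# 998 bases reported as Q40 in one lane, 9 of which mismatch the reference:
obs = [("lane1", 40, False)] * 989 + [("lane1", 40, True)] * 9
print(round(empirical_quality(obs)[("lane1", 40)], 1))    # about Q20, not Q40
```

A recalibrated score for a base would then be looked up from the bin it falls in; the full method additionally conditions on machine cycle and dinucleotide context as described above.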

The variant discovery phase began with variant calling by the GATK walker UnifiedGenotyper. It uses a Bayesian genotype likelihood model to determine genotypes with accurate estimated probability scores. It also annotates these variant calls with a number of metrics, including quality by depth, haplotype score, mapping quality rank sum test, read position rank sum test, Fisher strand and RMS mapping quality. The first round of variant calls will contain many false-positive calls. Variant quality score recalibration (VQSR) is used to filter out the majority of these false positives. VQSR is very different from BQSR. Instead of using the properties of false positives to recalibrate scores, it uses the values of metrics that are associated with true variants to estimate the probability that a called variant is real. This walker assumes that variant calls that match known polymorphic sites from the HapMap project, Omni 2.5M SNP chip array and dbSNP are true calls, and trains the algorithm to estimate the relationships between the values of annotated metrics from the UnifiedGenotyper walker and the probability that a variant is real. These recalibrated calls are separated into tranches. A 90% tranche contains all variants that lie within the range of each metric that contains 90% of the known true positive calls. This is the most discriminating filter and contains very few false positives. The 99% and 99.9% tranches are progressively less discriminating and contain higher numbers of false positive calls.
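The idea of a tranche can be sketched in a few lines of Python. This is an illustration only: it takes the recalibrated score of each call as given (the model that produces the scores is not shown), assumes a higher score means a call is more likely to be real, and uses hypothetical function names and toy scores.

```python
import math


def tranche_threshold(truth_scores, sensitivity):
    """Lowest score that still retains `sensitivity` of the truth-site calls."""
    ranked = sorted(truth_scores, reverse=True)
    keep = math.ceil(sensitivity * len(ranked) - 1e-9)   # guard float round-off
    return ranked[keep - 1]


def assign_tranche(score, truth_scores, levels=(0.90, 0.99, 0.999)):
    """Return the most discriminating tranche that a call passes, or None."""
    for level in levels:
        if score >= tranche_threshold(truth_scores, level):
            return level
    return None


# Toy truth set: 100 known-site calls with scores 1..100.
truth = list(range(1, 101))
print(tranche_threshold(truth, 0.90))   # 11: the top 90 truth calls score >= 11
print(assign_tranche(50, truth))        # 0.9
print(assign_tranche(5, truth))         # 0.99
```

A novel call is therefore trusted to the degree that its score resembles the scores of calls at known polymorphic sites.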

The final step in the SNP calling process is to annotate the effects of variants with the SnpEff variant annotation and prediction tool (82).

Indel calling was performed in the same fashion as SNP calling, with the caveat that these calls are much more likely to be false positives. This is because there are not enough actual indels present in subjects to properly train the recalibration algorithms. The indel recalibration step is not performed unless there are hundreds of samples to be analyzed simultaneously.

Custom Capture and Next-Generation Sequencing

The sequences obtained for subjects 48 and 56 from the custom capture were analyzed. There were 11.5 million unique reads obtained from subject 48. When mapped to the 60 target genes, only 16X average coverage was obtained. There were 13.3 million unique reads obtained from subject 56, which resulted in 17.5X mapped coverage. The on-target coverage was much lower than it should have been. With the 11.5 million reads of 35bp that were obtained from subject 48, theoretical coverage over the 2.8Mb target region is nearly 144X. This suggests that only about one in nine sequences was on-target. The coverage was also very uneven, making variant calling problematic. Because this was a pilot experiment, variant calling was attempted despite the less than ideal results. The next-generation sequencing analysis pipeline was used to call both SNPs and indels. Five coding variants were identified in subject 48. These included both an in-frame deletion of three basepairs and an in-frame insertion of three basepairs in PRDM2, an arginine to cysteine mutation in SMYD1, a threonine to proline missense mutation in KMT2C and a lysine to glutamine missense mutation in PRDM9. The known KMT2D pathogenic mutation was not detected in subject 48 because there was no coverage at this location (Figure 2.10). Because the causal mutation is known, all of the other mutations identified in subject 48 are likely to be false positives or benign. Only two coding variants were identified in subject 56, a +G insertion at a 3’ splice site of KMT2C and the same missense mutation in PRDM9 that was identified in subject 48. The one unique mutation in subject 56 is probably a false positive as well because the indel calling algorithm is much less robust than the SNP calling.
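The coverage arithmetic for subject 48 can be reproduced with a short Python sketch. The function name is hypothetical, and the calculation assumes every read is full length and lands uniformly on the target.

```python
def theoretical_coverage(n_reads, read_length_bp, target_size_bp):
    """Mean depth if every sequenced base fell on the target region."""
    return n_reads * read_length_bp / target_size_bp


expected = theoretical_coverage(11.5e6, 35, 2.8e6)   # subject 48, 2.8Mb target
on_target_fraction = 16 / expected                   # observed 16X vs. expected
print(round(expected, 2), round(on_target_fraction, 2))   # 143.75 0.11
```

The ratio of observed to theoretical coverage, about 0.11 here, is a quick estimate of capture efficiency: on the order of one read in ten contributed to on-target depth.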

Figure 2.10. Custom capture and 36 basepair Illumina sequencing failed to identify the known mutation in KMT2D. Sequences that mapped to this region are denoted as light gray bars. The vertical dotted line denotes the location of the pathogenic variant. No sequence coverage was obtained at the region of interest.

Exome Sequencing

Subject 3 was the first sample subjected to exome sequencing. Over 39 million 90bp paired-end reads were obtained. When mapped to the human genome, this led to an average of 50X on-target coverage. Assuming a perfect exome capture, coverage of the 36.5Mb region would be nearly 192X. This means that the exome capture was far superior to the custom capture, with more than 1 in 4 sequences mapping uniquely to the targeted locations. Using the next-generation sequencing pipeline, 296 novel missense, 6 novel nonsense, and one splice-site mutation were identified. Importantly, the splice-site variant identified is the same KDM6A variant identified previously as probably pathogenic. This pilot experiment served as an excellent validation of the power of exome sequencing to identify causal variants.

The next phase of the exome sequencing experiment was intended to identify pathogenic variants in five members of the KS cohort with no detected mutations in KMT2D or KDM6A. This second exome sequencing cohort consisted of subjects 24-28. In this experiment, after the next-generation sequencing analysis pipeline was used to call variants, genes were ranked by the number of subjects in which they contained unique mutations. A dominant mutation model was used because the previous mutations detected in KS were dominant mutations and it was suspected that this would not change. A candidate gene for KS should be mutated in multiple subjects, and each subject should have a unique variant because the disorder is sporadic in nature. This rationale was similar to that employed during the identification of KMT2D as the major cause of KS (28).
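The ranking strategy described above can be sketched in Python as follows. This is an illustration of the logic rather than the commands actually run; the function name, the tuple layout and the toy gene and variant identifiers are all hypothetical.

```python
from collections import defaultdict


def rank_candidate_genes(calls):
    """calls: iterable of (subject, gene, variant_id) tuples.

    Returns (gene, n_subjects) pairs, ranked by the number of subjects that
    carry a variant in that gene seen in no other subject.
    """
    carriers = defaultdict(set)        # variant -> subjects carrying it
    gene_of = {}
    for subject, gene, variant in calls:
        carriers[variant].add(subject)
        gene_of[variant] = gene

    subjects_per_gene = defaultdict(set)
    for variant, subjects in carriers.items():
        if len(subjects) == 1:         # keep variants private to one subject
            subjects_per_gene[gene_of[variant]] |= subjects

    return sorted(((g, len(s)) for g, s in subjects_per_gene.items()),
                  key=lambda item: (-item[1], item[0]))


toy_calls = [
    ("s1", "GENE_A", "a1"), ("s2", "GENE_A", "a2"),   # two private variants
    ("s1", "GENE_B", "b1"), ("s2", "GENE_B", "b1"),   # shared, so ignored
    ("s3", "GENE_B", "b2"),
]
print(rank_candidate_genes(toy_calls))   # [('GENE_A', 2), ('GENE_B', 1)]
```

Variants shared between subjects are discarded before counting, which reflects the expectation that a sporadic dominant disorder is caused by a different mutation in each subject.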

Only the MUC4 gene had a unique mutation in all five subjects. A MUC gene was also identified in 10/10 subjects in the report that identified KMT2D as the cause of KS, probably due to the large size of these genes (28). In fact, three of the six genes that had unique mutations in four of the subjects were also MUC genes. The other three genes were KMT2C, IGFN1 and LILRA6. The IGFN1 gene is thought to be a structural component of human muscle, and LILRA6 is a cell-surface receptor that regulates the function of toll-like receptors (78,79). The KMT2C gene was our obvious candidate because of its close relationship to KMT2D.

There were five unique KMT2C mutations identified in four subjects (Table 6). Subject 24 had two missense variants, while subjects 25 and 26 had one each. A nonsense mutation was identified in subject 28. Four additional variants were present in more than one subject. Two of the unique mutations and three of the shared mutations were previously identified in dbSNP. Additionally, all of the variants except the two missense variants unique to subject 24 and the missense variant unique to subject 26 lie within a large segmental duplication spanning chr7:151901420-151989558 (Figure 2.11). This duplicated region is 88 kilobases in length and matches chr21:10995231-11076838 with 90-98% similarity. This segmental duplication probably caused errors when mapping sequences to the reference genome, leading to false positives. This is also the most likely reason that the variant calls cluster within two discrete regions in the segmental duplication. These mutations are probably often miscalled by other groups performing exome sequencing, which is why they are included in dbSNP. While it is possible that some of the variants within the segmental duplication are real, it is difficult to obtain sequences within these regions that can be unambiguously proven to originate from one copy.

Table 6. Mutations in KMT2C Identified by Exome Sequencing

Chromosomal Location Hg19  Mutation Type  cDNA Effect  In dbSNP?  Mapping Quality  Tranche    Protein Effect  Subject 24  Subject 25  Subject 26  Subject 27  Subject 28
chr7:152008951             Missense       c.671A>G                757.89           >90        p.L224P         HET         -           -           -           -
chr7:151878026             Missense       c.6919T>C               626.68           >90        p.R2307G        HET         -           -           -           -
chr7:151970951             Missense       c.851C>T     Yes        72.51            99.9to100  p.R284Q         -           HET         -           -           -
chr7:151841821             Missense       c.14320A>G              1451.33          90to99     p.Y4774H        -           -           HET         -           -
chr7:151945256             Nonsense       c.2263G>A    Yes        51.68            99.9to100  p.Q755*         -           -           -           -           HET
chr7:151945071             Frameshift     G>GT                    4044.1           90to99     Frameshift      HET         -           HET         HET         HET
chr7:151932945             Missense       c.2726C>T    Yes        100.49           99.9to100  p.R909K         HET         HET         -           HET         -
chr7:151932997             Missense       c.2674C>T    Yes        335              99.9to100  p.G892R         HET         -           HET         -           HET
chr7:151945225             Missense       c.2294T>C    Yes        10.95            LowQual    p.E765G         -           -           HET         -           HET
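Which of the variant positions in Table 6 fall inside the segmental duplication can be checked with a simple interval test. The Python sketch below uses the coordinates quoted in the text and the table; treating both interval ends as inclusive is an assumption, and the names are hypothetical.

```python
SEGDUP = ("chr7", 151901420, 151989558)    # segmental duplication from the text

# Chromosome 7 positions of the nine KMT2C calls listed in Table 6.
KMT2C_CALLS = [152008951, 151878026, 151970951, 151841821, 151945256,
               151945071, 151932945, 151932997, 151945225]


def in_interval(chrom, pos, interval):
    """True if chrom:pos lies inside the (chrom, start, end) interval."""
    c, start, end = interval
    return chrom == c and start <= pos <= end


inside = [p for p in KMT2C_CALLS if in_interval("chr7", p, SEGDUP)]
outside = [p for p in KMT2C_CALLS if not in_interval("chr7", p, SEGDUP)]
print(len(inside), len(outside))           # 6 3
print(SEGDUP[2] - SEGDUP[1])               # 88138, i.e. about 88 kb
```

The three positions outside the interval are the two missense variants unique to subject 24 and the missense variant unique to subject 26.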

Figure 2.11. An ideogram of KMT2C variants displayed in the UCSC genome browser. Variants identified in the exome sequencing screen are represented as vertical red dashes. The bottom track shows the location of segmental duplications. The majority of KMT2C mutations identified lie within a segmental duplication.

Only three variants originate from outside the segmental duplication in KMT2C. These variants had the highest mapping scores aside from the one indel variant that is not trustworthy due to problems with the indel calling algorithm. These missense mutations were confirmed by PCR amplification and Sanger sequencing. The first missense mutation in subject 24 was a c.671A>G mutation that replaces leucine at amino acid 224 with a proline (Table 7). This mutation had a GERP score of 4.76, a Grantham score of 98, and a PolyPhen-2 score of 0.93. Using the same cutoff scores from the KMT2D and KDM6A Sanger sequencing experiment, this mutation is predicted to be deleterious. The second missense mutation in subject 24 was a c.6919T>C mutation that replaces arginine at amino acid 2307 with a glycine. This mutation had a GERP score of 1.28, a Grantham score of 125 and a PolyPhen-2 score of 0.38 and is predicted to be benign. The missense variant in subject 26 had a GERP score of 5.31, a Grantham score of 83 and a PolyPhen-2 score of 1 and is predicted to be deleterious.

Table 7. Validated Missense Variants Detected by Exome Sequencing

Subject #  Effect at Protein Level  GERP Score  Grantham Score  Polyphen-2 Score  Protein Effect
24         L224P                    4.76        98              0.93              Possibly Deleterious
24         R2307G                   1.28        125             0.38              Probably Benign
26         Y4774H                   5.31        83              1                 Possibly Deleterious

Discussion

KMT2D and KDM6A Mutations in KS

Individuals clinically diagnosed with KS have been screened for mutations in the candidate genes KMT2D and KDM6A. Of the entire cohort of fifty-six subjects tested, thirty had mutations in either KMT2D or KDM6A, the two genes identified thus far as causative of KS. This includes the variants that were predicted to be either deleterious or potentially deleterious by a battery of computational tools that predict the effect of mutations on protein function. Previous studies have detected KMT2D mutations in 44-76% of patients, and our detection rate is in the lower range with 26/56 (46%) detected in this study (28,31–34,69). KS has a highly variable phenotype, and there is evidence that patients exhibiting a phenotype that closely matches the “classic” gestalt of facial and other syndromic features are more likely to have mutations in KMT2D (33). The sixteen “classic” KS subjects from our cohort that were included in a previous study had KMT2D mutations in 14/16 (88%), supporting this observation (31).

The 2012 report of deletions encompassing or within KDM6A in 3/22 (14%) KMT2D mutation negative individuals with KS was a confirmation of our own data that KDM6A mutations cause KS (35). Additional clinical reports have detected SNPs and small indel mutations in 3/32 (9%), 4/170 (9%), 3/26 (12%), 7/118 (6%) and 1/2 (50%) of KMT2D mutation negative patients (36,37,69). Other studies have attempted to identify KDM6A mutations in large cohorts and failed to do so (37,85,86). The prevalence in the total KS population has been estimated to be less than 5% (69). In our cohort we detected KDM6A mutations in 4/19 (21%) of the KMT2D mutation negative subjects and in 4/56 (7%) of subjects overall. This number is slightly higher than previously reported, although so few variants have been reported that it is still too early to establish the percentage of subjects that will ultimately be found to have causative KDM6A mutations. Because KDM6A is located on the X-chromosome, it is important to note that variants were detected in both male and female KS patients in our study and by Lederer et al. (35). KDM6A has been shown to escape X-inactivation independently of the pseudoautosomal regions, and a dosage contribution was found from both alleles in females (87). Complete knockout of KDM6A is embryonic lethal in female mice, but expression of KDM6C, the KDM6A paralog on the Y-chromosome, compensates for the reduced KDM6A transcript abundance in males, restoring embryo viability (88). This supports the view that KDM6A deletions and mutations, in both males and females, represent an X-linked dominant inheritance.

KMT2D and KDM6A Mutations are Not Identified in All KS Subjects

Twenty-six of the subjects from our cohort of 57 KS patients did not carry detectable mutations in either KMT2D or KDM6A. There are multiple possible explanations for this result. One possibility is that some of the mutation-negative subjects have disorders other than KS, but with phenotypic overlap. Another is that mutations in regulatory regions of KMT2D and KDM6A may account for some of the cases and were not detected by the methods used. The fact that loss-of-function mutations are so common makes this less likely. Alternatively, there may be additional genes that have yet to be identified in the etiology of KS. Many subjects have “classic” presentations and yet do not have mutations in either KMT2D or KDM6A. Both of these genes function as part of a large complex containing several proteins, each of which is a candidate for an additional causative gene (49).

Custom Capture and Sequencing of Potential KS Candidate Genes

The custom capture and Illumina sequencing was an early attempt to identify new candidates in the etiology of KS. This experiment was designed to capture only the sequences corresponding to the best candidate genes, cutting the sequenced portion of the exome from the 36.5Mb captured by Nimblegen’s exome capture kit to 2.8Mb. This had the advantages of allowing heavy multiplexing of samples in order to make it economically feasible to sequence a large number of samples while also reducing data handling complexity. It may have been better to obtain 90bp paired-end reads to yield better coverage and mapping quality, but this would have increased cost dramatically.

The sequence data obtained from this experiment was inadequate for the detection of novel variants. Failure to verify the known mutation in the positive control sample highlighted the need for much better coverage of the genes targeted. The reason for the failure of the capture to adequately enrich the sequencing library is not trivial to determine. The exome capture itself has many complicated steps, any of which can ruin the sample if done improperly. Multiple liquid handling and PCR amplification steps both increase the chances of introducing contamination or amplification bias. Experimental error could have been one of the reasons that the capture did not work well. A major issue with the capture kit was brought to light by Agilent. The buffers that were provided with the capture kit were incorrect, which was not identified until months later. Imbalanced pH in the buffers may have drastically changed the binding efficiencies of the capture kit, and is probably largely responsible for the inefficient capture. Because there was an overabundance of off-target genomic sequence, the capture beads were most likely promiscuously binding, or the wash steps were not sufficient to remove genomic contamination. Rather than perform time-intensive and expensive optimizations to this experiment, it was decided to utilize a sequencing service to perform exome sequencing. While the logic of this strategy was sound, its execution did not live up to its potential.

Whole Exome Sequencing

Whole exome sequencing was much more successful. The advantage of using an established sequencing service to perform all of the steps of exome capture and sequencing is that the service has heavily optimized and automated the process. This results in consistent data quality, and reduces the risk of wasting precious clinical samples. The cost has dropped considerably, making it economically feasible to sequence more samples than would have been possible just a few years ago.

The pilot exome sequencing experiment was performed to compare the data quality and mapping coverage to that of the custom capture experiments. Subject 4 was used for the initial experiment because the causal variant is known, a c.3736+2T>C splice-site mutation. As was proven by the custom capture experiment, positive controls are vital for assessing data quality. A non-exonic mutation was chosen to determine if it would be detected by the variant calling pipeline. The sequence data obtained for subject 4 was a drastic improvement over that obtained from the custom capture. The average target coverage was over 40X, with a much more even distribution. The KDM6A splice-site mutation was identified as one of ten novel loss-of-function mutations, and was the only candidate gene with a coding mutation. The excellent data quality indicated that exome sequencing was the better method for detection of novel candidate genes.

Identification of a Novel KS Candidate Gene

The sequence data obtained from the five subjects within the experimental cohort was of similarly high quality. Combining all of the raw mapping and variant calling data into one analysis facilitated superior data recalibration, allowing identification of sequencing-specific biases to train the variant scoring algorithms and increasing the accuracy of variant calls. Sequencing of multiple subjects also permitted the identification of genes that had unique mutations in multiple subjects. The identification of KMT2C as a potential causal gene for KS was very exciting. This gene is actually an ideal candidate. Of all of the other KMT2 genes, it is the closest homolog of KMT2D. It also appears to form a complex with the same proteins as KMT2D. KMT2C is also a good candidate because the two genes in the KMT2D complex that have been implicated in KS are also the only enzymatic members. Many of the proteins that form the KMT2D complex are shared between multiple similar complexes, making their loss potentially more damaging. For example, loss-of-function mutations in the KMT2D scaffolding protein WDR5 would have very different consequences. This is because WDR5 is shared by all of the H3K4 methyltransferase complexes. Mutations in this gene would affect global levels of histone H3K4 methylation and would undoubtedly have a different phenotype (89). Mutations in the KMT2D complex component NCOA6 are also unlikely to cause KS because heterozygous NCOA6 knockout mice do not exhibit a phenotype (90). KMT2C is a good candidate because it takes the place of KMT2D in a similar complex.

The majority of KMT2C variants could not be validated. These variants were detected within a large segmental duplication (segdup) at chr7:151901420-151989558 that covers 19/59 (32%) of the exons of KMT2C. The duplicated region shares 90-98% sequence similarity with chr21:10995231-11076838. The segdup most likely introduced errors in the alignment of the sequencing reads to the reference genome. The use of paired-end reads reduces the incidence of alignment errors throughout most of the genome, because the average inter-read distance can be calculated and used to avoid mapping the two reads of a pair to different areas. The 88kb length of the segdup interferes with the effectiveness of this strategy because both of the paired-end reads can still be mapped to the wrong chromosome. While differences between the segdups are enough to unambiguously assign the majority of the reads in this region, some highly similar regions will cause sequences to align to the wrong chromosome (91). The few differences in these incorrectly mapped reads would be detected as SNPs in our analysis, when they actually originate from the other segdup. This may also be the reason that the apparent SNPs were detected in clusters.

The three missense mutations detected in subjects 24 and 26 provide tantalizing evidence that mutations in KMT2C may cause KS, but they are not conclusive. The majority of pathogenic mutations in KS are loss-of-function, but missense mutations have also been associated with the disorder. While an in-silico analysis estimated that two of the KMT2C mutations are potentially pathogenic, the burden of proof is much higher with a new candidate gene. Once a method has been validated for sequencing through the segmental duplication, the KMT2C gene should be sequenced in the remaining KMT2D mutation negative subjects to gather more evidence for its role in the pathogenesis of KS. The segdup proved problematic for sequencing validation because unique primers were difficult to identify. Sanger sequencing was attempted but in all cases yielded uninterpretable sequences. One way to solve this problem would be to clone the segmentally duplicated region within KMT2C into a bacterial artificial chromosome (BAC) for sequencing. This would be a time consuming project but should ultimately give unambiguous results. An additional method of validating the role of KMT2C in KS is gene knockout or knockdown in an animal model. This was the approach utilized for validating KDM6A as a candidate gene for KS. We considered this approach for KMT2C but it was not undertaken due to time constraints. It is still worth undertaking to validate a new candidate gene for KS.

CHAPTER III

THE ROLES OF KMT2D AND KDM6A IN DEVELOPMENT

Introduction

Known Roles of KMT2D and KDM6A Relevant to Development

The role of transcription modifiers such as the KS associated genes KMT2D and KDM6A can be very complex. They have been studied extensively in the context of cancer, which was the only known role in human pathogenesis before they were identified as a cause of KS. The genes that are misregulated in cancers are likely to be important in developmental processes, although these roles may be seemingly contradictory in different cancer subtypes. In colorectal and medulloblastoma cancer lines, knockdown of KMT2D causes defects in cancer proliferation and cell migration (92). Similarly, knockout of KDM6A or KMT2D reduced proliferation and invasiveness of a breast cancer cell line (93). Contradictory to this, reestablishing KDM6A expression in two esophageal cancer lines that had inactivating mutations in KDM6A reduced proliferation (94). These studies show that the KMT2D complex plays a regulatory role in the proliferation of some cell types.

Roles for KMT2D and KDM6A in specification and differentiation have also been demonstrated in culture. KDM6A is essential for induction of pluripotency in embryonic stem cells, yet it was not required for differentiation after pluripotency was established (95). Another study reported that KDM6A deficient ESCs were unable to differentiate into cardiac lineage-committed embryoid bodies (96). While this might suggest that development of cardiomyocytes would be affected by loss of KMT2D, the specific heart defects common in KS may have a different origin.

The Zebrafish as a Model Organism for Studying the Genetics of MCA Disorders

Although many processes important to development have been shown to be regulated by KMT2D and KDM6A, there is still much to be elucidated in the context of development of complex organisms (56,92). Additionally, many functions of KMT2D and KDM6A have been demonstrated for one gene or the other, yet the phenotypic overlap in KS suggests that their roles are intrinsically linked. When this project was first undertaken, KDM6A was still a novel, unpublished candidate gene. Its role in the pathogenesis of KS was still hypothetical and it was important to prove that the loss of this gene would result in a similar phenotype to a KMT2D knockdown.

In order to study the roles of both KMT2D and KDM6A in vertebrate development, we performed a knockdown of their expression in the zebrafish (Danio rerio), a common model organism. Zebrafish have played a critical role in elucidating the functions of candidate genes implicated in similar disorders characterized by multiple congenital abnormalities (97–99). Significant insight into the role of these genes can be gained by manipulating zebrafish genetics early in development due to the conservation of genes and developmental pathways between zebrafish and humans (100). Additionally, external development and optical clarity make zebrafish embryos excellent tools for in-vivo imaging at early stages of development (97).

Conservation of KS Genes in Zebrafish

In the zebrafish, both KMT2D and KDM6A are well conserved. There is one ortholog of human KMT2D in zebrafish (kmt2d), and two orthologs of KDM6A (kdm6a and kdm6al), which arose from an evolutionary partial genome-duplication in teleosts (101). Pairwise alignment using Emboss Needle demonstrates that there is 41% identity and 51% similarity between human and zebrafish KMT2D. The C-terminal sequence containing the catalytic SET domain has 98.6% identity and 99.3% similarity. The conservation between KDM6A and its orthologs is even higher, with 71% identity and 80% similarity with kdm6a, and 55% identity and 65% similarity to kdm6al.
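The identity figures quoted above are the kind of statistic that can be computed from a finished global alignment. The Python sketch below illustrates the calculation on a pair of made-up aligned peptide strings; it assumes identity is reported as identical columns divided by the full alignment length, gaps included, and the function name is hypothetical. It does not perform the alignment itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")        # gap columns never match
    return 100.0 * matches / len(aligned_a)


# Toy alignment with one substitution and one gap across 7 columns.
print(round(percent_identity("MKT-LLV", "MRTALLV"), 1))   # 71.4
```

Similarity additionally counts columns holding conservative substitutions, as scored by a substitution matrix, which is why it is always at least as high as identity; that step is omitted here.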

Morpholino Knockdown of Genes Implicated in Kabuki Syndrome

The abundance of single allele loss-of-function mutations associated with KS suggests that the phenotype is predominantly caused by haploinsufficiency of KMT2D or KDM6A during development. In humans, this is only strictly accurate for KMT2D, as KDM6A is located on the X-chromosome and thus only present in one copy in males. The evidence that the Y-chromosome paralog KDM6C (UTY) is able to partially compensate for the loss of KDM6A in males makes this a reasonable approximation (34).

To study the roles of KS genes in developmental processes, we approximated haploinsufficiency of kmt2d, kdm6a and kdm6al by injecting 1-4 cell-staged zebrafish embryos with Morpholinos (MOs, Gene-tools, Philomath, OR) to knock down their expression. MOs are synthetic RNA analogs with a morpholine ring backbone in place of ribose, and a non-ionic phosphorodiamidate bond in place of the phosphate linkage between nucleic acid subunits (102). These have an advantage over small interfering RNA constructs because they cannot be broken down by RNases and are retained within the cell. The sequence of a knockdown MO is designed as the reverse complement of the targeted stretch of RNA to facilitate binding.
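The reverse-complement design rule can be shown with a minimal Python sketch. The target string is a made-up six-base example rather than one of the sequences used in this work, the function name is hypothetical, and writing the oligo with DNA-style bases (T rather than U) is an assumption about notation.

```python
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def morpholino_for_target(rna_target: str) -> str:
    """Reverse complement of an RNA target, written 5'->3' with A/C/G/T."""
    return "".join(COMPLEMENT[base] for base in reversed(rna_target.upper()))


# Target 5'-AUGGCU-3' pairs with the oligo 5'-AGCCAT-3'.
print(morpholino_for_target("AUGGCU"))   # AGCCAT
```

Reversing the target before complementing is what keeps both sequences written in the conventional 5' to 3' direction while still pairing antiparallel.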

There are two main types of MO: those that target translation start-sites (TSS), and those that target splice-sites (SS) of newly transcribed messenger RNAs. Each of these has a different mechanism of action. The TSS morpholino binds to a region of the mRNA that encompasses the start codon and sterically hinders the initiation step of translation (103). SS Morpholinos are designed to bind at exon-intron junctions during transcription, hindering assembly of the spliceosome. This leads to aberrant splicing, resulting in either exon skipping or intron retention in the final transcript (104). Paul Morcos of Gene-tools has demonstrated that targeting a MO to any of the binding sites of the components of the spliceosome will inhibit splicing, indicating that steric hindrance of spliceosome assembly is the likely mechanism of action (105). In practice it is much easier to target the 5’ splice donor or the 3’ splice acceptor sites. MOs have demonstrably fewer off-target effects than many knockdown technologies such as siRNAs, but such effects are still a concern. Some MO sequences cause neural toxicity through activation of p53-mediated apoptosis (106). This effect is rescued by concurrent knockdown of p53. For this reason, co-injection of a p53-targeting MO is an essential control when using MOs.

Stable Gene Knockout Strategies

While knockdowns are valuable for demonstrating the roles of genes in development, they have many potential caveats as well. The degree of knockdown is dosage dependent, and a compromise between efficacy and toxicity must be made in order to use MOs effectively. There is also variability in knockdown effectiveness due to the extremely small amounts of MO injected, as well as the location of the injection. This is why a stable gene knockout is desirable. Two methods of targeted gene knockout were attempted, the first using a customized zinc-finger nuclease and the second using a CRISPR-Cas9 nuclease.

Recent advances in gene editing have revolutionized our ability to manipulate the genomes of model organisms. One of the first methods developed for targeted mutagenesis was the use of zinc-finger nucleases (ZFNs). ZFNs are synthetic proteins that bind to a DNA target sequence and cause cleavage of double-stranded DNA. The specificity of ZFNs is imparted by splitting the nuclease into two separate proteins, each of which contains both DNA binding activity and one half of a heterodimeric FokI nuclease domain. Both halves must bind in the correct orientation to cause dimerization of the nuclease domains for cleavage to occur. Each half of the nuclease contains three fused zinc-finger DNA binding domains, and each of the zinc-finger domains binds to a specific 3 basepair sequence. When the two nucleases are injected into a single-cell embryo, they each bind to one half of the 23-24 bp sequence in an orientation that brings the FokI endonuclease halves together to dimerize and become active, and a cut is made within the 5-6 bp spacer region between them. Error-prone DNA repair mechanisms such as non-homologous end-joining introduce indels in a modest percentage of cases, creating heterozygous mutant embryos which can be crossed to make knockout embryos for further study. Despite the safeguards against off-target activity, cleavage still takes place at some similar sequences (107).
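The 23-24 bp length of the full ZFN recognition site follows from the numbers given above: two half-sites of three fingers each, 3 bp per finger, separated by a 5-6 bp spacer. A minimal Python sketch of that arithmetic is given here; the constant and function names are hypothetical and chosen only for illustration.

```python
# Worked arithmetic for the length of a ZFN target site, using the values
# stated in the text. All names are illustrative, not part of any software used.
FINGERS_PER_ZFN = 3   # zinc-finger domains in each nuclease half
BP_PER_FINGER = 3     # base pairs recognized by each zinc-finger domain
SPACER_BP = (5, 6)    # spacer lengths between the two half-sites


def zfn_site_length(spacer_bp: int,
                    fingers: int = FINGERS_PER_ZFN,
                    bp_per_finger: int = BP_PER_FINGER) -> int:
    """Total target length: two half-sites plus the spacer between them."""
    half_site = fingers * bp_per_finger   # 9 bp bound by each nuclease half
    return 2 * half_site + spacer_bp


lengths = [zfn_site_length(spacer) for spacer in SPACER_BP]
print(lengths)  # [23, 24]
```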

An improved method of targeted mutagenesis has been developed based on the CRISPR-Cas9 system, a defensive measure against viral infection utilized by many bacteria and archaea (108). Guide RNAs (gRNAs) antisense to invading viral DNA sequences are encoded within clustered regularly interspaced short palindromic repeats in the bacterial genome. These are transcribed during viral infection and guide the Cas9 nuclease to cleave the DNA introduced by the virus at the targeted site. The minimal necessary components of the gRNAs have been reverse engineered so that they can be customized to target any DNA sequence that meets a few specific sequence requirements, namely a “GG” followed by a 20 nucleotide unique sequence and a second “GG” that is required for cloning into the expression vector. Co-injection of the custom gRNA and Cas9 nuclease transcript into single-cell zebrafish embryos leads to cleavage at the targeted site, and error-prone repair introduces loss-of-function mutations early in embryogenesis. This technique is not perfect, and in some cases may also induce off-target mutations in sequences with high similarity (109).

The Role of Retinoic Acid Signaling in Development

It has been demonstrated that the KMT2D complex is required for transcription of RA responsive genes through direct binding to RARs (50). This has many implications for development and may explain some of the phenotypes associated with KS. Retinoic acid is a powerful morphogen that regulates apoptosis, specification and differentiation (110). RA signaling is essential for normal craniofacial development. Inhibition of RA signaling leads to defects in viscerocranial development and reduced expression of a marker of the cranial NCCs that populate the pharyngeal arches (111). In RAR knockout mice, defective heart septation leads to cardiac outflow tract abnormalities due to failed cardiac NCC differentiation (112). RA has a posteriorizing effect on neural ectoderm, activating expression of posterior segment genes such as hoxb1b and meis3 (113). Additionally, the presence of two RAREs in the promoter of the neuronal differentiation factor Neurog2 indicates that RA signaling has a role in the induction of neuronal differentiation. The abnormalities caused by mutations in KS genes may be due to a failure to induce expression of RA regulated genes.

Materials and Methods

Zebrafish Transgenic Strains and Husbandry

Zebrafish were raised in the University of Colorado Zebrafish Core Facility as described (58). An AB/Tupfel long fin line of wild-type zebrafish was used for most experiments, with results reproduced in the Ekkwill line. The tp53zdf1/zdf1 transgenic line was used as a control to rule out effects of apoptosis on phenotypes (38, 59). Visualization of cardiac structures was accomplished utilizing the Tg(szhand2:mCherry; cmlc2:GFP)co10 transgenic line (116). Embryos collected from pair matings were maintained in E3 medium with 0.00001% (w/v) Methylene Blue to inhibit fungal growth (embryo media).

In-situ Hybridization Probes

In-situ probes were designed to be specific to kmt2d, kdm6a and kdm6al transcripts. Probe sequences were PCR amplified from cDNA clones of kmt2d (accession #EV758851), kdm6a (EB935627) and kdm6al (BC129198) obtained from Thermo Fisher (Waltham, MA) using the following PCR primers: kmt2d: TCATCATCATTTCTAGTCGCAGGAT and TTATTGTAGGAGGAAACAGTGGAGG; kdm6a: CTCGTTCTACATGCATTTTGGTGTA and AAATCCAAACAGTGTGCTGTCTATG; kdm6al: CGCACTTTATTCCCACTAAACAATG and TGTAATGGAAGTAAACCATCCCGAG. The amplified inserts were cloned into the pGEM-T Easy vector (Promega, Madison, WI) and sequence verified for directionality using M13 forward and reverse primers. To synthesize antisense probes, the vector was linearized with SphI (NEB, Ipswich, MA) and transcribed using SP6 polymerase (Promega, Madison, WI) and DIG RNA labeling mix (Roche, Basel, Switzerland) according to manufacturer recommendations. Sense RNA probe controls were synthesized in the same manner, with the exception that PstI (NEB) was used to linearize the vector and T7 polymerase (Promega) was used for transcription. In-situ probes for foxd3, sox9 and sox2 were obtained as hybridization ready probes. Probes for asb2 were transcribed from a cDNA clone (Clone ID 6962877, GE Dharmacon, Lafayette, CO) as described above.

Whole Mount In-situ Hybridization

Embryos at the required stage were incubated in a solution of 1mg/mL pronase for 10 minutes, then swirled gently to remove chorions. Fixation was carried out in 4% paraformaldehyde (PFA) overnight at 4°C. Embryos were washed 4 x 5 minutes in PBS + 0.1% Tween-20 (PBST) at room temperature (RT). Twenty-four hours post-fertilization (hpf) embryos were permeabilized by incubation in a solution of 10µg/mL proteinase K in PBT for 8 minutes, and 48 hpf embryos for 20 minutes, at 37°C. Permeabilized embryos were fixed again in 4% PFA for thirty minutes and washed in PBST 4 x 5 minutes at RT. Prehybridization was carried out for four hours in hybridization buffer (50% formamide, 5X SSC, 0.1% Tween-20, heparin 50µg/mL, tRNA 500µg/mL, pH 6.0 with citric acid) at 65°C. Hybridization was carried out with 0.5-1µg of RNA probe per 1mL of hybridization buffer overnight at 65°C. Embryos were washed 4 x 20 minutes in wash solution (50% formamide, 2X SSC pH 4.5 and 0.1% SDS) at RT. Embryos were blocked in blocking buffer (MABT [100mM maleic acid, 150mM NaCl, 0.1% Tween-20, pH 7.5], 2% sheep serum [heat inactivated at 56°C for 30 minutes] and 2% blocking reagent [Roche]) for 2 hours at RT. Incubation in Anti-Digoxigenin-AP, Fab fragments (Roche) diluted in blocking buffer at 1:1000 was carried out overnight at 4°C. Post-block washing was performed in MABT 6 x 1 hour with one additional wash overnight at 4°C. The next day, embryos were washed 3 x 5 minutes in NTMT (100mM NaCl, 100mM Tris pH 9.5, 50mM MgCl2, 0.1% Tween-20) at RT. Embryos were then incubated in BM Purple at RT and observed until the color reaction was complete. Washing for storage was performed 2 x 5 minutes in PBT at RT and samples were stored in 4% PFA at 4°C until ready for imaging.

Antisense Morpholino and mRNA Injections

All Morpholinos were acquired from Gene Tools LLC (Philomath, OR). One translation start-site blocking (TSS) antisense MO was developed per gene: kmt2d: CGCAGTTTGATTTCTGCTCGTCCAT; kdm6a: CCGACACTCCGCACGATTTCATAGA; and kdm6al: CCACCGACACTCGGCACGGCTTCAT. Two splice-site (SS) blocking MOs were developed for kmt2d, SS1: GGTATAGCAGCAATGACAAACCATT; SS2: GGTCCCTAAAATGAGACAACAGCTC; and one each was developed for kdm6a: GTGTGCTGCAAATAGAGGACAACAC; and kdm6al: TTTCCACAAGCATCTTTACCTTCAC. Morpholinos were mixed at a 1:1 ratio with a solution of 10 mg/mL Dextran dye (Invitrogen, Grand Island, NY) in 0.04M KCl. 3nL of MO mixture was injected into the yolk of embryos at the 1-4 cell stage, while 1nL injections directly into the single-cell zygote were used for mRNA rescue, using a PLI-100 Pico-Injector from Harvard Apparatus (Holliston, Massachusetts). Poorly injected embryos were removed at 24 hpf by sorting for even distribution of fluorescent Dextran dye under a dissecting microscope. Injection of MOs into the tp53zdf1/zdf1 strain was performed to control for off-target effects (36). To maximize specificity, TSS and SS MOs were first injected independently to confirm that similar phenotypes were obtained, and were then co-injected in combination at sub-phenotypic concentrations. A mixture of 3ng of each kmt2d MO was used to knock down kmt2d. kdm6a was knocked down with 6ng of the TSS and 3ng of the SS MO, while the kdm6al injection mixture contained 5ng each of the TSS and SS MOs. Ten nanograms of standard control MO: CCTCTTACCTCAGTTACAATTTATA were injected as a control. Co-injection with 4ng of the Danio rerio p53 MO: GCGCCATTGCTTTGCAAGAATTG was used as a control when transgenic lines were used. At no time was more than 3nL injected into any embryo to prevent toxicity (117). Rescue experiments were performed by injecting 300pg of human KDM6A mRNA directly into the single-cell zygote. The human KDM6A mRNA was derived from a full-length cDNA clone obtained from Open Biosystems (Lafayette, CO). In-vitro transcription was performed with the mMESSAGE mMACHINE SP6 Kit (Invitrogen, Grand Island, NY).

Phenotyping of Morphants

Alcian blue and alizarin red acid-free staining was performed on 5 days post-fertilization (dpf) wild-type and morphant embryos (118). Fixed 5 dpf embryos were washed in 100mM Tris pH 7.5/10mM MgCl2 and transferred to alcian blue stain consisting of 0.02% alcian blue, 200mM MgCl2 and 70% ethanol and rocked overnight at room temperature. The embryos were washed progressively from 50% ethanol and 50% 100mM Tris pH 7.5 to 100% 100mM Tris (pH 7.5) solutions. Embryos were washed in 25% glycerol/0.1% KOH, transferred to 0.5% alizarin red stain in water and rocked for 30 min at room temperature. Embryos were destained in 50% glycerol/0.1% KOH and stored in 50% glycerol/0.1% KOH at 4°C. Morphants were scored for loss of craniofacial structures or dysmorphic features, and statistical analysis of phenotypes was carried out using Welch's t-test or Fisher's exact test in R. Cardiac phenotypes of morphant and wild-type embryos were assessed in Tg(szhand2:mCherry; cmlc2:GFP)co10 transgenic embryos, and the looping angle was quantified. Dorsal images of the hearts of 48 hpf embryos were taken with a fluorescent microscope and the angle between the midline axis and the junction of the atrium and ventricle was measured (119). To determine brain phenotypes, 48 hpf embryos were embedded head-down in a heated solution consisting of 1.5% Agar and 5% Sucrose in 0.4M phosphate buffer pH 7.4 and cut into blocks. Blocks were placed in a 30% sucrose solution prepared with 0.1M phosphate buffer pH 7.4 overnight at 4°C until osmotic equilibrium caused the blocks to sink. The blocks were frozen on dry ice and transverse sections were obtained at 7µm thickness for H&E staining and 15µm thickness for immunofluorescence on a Leica CM 1950 cryostat (Leica Biosystems, Nussloch, Germany).

Hematoxylin and Eosin Staining

Hematoxylin and eosin (H&E) staining was performed on the 7µm cryosections as follows. Slides were fixed in 245mL of 95% ethanol with 5mL PFA for two minutes. Next they were rehydrated by dipping in 95% ethanol for 30 seconds and 80% ethanol for 30 seconds, and rinsed for one minute in running tap water. To stain nuclei, slides were dipped in Gill’s Hematoxylin for one minute, rinsed in tap water for one minute, placed in bluing solution for 30 seconds and rinsed in running tap water for one minute. Dehydration was again performed by dipping in 80% ethanol for 30 seconds and 95% ethanol for 30 seconds, and the slides were dipped in eosin Y alcoholic counterstain for one minute. Slides were washed once in 95% ethanol for 15 seconds and washed twice in 100% ethanol for 15 seconds. Slides were cleared by dipping twice in Histo-Clear for 30 seconds and cover slips were mounted using Permount.

Immunofluorescence

Slides were rehydrated in PBS for 30 minutes. 100µL of blocking solution consisting of 5% goat serum and 5% BSA in PBS was applied to slides and covered with Parafilm for 30 minutes. Rabbit anti-SOX2 and mouse anti-HuC primary antibodies were diluted 1:200 in blocking buffer and 100µL was applied to slides in a hydration chamber and incubated overnight. Slides were washed twice in PBS for 15 minutes and Alexa Fluor 488 goat anti-mouse and Alexa Fluor 594 goat anti-rabbit secondary antibodies were diluted 1:200 in blocking buffer and 100µL was applied to each slide for one hour. Slides were washed three times for 15 minutes in PBS and mounted with Vectashield fluorescent mounting medium.

Validation of Splice-site Morpholino Effects

Each of the splice-site blocking Morpholinos was tested for efficacy in interrupting splicing. 9ng of kmt2d SS1, kmt2d SS2, kdm6a SS or kdm6al SS Morpholino was injected into newly fertilized eggs. At 24 hours post-fertilization total RNA was extracted from wild-type or morphant embryos with the RNeasy Mini Kit (Qiagen, Valencia, CA). Complementary DNA (cDNA) fragments were synthesized using the GoScript Reverse Transcription System (Promega) according to manufacturer recommendations using the following primers designed to bind one exon upstream and one exon downstream of the Morpholino binding site: kmt2d SS1: AGCTCCTAGCATGTGCTCAGTGT and AAGCAGCTCCTCCTCCATGAAGTT; kmt2d SS2: GGTACACAGTGATTGTGCCTCAC and AAAGATTCCTCCACACCCTGCTCA; kdm6a: GACTCATTCATGTCGAGGACAATGTG and AGACGTCATAGCAGCGAACAG; and kdm6al: GTTCCATGAAGCCGTGCCGAG and GATGAGAGACTCGTAGCAGCGAAC. The cDNA products were run on a 1% agarose gel containing 0.5µg/mL ethidium bromide for 1 hour at 130V and visualized under UV illumination. The most prominent band in the morphant lanes that was not present in the wild-type controls was excised and extracted using the QIAquick gel extraction kit (Qiagen). The extracted products were purified using SPRI beads and sequenced on an ABI PRISM 3730xl using BigDye Terminator v3.1 chemistry by Beckman Coulter Genomics (Danvers, Massachusetts). Sequences were aligned to the reference genes using CodonCode Aligner (CodonCode Corporation, Centerville, MA) and the type of splicing defect was determined.


Imaging

Imaging experiments were performed in the University of Colorado Anschutz Medical Campus Advanced Light Microscopy Core. Full-body imaging was performed on a Zeiss dissecting scope equipped with the QIClick imaging system (QImaging, Surrey, BC). Imaging of the heart and brain was performed on a 3I VIVO Upright Spinning Disk confocal from Zeiss (Oberkochen, Germany). Slide images were obtained on a Zeiss Axio Observer with a QImaging Retiga Exi digital color camera. Slidebook Imaging software version 5 (Intelligent Imaging Innovations, Denver, CO) and Volocity 3D Image Analysis Software (PerkinElmer, Waltham, MA) were utilized for image capture and for Z-stack compression. ImageJ version 1.44 (NIH, Bethesda, MD) and Photoshop (Adobe Systems, San Jose, CA) were utilized for image manipulation including brightness, levels, contrast and false-coloring.

Zinc-finger Nucleases

ZFN arrays were designed in ZiFit (120). The insert sequences were synthesized by IDT (Coralville, IA) and cloned into expression vectors pMLM290 (+) and pMLM292 (-) (sequences available in Appendix E). ZFN mRNAs were transcribed using the mMessage mMachine T7 Ultra Kit (Life Technologies, Carlsbad, CA) and ethanol precipitated with LiCl. 2nL of a solution containing 125ng/µL of both the (+) and (-) nucleases was injected into single-cell embryos.
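The amount of each ZFN mRNA delivered per embryo follows directly from the injection volume and concentration stated above. As a worked illustration (the function name is hypothetical), the conversion relies on the identity that 1 nL of a 1 ng/µL solution contains 1 pg of solute.

```python
# Worked arithmetic: mass of mRNA delivered per embryo for a given injection.
# Assumes the stated concentration applies to each nuclease mRNA individually.
def injected_mass_pg(volume_nl: float, conc_ng_per_ul: float) -> float:
    """Mass delivered in picograms: 1 nL x 1 ng/uL = 1e-3 ng = 1 pg."""
    return volume_nl * conc_ng_per_ul


# 2 nL of a solution containing 125 ng/uL of each nuclease mRNA
zfn_each_pg = injected_mass_pg(2, 125)
print(zfn_each_pg)  # 250 pg of each nuclease mRNA per embryo
```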

CRISPR/Cas9 Nucleases

Guide RNAs for knocking out kmt2d, kdm6a and kdm6al were designed in ZiFit software (120). Oligos were synthesized and annealed to allow direct cloning into BsaI digested pDR274. The oligos used were: kmt2d exon 5: TAGGATCCACTTGCAGCCGAGC and AAACGCTCGGCTGCAAGTGGAT; kmt2d exon 50: TAGGACATACTCGTGCTATTGA and AAACTCAATAGCACGAGTATGT; kdm6a exon 17: TAGGGACGGGGCTCGCAGGGAC and AAACGTCCCTGCGAGCCCCGTC; kdm6al exon 17: TAGGTGGGGTCACCGGGGCGCG and AAACCGCGCCCCGGTGACCCCA. Ligated plasmids were cloned into XL1-Blue cells and colonies were picked and sequenced to verify that correct inserts were obtained. Guide RNAs were transcribed using the MAXIscript T7 RNA Transcription Kit (Life Technologies, Carlsbad, CA). The gRNA was phenol/chloroform precipitated. Cas9 transcript was synthesized from the pMLM3613 plasmid using the mMessage mMachine T7 Ultra Kit and phenol/chloroform precipitated. Single-cell embryos were injected with 2nL of solution containing 12.5ng/µL of gRNA and 300ng/µL of Cas9 mRNA. Embryos were sorted at 24 hpf and non-deformed embryos were genotyped.
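The guide oligo pairs listed above can be checked for internal consistency. The Python sketch below assumes, from inspection of the sequences, that each top oligo carries a 5' TAGG overhang, each bottom oligo carries a 5' AAAC overhang, and the remaining 18 nucleotides of the two oligos are reverse complements of one another so that the pair anneals into a duplex with sticky ends for the BsaI-digested vector. The function and variable names are hypothetical.

```python
# Consistency check for annealed guide oligo pairs.
# Assumption: "TAGG" and "AAAC" are the 5' cloning overhangs, and the rest of
# each oligo is the target sequence (top) or its reverse complement (bottom).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Oligo pairs as listed in the text: (top strand, bottom strand)
OLIGO_PAIRS = {
    "kmt2d exon 5": ("TAGGATCCACTTGCAGCCGAGC", "AAACGCTCGGCTGCAAGTGGAT"),
    "kmt2d exon 50": ("TAGGACATACTCGTGCTATTGA", "AAACTCAATAGCACGAGTATGT"),
    "kdm6a exon 17": ("TAGGGACGGGGCTCGCAGGGAC", "AAACGTCCCTGCGAGCCCCGTC"),
    "kdm6al exon 17": ("TAGGTGGGGTCACCGGGGCGCG", "AAACCGCGCCCCGGTGACCCCA"),
}


def check_pair(top: str, bottom: str,
               top_overhang: str = "TAGG", bottom_overhang: str = "AAAC") -> bool:
    """True if both overhangs are present and the cores are reverse complements."""
    if not (top.startswith(top_overhang) and bottom.startswith(bottom_overhang)):
        return False
    top_core = top[len(top_overhang):]
    bottom_core = bottom[len(bottom_overhang):]
    return reverse_complement(top_core) == bottom_core


results = {name: check_pair(*pair) for name, pair in OLIGO_PAIRS.items()}
for name, ok in results.items():
    print(name, "consistent" if ok else "MISMATCH")
```

All four pairs pass this check, which gives some confidence that the sequences were transcribed correctly.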

Incubation of Morphants in Exogenous All-trans Retinoic Acid

A series of dilutions of all-trans retinoic acid from 1µM to 0.01nM were prepared in embryo media. Embryos were injected with morpholinos as previously described, and morphants were incubated in the RA media starting immediately after injection or at 11 hpf to allow for gastrulation to occur. Developmental progress was followed over several days.
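The text gives only the endpoints of the dilution series (1µM to 0.01nM). As an illustration, the Python sketch below enumerates the concentrations under the assumption of ten-fold steps; the step size is not stated in the text, and the function name is hypothetical. Exact fractions are used to avoid floating-point drift across repeated divisions.

```python
# Sketch of a serial dilution series between two stated endpoints.
# Assumption: ten-fold steps (the actual step size is not given in the text).
from fractions import Fraction


def serial_dilution_nM(start_nM, end_nM, fold=10):
    """List concentrations in nM from start down to end, dividing by `fold`."""
    series = []
    conc = Fraction(start_nM)
    end = Fraction(end_nM)
    while conc >= end:
        series.append(conc)
        conc /= fold
    return series


# 1 uM = 1000 nM down to 0.01 nM
series = serial_dilution_nM(1000, Fraction(1, 100))
print([float(c) for c in series])
```

Under this assumption the series contains six concentrations spanning five orders of magnitude.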

RESULTS

Whole Mount In-situ Hybridization of KS Genes

Whole mount in-situ hybridization (WISH) demonstrates that kmt2d, kdm6a and kdm6al are similarly ubiquitously expressed throughout the embryo at 24 hours post-fertilization, with the highest expression observed in the head (Figure 3.1). By 48 hpf, transcript levels are reduced, but are still present within the brain. The spatial and temporal overlap in their expression patterns supports the idea that they exist in a complex in the zebrafish, as has been shown in yeast and humans. Their co-expression also suggests that their functions are coupled.


Figure 3.1. In-situ hybridization demonstrating kmt2d, kdm6a and kdm6al expression. Lateral and top-down anterior views of 24 hour (left) and 48 hour (right) post-fertilization zebrafish embryos. Expression of kmt2d, kdm6a and kdm6al mRNA transcripts was visualized with BM purple stain.

Morpholino Effects on mRNA Transcripts

The efficacy of splice-site MOs was demonstrated by PCR amplification and sequencing of the major mRNA transcripts in morphant embryos. The kmt2d SS1 MO resulted in retention of intron 15-16, while the kmt2d SS2, kdm6a SS and kdm6al SS MOs resulted in skipping of exons 10, 3 and 2 of their respective transcripts (Figure 3.2). Western blotting of wild-type and morphant embryos was attempted repeatedly to determine the efficacy of TSS MOs, but was not successful. The only validations available for the antibodies used were western blots to detect the same antigens. Even though protein samples were prepared in a denaturing buffer, the antigen was probably obscured from the binding sites due to the extreme size of the proteins. The only successful western blotting of KMT2D that has been performed required cloning of a FLAG-tag onto the end of the protein, which is untenable in zebrafish.

Analysis of Craniofacial Development in Morphants

Figure 3.2. Validation of splice-site morpholino effects. (A) Agarose gel showing results of reverse transcription PCR (RT-PCR) of cDNA obtained from embryos treated with Morpholino (MO) targeting splice-sites (SS) of respective genes. The RT-PCR products from wild-type (WT) uninjected control embryos are also shown. The novel bands observed in morphant embryos (shown using arrows) were excised and sequenced. (B) The results obtained from sequencing the novel bands are shown.

To determine the developmental effects of kmt2d, kdm6a and kdm6al knockdown on craniofacial structures, we stained the cartilage of whole-mount 5 dpf wild-type and morphant embryos with alcian blue and bone with alizarin red. Severe hypoplasia of structures within the viscerocranium was observed in kmt2d morphants (N=41) compared to uninjected controls, including complete loss of branchial arches 3-7 (68%), Meckel’s cartilage (22%), and the ceratohyal (44%) from the cartilaginous structures, while the bony cleithrum (85%) and opercles (42%) were commonly absent as well (Figure 3.3). In many of these morphant embryos the structures were present but incompletely formed or clefted, and in the case of the ceratohyal cartilage joint, inverted in a posterior orientation. Structures of the neurocranium generally developed normally, although shortening of the trabeculae and ethmoid plate was observed in the most severely affected morphants. Rescue experiments using human KMT2D transcript or the zebrafish ortholog were not possible due to unavailability of full-length cDNA clones, most likely due to the extremely large size of the KMT2D transcript (>19Kb).

Figure 3.3. Craniofacial defects observed in morphant zebrafish embryos. 5 day post fertilization embryos were treated with alcian and alizarin dyes, which stain cartilage blue and bone red. Both ventral and left sagittal views focused on the visceral cranial cartilages at 10X magnification are shown. (A) Uninjected, wild-type control; (B) A typical kmt2d morphant; (C) A typical kdm6a morphant; (D) A kdm6a morphant coinjected with human KDM6A; (E) A typical kdm6al morphant. Bb, basibranchial; Cb, ceratobranchial arches 3-7; Ch, ceratohyal; O, opercle; Eth, ethmoid plate; M, Meckel’s cartilage.

We also observed hypoplasia of the viscerocranium in 5 dpf kdm6a morphants (N=131). This included absent branchial arches (92%), Meckel’s cartilage (18%), ceratohyal (40%), cleithrum (72%), and opercles (87%) at similar levels to kmt2d morphants (Figure 3.3). Coinjection of kmt2d MO with kdm6a MO was embryonic lethal prior to 4 dpf (data not shown). We co-injected kdm6a MO with in-vitro synthesized human KDM6A transcript (hKDM6A) (N=30). This led to a partial rescue of the craniofacial phenotype, with a reduced number of embryos missing the branchial arches (13%), Meckel’s cartilage (3%), ceratohyal (13%), cleithrum (7%), and opercles (30%), demonstrating that knockdown of kdm6a causes these specific defects and that the human homolog is sufficient to rescue the morphant phenotype (Figure 3.3). Interestingly, the kdm6al morphant (N=31) did not exhibit craniofacial defects. Co-injection of kdm6a and kdm6al MO did not alter the severity of the phenotype compared to kdm6a morphants alone. Together, these data suggest that both kmt2d and kdm6a are necessary for patterning of the viscerocranium during embryonic development, and that kdm6al does not function in this role.
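The Methods state that phenotype frequencies were analyzed with Fisher's exact test in R. As an illustration of that kind of comparison, the following Python sketch implements a two-sided Fisher's exact test from the hypergeometric distribution and applies it to loss of the branchial arches in kdm6a morphants versus hKDM6A-rescued embryos. The embryo counts are approximations back-calculated from the reported percentages (92% of 131 and 13% of 30) and are not the raw counts; whether this particular comparison was tested in the original analysis is also an assumption.

```python
# Minimal two-sided Fisher's exact test for a 2x2 table, standard library only.
# The thesis analysis was run in R; this is an illustrative re-implementation.
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Approximate counts reconstructed from reported percentages (assumption):
# kdm6a morphants: ~121 of 131 lacking branchial arches (92%)
# hKDM6A rescue:   ~4 of 30 lacking branchial arches (13%)
p_value = fisher_exact_two_sided(121, 10, 4, 26)
print(f"p = {p_value:.3g}")
```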

Analysis of Neural Crest Cell Specification in Morphants

We suspected that craniofacial defects and certain other KS phenotypes are due to defects in Neural Crest Cell (NCC) development. An early marker of NCC specification is foxd3, a forkhead box transcription factor (121). The effects of kmt2d, kdm6a and kdm6al knockdown on NCC specification were determined by WISH of foxd3 in 11 hpf embryos (Figure 3.4). This experiment was only performed once due to time constraints. Interpretation of the results was problematic because staining of the wild-type embryos was not visible. However, comparison of the morphant embryos to published images of wild-type foxd3 expression reveals very similar expression patterns. This suggests that early NCC specification is probably not affected by the knockdown of kmt2d, kdm6a and kdm6al.

Figure 3.4. Expression of foxd3 in 48 hpf wild-type and morphant embryos. The anti-sense probe control for kdm6a morphant is shown at right.

Analysis of Cranial Neural Crest Cell Differentiation in Morphants

Alcian blue staining of kmt2d and kdm6a morphants demonstrated that these genes are essential for proper development of the cartilaginous structures of the viscerocranium by 5 dpf. The viscerocranial structures are derived from cranial NCCs (CNCCs). It is unclear if the absence of these structures was due to a failure of CNCCs to migrate or if the failure was in differentiation of the CNCCs to chondrocytes. To determine if CNCCs are present in the pharyngeal arches at 48 hpf, WISH was attempted using a sox9 antisense probe. Similar to foxd3, this WISH experiment was only attempted once due to time constraints. The integrity of the embryos was compromised by repeated washing steps, and the majority of embryos were not intact. One wild-type embryo was partially intact, demonstrating strong expression of sox9 in the pharyngeal arches (Figure 3.5). This expression was strongly reduced in the kdm6a morphant and reduced in the kdm6al morphant. The experiment must be repeated to verify these observations, but it suggests that expression of sox9 is downregulated in kdm6a morphants.

Figure 3.5. Expression of sox9 in 48 hpf embryos.

Analysis of Heart Development in Morphants

Structural cardiac defects are common in KS. To determine the role of kmt2d, kdm6a and kdm6al in cardiac development, we knocked these genes down in the zebrafish transgenic line Tg(szhand2:mCherry; cmlc2:gfp)co10, which expresses GFP under the cmlc2 promoter to visualize developing cardiac tissues. Following MO knockdown, we observed morphological defects in cardiac development in kmt2d and kdm6al morphants at 48 hpf, while kdm6a morphants were only mildly affected. Morphants in all three experimental groups exhibited abnormal development of the atria and/or ventricle, which included abnormal bulging of the myocardial wall. This was most pronounced in the kmt2d morphants, while effects in kdm6a and kdm6al morphants were more subtle.

Figure 3.6. Defects in heart looping observed in morphant zebrafish embryos. Hearts from 48 hours post fertilization zebrafish embryos of the transgenic line Tg(szhand2:mCherry; cmlc2:gfp)co10, which expresses green fluorescent protein (GFP) under the cmlc2 promoter, were analyzed. The looping angle of the heart in the embryos was measured as shown (inset). The measured looping angles are shown as a bar graph for the wild-type and morphant embryos along with a kdm6al morphant coinjected with human KDM6A mRNA. The number of embryos (n) tested in each category is shown above. Error bars represent the standard error of the mean. The bottom panel shows representative GFP-expressing hearts from each of the respective categories in the graph.

Additionally, we observed that the organization of the morphant hearts at 48 hpf was more linearly arranged along the midline when compared to wild-type embryos, whose hearts exhibited the S-shape characteristic of proper looping morphology at this timepoint. In order to quantify the magnitude of this abnormal cardiac morphology, we measured the extent of cardiac looping involution at 48 hpf. The “looping angle” was determined by measuring the angle of the atrioventricular junction relative to the midsagittal plane (Figure 3.6, inset). The looping angle decreases as cardiac development progresses, and is used as a measure of cardiac maturation (119). At 48 hpf, the average wild-type embryo had an atrioventricular angle of 32° (N=49). We found that progression through looping morphogenesis was significantly lower in kmt2d morphants (71°, N=30, p=1.05x10^-8) and kdm6al morphants (54°, N=50, p=4.17x10^-5) compared to wild-type, but not significantly different in kdm6a morphants (39°, N=34, p=0.096) (Figure 3.6). The cardiac phenotype observed in kdm6al morphants was partially rescued by coinjection with human KDM6A mRNA (41°, N=31, p=0.001). Coinjection of human KDM6A mRNA did not significantly alter the already mild cardiac phenotype of kdm6a morphants. This suggests that progression of cardiac looping involution in morphants is defective compared to wild-type embryos. To determine if loss of both kdm6a and kdm6al would increase the severity of the looping angle phenotype, embryos were co-injected with MOs targeting both kdm6a and kdm6al. This resulted in an additive impairment of looping progression to levels approaching kmt2d morphants (67°, N=40, p=7.07x10^-8). These data suggest kmt2d and kdm6al have crucial roles in cardiac development, and that kdm6a has at least a minor role.
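The looping angle is described as the angle of the atrioventricular junction relative to the midline. As a sketch of how such an angle could be computed from landmark coordinates picked on a dorsal image, the Python function below returns the acute angle between two lines. The function name, the landmark coordinates and the choice to report the acute angle (0-90°) are assumptions for illustration; the text does not specify how the measurement was made in the imaging software.

```python
# Sketch: acute angle between the embryo midline and the atrioventricular (AV)
# junction, each defined by two (x, y) landmarks picked on a dorsal image.
# Coordinates below are hypothetical pixel positions, not measured data.
import math


def looping_angle_deg(midline, av_junction) -> float:
    """Return the acute angle in degrees between two lines given as point pairs."""
    (mx0, my0), (mx1, my1) = midline
    (jx0, jy0), (jx1, jy1) = av_junction
    ux, uy = mx1 - mx0, my1 - my0          # direction vector of the midline
    vx, vy = jx1 - jx0, jy1 - jy0          # direction vector of the AV junction
    norm = math.hypot(ux, uy) * math.hypot(vx, vy)
    if norm == 0:
        raise ValueError("each line needs two distinct points")
    # abs() makes the result independent of the direction each line was drawn in
    cos_theta = min(1.0, abs(ux * vx + uy * vy) / norm)
    return math.degrees(math.acos(cos_theta))


# Hypothetical landmarks: vertical midline, AV junction drawn diagonally
angle = looping_angle_deg(((0, 0), (0, 10)), ((0, 0), (5, 5)))
print(round(angle, 1))  # 45.0
```

Group means computed from such per-embryo angles could then be compared with Welch's t-test, as stated in the Methods.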

Analysis of Brain Development in Morphants

In order to determine the effects of the knockdown of kmt2d, kdm6a and kdm6al on brain development, we obtained transverse serial sections of the brains of wild-type and morphant embryos at 48 hpf. Sections were stained with hematoxylin and eosin to observe cell morphology, and comparable sections through the hypothalamus, optic tectum and midbrain tegmentum were chosen for direct comparison (Figure 3.7). When compared to wild-type embryos, the cross-sectional areas of the brains of morphants were notably reduced. This phenotype was similar in kmt2d, kdm6a and kdm6al morphants, and included a reduced cell layer thickness within the hypothalamus, optic tectum and midbrain tegmentum. Co-injection with hKDM6A mRNA resulted in a partially restored cell layer thickness in both kdm6a and kdm6al morphants. This reduced cell layer thickness was observed to a lesser extent in sections through the hindbrain, including a marginally reduced cell layer observed in the medulla oblongata (Figure 3.8).

A layer of unusually elongated nuclei was observed in cells lining the ventricles of all morphants, markedly distinct from the typically rounded nuclei present in wild-type embryos. Within the CNS, elongated nuclei are a hallmark of actively dividing neural precursor cells (NPCs), and are not observed in post-differentiated neurons (122). We postulated that an increase in NPCs was responsible for the differences between wild-type and morphant brains. We therefore evaluated expression of the NPC marker sox2 and the post-mitotic neuronal marker huc in an additional series of 48 hpf transverse sectioned embryos (122,123).

In wild-type embryos, a thin layer of sox2-positive NPCs was observed lining the ventricles of the optic tectum and midbrain tegmentum, and in the central portion of the hypothalamus (Figure 3.9). The majority of cells in these structures instead express huc, indicating that they have undergone differentiation into early neurons. In contrast, a much larger population of cells within the brains of kmt2d, kdm6a and kdm6al morphant embryos continue to express sox2 at this timepoint. Only a small number of cells expressed huc in the morphants, indicating that NPCs are defective in their ability to differentiate. The cells with elongated nuclei appear to be stalled in division prior to differentiation, which may explain the reduced brain size. These experiments highlight that kmt2d, kdm6a and kdm6al all have essential roles in normal vertebrate brain development.

Analysis of Neural Progenitor Cell Specification

The expression analysis of sox2 and huc at 48 hpf demonstrated that knockdown of kmt2d, kdm6a and kdm6al causes defects in NPC differentiation. Although this might account for the reduced brain size of morphants, there may also be defects early in NPC specification that contribute to reduced brain size. Because sox2 is expressed by specified NPCs, this marker was also utilized for WISH at 14 hpf (124). The results of this WISH experiment were also inconclusive and require repeating. Knockdown of kmt2d, kdm6a and kdm6al appears to result in a minor impairment of NPC specification, as the eye field of the telencephalon is filled with sox2 expressing cells in the wild-type embryo, and is not occupied by these cells in the morphants (Figure 3.10). This may be a spurious conclusion, as the morphant embryos are also smaller, indicating a possible developmental delay at this timepoint.

Figure 3.7. Defects in brain morphology are observed in morphant zebrafish embryos, and partially rescued by coinjection of human KDM6A mRNA. Hematoxylin and eosin (H&E) stained transverse sections of the zebrafish containing elements of both fore- and midbrain at 48 hours post fertilization. (A) Wild-type control; (B) A typical kmt2d morphant; (C) A typical kdm6a morphant; (D) A typical kdm6a morphant coinjected with human KDM6A mRNA; (E) A typical kdm6al morphant; (F) A typical kdm6al morphant coinjected with human KDM6A mRNA. H, hypothalamus; MT, midbrain tegmentum; OT, optic tectum.

Figure 3.8. Knockdown of kmt2d, kdm6a, and kdm6al leads to mild impairments in hindbrain development. Hematoxylin and eosin (H&E) stained transverse sections of the zebrafish hindbrain at 48 hours post fertilization. (A) Wild-type control; (B) A typical kmt2d morphant; (C) A typical kdm6a morphant; (D) A typical kdm6al morphant. MO, medulla oblongata; OC, otic capsule; CD, chorda dorsalis.

Figure 3.9. Defects in NPC differentiation are observed in morphant zebrafish embryos, and partially rescued by coinjection of human KDM6A mRNA. Immunostaining of sox2 (green) and huc (red) in transverse sections of the zebrafish containing elements of both fore- and midbrain at 48 hours post fertilization. (A) Wild-type control; (B) A typical kmt2d morphant; (C) A typical kdm6a morphant; (D) A typical kdm6a morphant coinjected with human KDM6A mRNA; (E) A typical kdm6al morphant; (F) A typical kdm6al morphant coinjected with human KDM6A mRNA.

Figure 3.10. Expression of sox2 at 14 hpf in wild-type and morphant embryos. The eye field of the telencephalon is indicated by a black arrow.

Mutagenesis of kmt2d, kdm6a and kdm6al

Injection of zinc-finger nuclease (ZFN) mRNAs resulted in embryos that were phenotypically similar to kmt2d morphants. It has been reported that severely affected mutants are usually embryos in which both copies of the gene have been knocked out, suggesting that the ZFN may have worked as intended. DNA was extracted from single embryos and the regions surrounding the specified cut site were amplified by PCR. Sequencing resulted in clean sequences with no indication that mutations were taking place. Attempts to optimize the protocol were unsuccessful.

After the failure of zinc-finger nucleases to induce mutations, the CRISPR/Cas9 system was attempted as well. Single-cell injections of Cas9 mRNA and the gRNA constructs again led to a phenotype similar to kmt2d knockdown, although mutations again could not be identified. These experiments were ultimately shelved due to time constraints.

Effects of Exogenous Retinoic Acid on Morphant Development

Incubation of morphants in all-trans retinoic acid (RA) amplified the gross morphological defects that were observed in a small number of morphants by 72 hpf. These defects consisted of prominent cardiac edema, posterior truncation with an upward arching of the spine and reduced or even absent eyes (Figure 3.11). These defects were observed in all three morphants at all concentrations tested. At the highest RA concentration of 1 µM, the morphant embryos were severely swollen and malformed, while the uninjected embryos had obvious developmental defects. At 0.1 nM and 0.01 nM RA concentrations, no apparent effects were observed in the uninjected embryos by 72 hpf, while the morphants were still severely malformed compared to the morphants incubated in RA-free embryo medium. These defects were not photographed because it was thought the experiments had failed and that further optimization of the protocol would need to be performed.

Figure 3.11. A typical kdm6al morphant. The kdm6al knockdown demonstrates the posterior developmental defect and arched spine that were exacerbated by incubation in all-trans retinoic acid.

Discussion

Roles of KS Genes in Development

We utilized Morpholino-based knockdown of kmt2d, kdm6a and kdm6al in zebrafish to better understand their roles in the development of structures relevant to KS. The kdm6a and kdm6al genes have been knocked down in zebrafish previously, yet the only reported phenotype was a reduced posterior body length and abnormal curvature of the spine, with a more pronounced effect in the kdm6al morphants (59). While this phenotype was observed in our morphants, we focused our analysis on three abnormalities associated with KS.

Knockdown of kmt2d and kdm6a led to nearly identical craniofacial defects at 48 hpf. These defects included hypoplasia of the pharyngeal arches and the ceratohyal and dysplasia of Meckel’s cartilage. Rescue of the kdm6a morphant was robust, demonstrating that the morphant phenotype was caused by specific knockdown of kdm6a and not by off-target effects.

Atrial and ventricular septal defects and coarctation of the aorta are the most common heart defects associated with KS. In this study we have demonstrated a requirement for both kmt2d and kdm6al in cardiac looping morphogenesis, with a smaller contribution from kdm6a. Distinct roles were observed for kdm6a in craniofacial development and for kdm6al in cardiac development, suggesting that these two roles have diverged in the teleost lineage. There does not seem to be a divergence in their roles in neuronal development, however. Haploinsufficiency of kmt2d, kdm6a or kdm6al is sufficient to induce a reduction in brain size at 48 hpf. This reduced brain size may be attributed in whole or in part to a defect in the ability of NPCs to differentiate into mature neurons. The reduced brain volume in zebrafish is similar to the clinical microcephaly that is reported in as many as 25% of KS subjects (30). Although there was an accumulation of neural precursors compared to wild-type embryos, the overall brain size of morphants was considerably reduced. The defects in both brain size and NPC differentiation were partially rescued by injection of human KDM6A mRNA. The capacity of human KDM6A mRNA to rescue defects caused by knocking down the zebrafish kdm6a or kdm6al transcripts demonstrates a high level of functional conservation between the human and zebrafish proteins. Our data provide the first direct evidence of the overlapping functional roles of KMT2D and KDM6A in the development of tissues and organs affected in KS subjects.

Stable knockout mutations generated by zinc-finger nucleases or by CRISPR/Cas9 mutagenesis would have strengthened our conclusions about the roles of KMT2D and KDM6A in development, but they were not obtained. Both of these methods are reported to be robust and to yield mutations in a high percentage of cells, yet I was unable to induce mutations in the genes that I targeted. The CRISPR/Cas9 protocol was eventually optimized to successfully induce mutations in the lab after many months of additional experiments, but mutagenesis of kmt2d, kdm6a and kdm6al was not pursued at that point. These optimizations included an improved zebrafish codon-optimized Cas9 construct and a drastic increase in the amount of Cas9 mRNA injected.

Malformations in KS Implicate Neural Crest Cell Derivatives

Only two structures of the heart are commonly affected in KS: the ventricular and atrial septa and the cardiac outflow tracts, which include the aorta. These are also the only structural aspects of the heart with a contribution from NCCs (125). NCCs are a transient population of pluripotent cells that develop at the periphery of the neural plate border during gastrulation (126). There are four subpopulations of NCCs that are delineated both by position and cell fate. Cranial NCCs migrate dorsolaterally to form the craniofacial mesenchyme that develops into the cartilaginous structures of the midface, cranial neurons, glia and other connective tissues (127). Cardiac NCCs contribute to a number of derivatives, most importantly the outflow tracts and septa of the heart (125). Vagal NCCs innervate the enteric system (128). Trunk NCCs migrate dorsolaterally to become melanocytes, dorsal root ganglia, sympathetic ganglia, nerve clusters of the aorta and the adrenal medulla (128). Ablation experiments have shown that partial removal of cardiac NCCs prior to migration results in ventricular septal defects and aortic malformations (125).

The craniofacial phenotype of KS may be caused by defects in NCC development as well, because the midfacial skeleton is derived almost entirely from NCCs (127). The distinctive craniofacial abnormalities associated with KS, including midfacial hypoplasia, cleft lip and palate and abnormal dentition, may be explained by defects in NCC development. Although NCC specification appears to occur normally in morphants, specific NCC populations are affected by knockdown of kmt2d, kdm6a and kdm6al. There appears to be a reduced number of differentiated cranial NCCs at 48 hpf. The craniofacial defects are similar to those described for knockdown of genes important for cranial NCC development, such as PRDM family members (129). The role of cardiac NCCs in the heart defects was not explored, although cardiac development was defective.

Defects in Neuronal Development

The brain abnormalities associated with KS probably have a different pathogenic mechanism, although a role for NCCs in patterning of the forebrain and hindbrain has been suggested (127). KS gene morphants demonstrated a defect in neuronal differentiation, which should affect both the number of neurons and their potential connectivity. The varied structural brain abnormalities described in KS do not point to any specific neuronal developmental program that is regulated by the KS genes, although such regulation cannot be excluded. Stochastic differences in the neuronal programs of different cell types may lead to different phenotypes through the same general mechanism. While aberrant neuronal differentiation probably plays a role, migration may be affected as well. The brain malformations detected in KS subjects may be informative in understanding the pathogenic mechanisms that cause ID in these individuals. The reports of polymicrogyria in KS subjects suggest that late waves of neuronal migration are affected during cortical development (130).

Effects of Exogenous Retinoic Acid on Morphant Development

Regulation of retinoic acid signaling may be another role of the KS-associated genes that contributes to the pathogenesis of the disorder. Association with retinoic acid receptors (RARs) differentiates the KMT2C/D complex from the other KMT2 family members. The KMT2D complex has been shown to be required for activation of retinoic acid-induced transcription of Hox genes and the RA-responsive gene ASB2 (50,131). This has many potential developmental consequences because regulation of transcription by retinoic acid has been implicated in many developmental processes. Retinoic acid has been shown to be important for both the specification and maintenance of NCCs (132). A gradient of RA patterns the hindbrain during development (133). Higher levels of RA have a posteriorizing effect on the brain (134). Segments of the brain that are exposed to unusually high RA take on the identity of more posterior brain regions.

I originally postulated that haploinsufficiency of KMT2D or KDM6A decreases the ability of progenitor cells to respond to retinoic acid signaling due to a decreased concentration of competent KMT2D complex. If this were the case, application of exogenous all-trans RA should increase the concentration of internalized RAR and shift the binding equilibrium of the KMT2D complex toward the RAR-bound state, potentially rescuing the morphant phenotype by restoring transcription of retinoic acid responsive genes. This was nearly the exact opposite of the effect observed. Increasing amounts of all-trans RA actually made the morphant embryos more susceptible to developmental defects. This was most evident from the presence of malformations in morphants at concentrations of RA that had no effect on uninjected embryos. One reason for this may be that the KMT2D complex also regulates genes independently of RA, and sequestering most of the complex in a RAR-bound state leaves too little of the complex to respond to other signaling molecules.
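The sequestration explanation can be illustrated with a deliberately simple toy calculation. The sketch below is not a model that was fitted or tested in this work: it assumes 1:1 mass-action binding of the complex to liganded RAR, and the function name, the dissociation constant and the 50% reduction in morphants are all hypothetical values chosen only for illustration.

```python
def free_complex(total_complex, ligand, kd):
    """Amount of complex NOT bound to liganded RAR.

    Assumes simple 1:1 mass-action binding in which the bound fraction
    is ligand / (kd + ligand). All quantities are in arbitrary units.
    """
    fraction_bound = ligand / (kd + ligand)
    return total_complex * (1.0 - fraction_bound)


# Hypothetical, illustrative values only (not measurements).
KD = 1.0          # assumed dissociation constant
WILD_TYPE = 1.0   # relative amount of complex in uninjected embryos
MORPHANT = 0.5    # assumed 50% reduction after knockdown

for ligand in (0.0, 0.5, 1.0, 2.0):
    wt = free_complex(WILD_TYPE, ligand, KD)
    mo = free_complex(MORPHANT, ligand, KD)
    print(f"ligand={ligand:.1f}  free wild-type={wt:.2f}  free morphant={mo:.2f}")
```

Under these assumptions the morphant always retains half as much unbound complex as the wild-type embryo at the same RA level. If RA-independent targets required some minimum amount of unbound complex, morphants would therefore fall below that minimum at a lower RA concentration than uninjected embryos, which is the qualitative pattern described above.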

CHAPTER IV

CONCLUSIONS

The Etiology of Kabuki Syndrome

The Genetic Basis of Kabuki Syndrome

Kabuki syndrome is a complex disorder. Its phenotypic features vary widely in both presence and severity. Even in cases of identical twins and parent-to-child inheritance where the mutation is identical, the presentation is often vastly different. This suggests that the developmental processes controlled by the KS genes are very tightly regulated, and that even small differences in gene expression within a few progenitor cells can mean the difference between the presence and absence of a developmental defect.

There is little doubt that mutations in KMT2D are the major cause of KS. The extreme size of KMT2D may make it more susceptible to chance mutations compared to KDM6A, which is only a third of its size. The fact that the mutations reported consist of only point mutations and small indels indicates that there are not any recurrent mutations or genomic rearrangements that affect KMT2D. On the other hand, we have detected two deletions of KDM6A with one common breakpoint. Taken with the additional reported microdeletions and complex rearrangements of chromosome X associated with KS in the literature, it appears that the genomic sequence near KDM6A is more susceptible to recurrent chromosomal abnormalities. The majority of mutations associated with KS are loss-of-function mutations, suggesting that haploinsufficiency is the major cause of KS. Missense mutations have been reported in the disorder as well, although the mechanism of their pathogenesis has not been explored.

In females, KDM6A is expressed from both alleles because it is one of the few genes to escape X-inactivation, although it has been demonstrated that expression is higher from the active chromosome (135). Interestingly, in both subjects with a KDM6A deletion, Lederer et al. demonstrated a skewed X-inactivation pattern in which the chromosome containing the deleted allele was overwhelmingly inactivated (35). This could be a compensatory mechanism by which the majority of the cells are able to recognize and inactivate a defective chromosome. Alternatively, it may be a matter of selection in which the cells containing the normal gene on the active chromosome have a selective advantage and are able to outcompete the cells with lower KDM6A expression. Perhaps an X-inactivation pattern skewed in the other direction is embryonic lethal.

The finding that Kdm6c is able to partially compensate for the loss of Kdm6a in male mice is interesting. Kdm6a knockout male mice have 25% viability, and survivors were much smaller than wild-type individuals, a hallmark of KS (136). A dual knockout of KDM6A and KDM6C had 100% lethality. The authors of this report were unable to demonstrate demethylase activity for KDM6C and assumed it to be catalytically dead. A CoIP of HA-tagged Kdm6a in mice revealed binding to RBBP5, suggesting that KDM6C may take the place of KDM6A in the KMT2D complex. The authors interpreted this to mean that there is a demethylase-independent role for KDM6A/C in the KMT2D complex and that demethylation is dispensable for viability. However, a recent report has demonstrated that KDM6C has demethylase activity in vitro, although it is notably lower than that of KDM6A due to reduced substrate binding (54). The authors’ interpretation may have been correct, but it is also possible that the molecular environment inside the cell increases substrate binding by recruitment of the KMT2D complex to histones, thereby bringing the H3K27me3 substrate into close contact with KDM6C. Although no mutations in KDM6C have been associated with KS, this may be because the expression of KDM6A from the single fully active allele is sufficient to allow normal development, so that KDM6C mutations would have a very mild phenotype or none at all.

The mutations detected in KMT2C are very interesting. Mutations in another KMT2 family member, KMT2A, cause Wiedemann-Steiner syndrome, a disorder with some phenotypic overlap with KS (137). KMT2C mutations would be expected to have much more phenotypic overlap with KMT2D mutations because KMT2C is a closer relative of KMT2D, even sharing the same protein complex constituents. While our initial sequence analysis indicated that 4/5 exome sequencing subjects contained mutations in KMT2C, only two subjects had variants that could be validated. The two potentially damaging missense mutations are not enough evidence to establish that KMT2C mutations are part of the etiology of KS, but they are provocative because they were validated in 40% of the subjects sequenced. Further experimentation is necessary to increase our confidence in this candidate gene. The same validation steps that were taken to demonstrate KDM6A as a candidate gene would work well for KMT2C. The rest of the KMT2D and KDM6A mutation negative subjects in our KS cohort should be sequenced for KMT2C mutations. Traditional Sanger sequencing would be appropriate for the 40 exons that do not lie within a segdup. The 88 kb segdup could be subcloned into a BAC, removing the contribution of the other duplicated segments and allowing traditional PCR amplification and Sanger sequencing to be performed. Analysis of the effects of KMT2C knockdown on zebrafish development would be performed in the same way as the kmt2d, kdm6a and kdm6al knockdowns.
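To make concrete why two validated subjects out of five cannot be taken as an estimate of how often KMT2C is mutated in KS, the short sketch below computes an exact (Clopper-Pearson) binomial confidence interval for that proportion. This calculation was not part of the original analysis; it treats the five subjects as independent draws, which is an assumption, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for the proportion k/n."""

    def solve(f, target):
        # f decreases monotonically in p on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Two validated KMT2C variants among five exome-sequenced subjects.
low, high = clopper_pearson(2, 5)
print(f"95% interval for 2/5: {low:.3f} to {high:.3f}")
```

The interval for 2 of 5 spans roughly 5% to 85%, so the observed 40% is compatible with KMT2C mutations being either rare or common among mutation negative subjects. This is the quantitative reason the remaining cohort needs to be sequenced.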

Developmental Roles of KS Genes

The most recognizable aspect of KS is the distinctive face. Structural abnormalities of the face may include cleft lip and palate, midfacial hypoplasia and hypodontia. All of the affected structures are derived from cranial NCCs, and defects in cranial NCC development probably contribute to the facial phenotype of KS. Similarly, the heart defects that are most often described in KS affect two very specific structures of the heart, the outflow tracts and the septa separating the heart chambers. These structures are the only components of the heart with an NCC contribution. With this clinical evidence it is interesting that KS is not recognized as a neurocristopathy, a disorder originating from defects in neural crest development (138). The phenotypically similar CHARGE syndrome has been suggested as a neurocristopathy, which leads one to wonder why the more common KS has not (139).

We have performed some early experiments to determine the exact defects in NCC development that are caused by haploinsufficiency of kmt2d and kdm6a, with ambiguous results. The expression analysis of foxd3 at 14 hpf needs to be repeated to verify that early specification of NCCs is not affected.

Future Directions

Although this research revealed many roles for the genes that cause KS in developmental processes, there is still much to be elucidated. The craniofacial and cardiac defects observed were merely the end result of abnormalities in NCC development. Determining the mechanism of the NCC defect will be critical to our understanding of the regulatory systems that control NCC development. There are also aspects of neuronal development that should be explored further. While the defect that was discovered in NPC differentiation is undoubtedly critical to the development of the neurological abnormalities associated with KS, other aspects of neurodevelopment may be affected as well.

Mechanisms of Neural Crest Cell Defects Caused by KS Gene Knockdown

Development of the craniofacial skeleton and septation and outflow tract formation in the heart are dependent on NCC derivatives. The presence of defects in KS that exactly correspond to these NCC-derived structures allows us to conclude NCC involvement, but the mechanism of these defects was not elucidated in our research. A number of experiments are still required to determine the exact cellular populations and the timing of NCC developmental defects that lead to the KS phenotype in humans. The craniofacial defects that occur when kmt2d, kdm6a and kdm6al are knocked down include hypoplasia of the viscerocranium structures. There are similar defects exhibited by barx1 and prdm1a knockdowns that may direct future experiments. In both prdm1a and barx1 morphants, the pharyngeal pouches 3-7 that give rise to the ceratobranchial arches are incompletely developed, due to reduced proliferation of post-migratory cranial NCCs (140,141). Similarly, ablation of a portion of the cardiac NCCs prior to migration leads to cardiac defects that are nearly identical to those observed in KS subjects (142). The conclusion from this experiment was that it is the quantity of NCCs that reach the heart that is important, not the quality. Reduced numbers of cardiac NCCs might be caused by defects in cardiac NCC proliferation, migration or cell death.

To determine if the defects in NCC development caused by KS gene knockdown are mechanistically similar to barx1 and prdm1a knockdowns, it will first be necessary to rule out earlier effects on NCC development. Analysis of foxd3 expression at 12 hpf suggests that NCC specification is not affected. The next step is to determine if migration of NCCs proceeds normally. This can be achieved by live imaging of Tg(:GFP) embryos to compare the numbers and migratory routes of GFP-positive NCCs. If migration appears normal, then proliferation can be considered. Actively proliferating cells can be labeled using a fluorescently tagged phosphohistone H3 antibody, and the numbers of proliferating cells can be quantified at the location of the pharyngeal pouches (140). Increased apoptosis must also be considered, and can be detected by TUNEL or caspase 3 antibody staining.

Assays to determine the effects of kmt2d and kdm6a knockdown on NCC proliferation, cell death and migration should also be performed. If these steps in NCC development are normal, the WISH analysis of sox9 expression in 48 hpf morphants also needs to be repeated to determine if NCCs are failing to undergo differentiation. We may find that certain populations of NCCs require KMT2D and KDM6A for differentiation as well.

Effects of KS Gene Knockout on Neuronal Outgrowth and Pruning

The sox2 and huc labeling experiments have shown that knockdown of kmt2d, kdm6a and kdm6al causes defects in NPC differentiation. While this may account for the neurological defects associated with KS, it is possible that other aspects of neurogenesis are affected as well. The presence of polymicrogyria (PMG) in KS subjects suggests that neuronal migration may be affected, because PMG is thought to be caused by abnormalities in the late stages of neuronal migration that occur during fetal brain development (9,11,130). Some of the genes most highly regulated by KMT2D may contribute to migration defects by affecting extracellular matrix (ECM) remodeling and cell adhesion. During neuronal migration, movement is facilitated by the coordinated buildup of ECM at the leading process and breakdown at the trailing edge of the cell. Cell adhesions anchor the cell to its surroundings, allowing the transduction of mechanical force through microtubules and actin microfilaments to move the cell body in a unidirectional manner (130).

A number of genes regulated by KMT2D may affect neuronal migration, including: LOXL1 (lysyl oxidase-like 1), important for extracellular matrix remodeling; CSPG4 (chondroitin-sulfate proteoglycan 4), with roles in cell polarization and motility; GPR56 (G protein-coupled receptor 56), with roles in cellular adhesion, intercellular signaling and neuronal development; LAMB3 (laminin beta-3 chain), a component of extracellular matrix involved in cell adhesion; VTN (vitronectin), a cell adhesion molecule; and PCDH7 (BH-protocadherin), a cell adhesion molecule and regulator of neuronal connectivity (49). The most effective way to study the effects of KS genes on neuronal migration would be to selectively knock out KMT2D and KDM6A in the mouse brain using Cre-Lox recombination (143). The Cre-recombinase would be expressed under control of the HuC promoter so that only neurons would be affected, and to allow normal differentiation to occur (144). These neurons would be subjected to a number of experiments to assess their ability to undergo axonal growth and migration, their responsiveness to both chemoattractant and chemorepulsive signals and their ability to prune projections that are no longer required.

Pups would be sacrificed at postnatal day 0 (P0), and sagittal brain sections would be cut at 50 µm. DAPI staining is used to delineate the six layers of the cerebral cortex and would be used to assess cortical organization. Many of the cortical layers have distinct markers such as Ctip2 (Layers II-III) and Foxp2 (Layer VI) that can be used to further describe differences in the populations and organization of neurons (145,146). Cell birth-dating by transient BrdU incorporation can be used at multiple timepoints to label specific neurons for determining the extent of migration (146). Both wild-type and mutant cortical pyramidal neurons would be isolated and cultured, then exposed to chemoattractants such as HGF and chemorepellents such as thrombin to determine if growth cone extension and guidance require expression of KS genes (147). Changes in the extent of dendrite arborization would be assessed concurrently (148).

Significance

This study has added to our understanding of the genetic landscape and developmental roles of the genes that cause Kabuki syndrome. We have increased the number of KMT2D and KDM6A mutations identified in KS. The mutation spectrum of the disorder has been explored in our cohort and compared to previous findings. Our data support the implication suggested by Banka et al. that subjects who best fit the stereotypical KS phenotype are most likely to have mutations in KMT2D (33). A new candidate gene has been identified in subjects diagnosed with KS that fits well with our understanding of the functions of the previously discovered KS genes. Our results suggest that the KMT2C gene should be sequenced in those subjects that do not have mutations in KMT2D or KDM6A. Although heterozygous Kdm6a knockout mice have been created, and kdm6a and kdm6al morphants have been analyzed in zebrafish, there is still a dearth of evidence of the roles of KMT2D and KDM6A in development (59,88,136). We have demonstrated that both kmt2d and the co-paralogs kdm6a and kdm6al regulate development of the structures that make up the viscerocranium, heart and brain in the zebrafish, a common model organism that shares many developmental processes with humans. The similarity of these phenotypes supports the notion that KMT2D and KDM6A both contribute to the same processes as part of a complex, which requires the presence of both proteins to function properly.

REFERENCES

1. Niikawa N, Matsuura N, Fukushima Y, Ohsawa T, Kajii T. Kabuki make-up syndrome: a syndrome of mental retardation, unusual facies, large and protruding ears, and postnatal growth deficiency. J Pediatr. 1981 Oct;99(4):565–9.

2. Kuroki Y, Suzuki Y, Chyo H, Hata A, Matsui I. A new malformation syndrome of long palpebral fissures, large ears, depressed nasal tip, and skeletal anomalies associated with postnatal dwarfism and mental retardation. J Pediatr. 1981 Oct;99(4):570–3.

3. Niikawa N, Kuroki Y, Kajii T, Matsuura N, Ishikiriyama S, Tonoki H, et al. Kabuki make-up (Niikawa-Kuroki) syndrome: a study of 62 patients. Am J Med Genet. 1988 Nov;31(3):565–89.

4. Adam M, Hudgins L. Kabuki syndrome: a review. Clinical Genetics. 2005;67(3):209–19.

5. Ming JE, Russell KL, Bason L, McDonald-McGinn DM, Zackai EH. Coloboma and other ophthalmologic anomalies in Kabuki syndrome: Distinction from charge association. American Journal of Medical Genetics. 2003 Dec 15;123A(3):249–52.

6. Mhanni AA, Cross HG, Chudley AE. Kabuki syndrome: description of dental findings in 8 patients. Clin Genet. 1999 Aug;56(2):154–7.

7. Matsune K, Shimizu T, Tohma T, Asada Y, Ohashi H, Maeda T. Craniofacial and dental characteristics of Kabuki syndrome. American journal of medical genetics. 2001;98(2):185–90.

8. Yano S, Matsuishi T, Yoshino M, Kato H, Kojima K. Cerebellar and brainstem “atrophy” in a patient with Kabuki make-up syndrome. Am J Med Genet. 1997 Sep 5;71(4):486–7.

9. Di Gennaro G, Condoluci C, Casali C, Ciccarelli O, Albertini G. Epilepsy and polymicrogyria in Kabuki make-up (Niikawa-Kuroki) syndrome. Pediatr Neurol. 1999 Aug;21(2):566–8.

10. Ciprero KL, Clayton-Smith J, Donnai D, Zimmerman RA, Zackai EH, Ming JE. Symptomatic Chiari I malformation in Kabuki syndrome. American Journal of Medical Genetics Part A. 2005 Jan 30;132A(3):273–5.

11. Takano T, Matsuwake K, Yoshioka S, Takeuchi Y. Congenital polymicrogyria including the perisylvian region in early childhood. Congenit Anom (Kyoto). 2010 Mar;50(1):64–7.

12. Kawame H, Hannibal MC, Hudgins L, Pagon RA. Phenotypic spectrum and management issues in Kabuki syndrome. J Pediatr. 1999 Apr;134(4):480–5.

13. Philip N, Meinecke P, David A, Dean J, Ayme S, Clark R, et al. Kabuki make-up (Niikawa-Kuroki) syndrome: a study of 16 non-Japanese cases. Clin Dysmorphol. 1992 Apr;1(2):63–77.

14. Hughes HE, Davies SJ. Coarctation of the aorta in Kabuki syndrome. Arch Dis Child. 1994 Jun;70(6):512–4.

15. Armstrong L, Moneim AAE, Aleck K, Aughton DJ, Baumann C, Braddock SR, et al. Further delineation of Kabuki syndrome in 48 well-defined new individuals. American Journal of Medical Genetics Part A. 2005 Jan 30;132A(3):265–72.

16. Lin J-L, Lee W-I, Huang J-L, Chen PK-T, Chan K-C, Lo L-J, et al. Immunologic Assessment and KMT2D mutation detection in Kabuki Syndrome: Immunologic and genetic KMT2D analysis in Kabuki syndrome. Clinical Genetics. 2014 Aug.

17. Ming JE, Russell KL, McDonald-McGinn DM, Zackai EH. Autoimmune disorders in Kabuki syndrome. American Journal of Medical Genetics Part A. 2005 Jan 30;132A(3):260–2.

18. Milunsky JM, Huang XL. Unmasking Kabuki syndrome: chromosome 8p22-8p23.1 duplication revealed by comparative genomic hybridization and BAC-FISH. Clin Genet. 2003 Dec;64(6):509–16.

19. Milunsky JM, Maher TA, Zhao G, Huang X-L, Wang Z, Zou Y. A re-examination of the chromosome 8p22-8p23.1 region in Kabuki syndrome. Clin Genet. 2008 May;73(5):502–3.

20. Jardine PE, Burvill-Holmes LC, Schutt WH, Lunt PW. Partial 6q monosomy/partial 12q trisomy in a child with features of Kabuki make-up syndrome. Clin Dysmorphol. 1993 Jul;2(3):269–73.

21. Fryns JP, Van den Berghe H, Schrander-Stumpel C. Kabuki (Niikawa-Kuroki) syndrome and paracentric inversion of the short arm of chromosome 4. Am J Med Genet. 1994 Nov 1;53(2):204–5.

22. Galán-Gómez E, Cardesa-García JJ, Campo-Sampedro FM, Salamanca-Maesso C, Martínez-Frías ML, Frías JL. Kabuki make-up (Niikawa-Kuroki) syndrome in five Spanish children. Am J Med Genet. 1995 Nov 20;59(3):276–82.

23. Lynch SA, Ashcroft KA, Zwolinski S, Clarke C, Burn J. Kabuki syndrome-like features in monozygotic twin boys with a pseudodicentric chromosome 13. J Med Genet. 1995 Mar;32(3):227–30.

24. Lo IF, Cheung LY, Ng AY, Lam ST. Interstitial Dup(1p) with findings of Kabuki make-up syndrome. Am J Med Genet. 1998 Jun 16;78(1):55–7.

25. Digilio MC, Marino B, Toscano A, Giannotti A, Dallapiccola B. Congenital heart defects in Kabuki syndrome. Am J Med Genet. 2001 May 15;100(4):269–74.

26. Chen C-P, Lin S-P, Tsai F-J, Chern S-R, Wang W. Kabuki syndrome in a girl with mosaic 45,X/47,XXX and aortic coarctation. Fertil Steril. 2008 Jun;89(6):1826.e5–7.

27. Su P-H, Kuo P-L, Chen S-J, Chen J-Y, Yu J-S, Liu Y-L, et al. Kabuki make-up (Niikawa-Kuroki) syndrome with mosaicism ring chromosome X and incomplete XIST gene expression. Acta Paediatr Taiwan. 2007 Feb;48(1):28–31.

28. Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve HI, et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet. 2010 Sep;42(9):790–3.

29. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, et al. Exome sequencing identifies the cause of a mendelian disorder. Nature Genetics. 2010 Jan;42(1):30–5.

30. Matsumoto N, Niikawa N. Kabuki make-up syndrome: A review. American Journal of Medical Genetics. 2003 Feb 15;117C(1):57–65.

31. Hannibal MC, Buckingham KJ, Ng SB, Ming JE, Beck AE, McMillin MJ, et al. Spectrum of MLL2 (ALR) mutations in 110 cases of Kabuki syndrome. American Journal of Medical Genetics Part A [Internet]. 2011 [cited 2012 Jun 26]; Available from: http://onlinelibrary.wiley.com/doi/10.1002/ajmg.a.34074/full

32. Micale L, Augello B, Fusco C, Selicorni A, Loviglio MN, Silengo M, et al. Mutation spectrum of MLL2 in a cohort of kabuki syndrome patients. Orphanet Journal of Rare Diseases. 2011;6(1):38.

33. Banka S, Veeramachaneni R, Reardon W, Howard E, Bunstone S, Ragge N, et al. How genetically heterogeneous is Kabuki syndrome?: MLL2 testing in 116 patients, review and analyses of mutation and phenotypic spectrum. European Journal of Human Genetics. 2011 Nov 30;20(4):381–8.

34. Paulussen ADC, Stegmann APA, Blok MJ, Tserpelis D, Posma-Velter C, Detisch Y, et al. MLL2 mutation spectrum in 45 patients with Kabuki syndrome. Human Mutation. 2011 Feb;32(2):E2018–25.

35. Lederer D, Grisart B, Digilio MC, Benoit V, Crespin M, Ghariani SC, et al. Deletion of KDM6A, a Histone Demethylase Interacting with MLL2, in Three Patients with Kabuki Syndrome. The American Journal of Human Genetics. 2012 Jan;90(1):119–24.

36. Banka S, Lederer D, Benoit V, Jenkins E, Howard E, Bunstone S, et al. Novel KDM6A (UTX) mutations and a clinical and molecular review of the X-linked Kabuki syndrome (KS2): Novel KDM6A mutations and review of X-linked KS. Clinical Genetics. 2014 Feb.

37. Miyake N, Mizuno S, Okamoto N, Ohashi H, Shiina M, Ogata K, et al. KDM6A Point Mutations Cause Kabuki Syndrome. Human Mutation. 2013 Jan;34(1):108–10.

38. Miyake N, Koshimizu E, Okamoto N, Mizuno S, Ogata T, Nagai T, et al. MLL2 and KDM6A mutations in patients with Kabuki syndrome. American Journal of Medical Genetics Part A. 2013 Jul.

39. Epstein DJ. Cis-regulatory mutations in human disease. Briefings in Functional Genomics and Proteomics. 2009 Jul 1;8(4):310–6.

40. Martin C, Zhang Y. The diverse functions of histone lysine methylation. Nature Reviews Molecular Cell Biology. 2005 Nov;6(11):838–49.

41. Herz H-M, Garruss A, Shilatifard A. SET for life: biochemical activities and biological functions of SET domain-containing proteins. Trends in Biochemical Sciences. 2013 Dec;38(12):621–39.

42. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids Research. 2014 Jan 1;42(D1):D749–55.

43. Qian C, Zhou M-M. SET domain protein lysine methyltransferases: Structure, specificity and catalysis. Cellular and Molecular Life Sciences. 2006 Sep 29;63(23):2755–63.

44. Musselman CA, Lalonde M-E, Côté J, Kutateladze TG. Perceiving the epigenetic landscape through histone readers. Nature Structural & Molecular Biology. 2012 Dec 5;19(12):1218–27.

45. Hublitz P, Albert M, Peters AHFM. Mechanisms of transcriptional repression by histone lysine methylation. The International Journal of Developmental Biology. 2009;53(2-3):335–54.

46. Dhar SS, Lee S-H, Kan P-Y, Voigt P, Ma L, Shi X, et al. Trans-tail regulation of MLL4-catalyzed H3K4 methylation by H4R3 symmetric dimethylation is mediated by a tandem PHD of MLL4. Genes & Development. 2012 Dec 15;26(24):2749–62.

47. Xu J, Andreassi M. Reversible histone methylation regulates brain gene expression and behavior. Hormones and behavior. 2011;59(3):383–92.

48. Mo R. Identification of the MLL2 Complex as a Coactivator for Estrogen Receptor α. Journal of Biological Chemistry. 2006 Apr 11;281(23):15714–20.

49. Issaeva I, Zonis Y, Rozovskaia T, Orlovsky K, Croce CM, Nakamura T, et al. Knockdown of ALR (MLL2) Reveals ALR Target Genes and Leads to Alterations in Cell Adhesion and Growth. Mol Cell Biol. 2007 Mar 1;27(5):1889–903.

50. Guo C, Chang C-C, Wortham M, Chen LH, Kernagis DN, Qin X, et al. Global identification of MLL2-targeted loci reveals MLL2’s role in diverse signaling pathways. Proceedings of the National Academy of Sciences [Internet]. 2012 Oct 8 [cited 2012 Oct 10]; Available from: http://www.pnas.org/cgi/doi/10.1073/pnas.1208807109

51. Wan X, Liu L, Ding X, Zhou P, Yuan X, Zhou Z, et al. Mll2 Controls Cardiac Lineage Differentiation of Mouse Embryonic Stem Cells by Promoting H3K4me3 Deposition at Cardiac-Specific Genes. Stem Cell Reviews and Reports [Internet]. 2014 Jun 10 [cited 2014 Jun 19]; Available from: http://link.springer.com/10.1007/s12015-014-9527-y

52. Ansari KI, Hussain I, Shrestha B, Kasiri S, Mandal SS. HOXC6 Is Transcriptionally Regulated via Coordination of MLL Histone Methylase and Estrogen Receptor in an Estrogen Environment. Journal of Molecular Biology. 2011 Aug;411(2):334–49.

53. Hong S, Cho Y-W, Yu L-R, Yu H, Veenstra TD, Ge K. Identification of JmjC domain-containing UTX and JMJD3 as histone H3 lysine 27 demethylases. Proc Natl Acad Sci USA. 2007 Nov 20;104(47):18439–44.

54. Walport LJ, Hopkinson RJ, Vollmar M, Madden SK, Gileadi C, Oppermann U, et al. Human UTY(KDM6C) is a Male-Specific Nε-Methyl Lysyl-Demethylase. Journal of Biological Chemistry [Internet]. 2014 May 5 [cited 2014 Jun 24]; Available from: http://www.jbc.org/cgi/doi/10.1074/jbc.M114.555052

55. Ginalski K, Rychlewski L, Baker D, Grishin NV. Protein structure prediction for the male-specific region of the human Y chromosome. Proceedings of the National Academy of Sciences. 2004 Feb 24;101(8):2305–10.

56. Van der Meulen J, Speleman F, Van Vlierberghe P. The H3K27me3 demethylase UTX in normal development and disease. Epigenetics. 2014 Feb 21;9(5).

57. Tie F, Banerjee R, Conrad PA, Scacheri PC, Harte PJ. Histone Demethylase UTX and Chromatin Remodeler BRM Bind Directly to CBP and Modulate Acetylation of Histone H3 Lysine 27. Molecular and Cellular Biology. 2012 Jun 15;32(12):2323–34.

58. Petruk S, Black KL, Kovermann SK, Brock HW, Mazo A. Stepwise histone modifications are mediated by multiple enzymes that rapidly associate with nascent DNA during replication. Nature Communications [Internet]. 2013 Nov 26 [cited 2014 Oct 16];4. Available from: http://www.nature.com/doifinder/10.1038/ncomms3841

59. Lan F, Bayliss PE, Rinn JL, Whetstine JR, Wang JK, Chen S, et al. A histone H3 lysine 27 demethylase regulates animal posterior development. Nature. 2007 Sep 12;449(7163):689–94.

60. Copur O, Muller J. The histone H3-K27 demethylase Utx regulates HOX gene expression in Drosophila in a temporally restricted manner. Development. 2013 Jul 30;140(16):3478–85.

61. Goo Y-H, Sohn YC, Kim D-H, Kim S-W, Kang M-J, Jung D-J, et al. Activating Signal Cointegrator 2 Belongs to a Novel Steady-State Complex That Contains a Subset of Trithorax Group Proteins. Molecular and Cellular Biology. 2003 Jan 1;23(1):140–9.

62. Hughes CM, Rozenblatt-Rosen O, Milne TA, Copeland TD, Levine SS, Lee JC, et al. Menin associates with a trithorax family histone methyltransferase complex and with the Hoxc8 locus. Mol Cell. 2004 Feb 27;13(4):587–97.

63. Lee J-H, Skalnik DG. Wdr82 is a C-terminal domain-binding protein that recruits the Setd1A Histone H3-Lys4 methyltransferase complex to transcription start sites of transcribed human genes. Mol Cell Biol. 2008 Jan;28(2):609–18.

64. Zhang P, Lee H, Brunzelle JS, Couture JF. The plasticity of WDR5 peptide-binding cleft enables the binding of the SET1 family of histone methyltransferases. Nucleic Acids Research [Internet]. 2012 [cited 2012 Jun 26]; Available from: http://nar.oxfordjournals.org/content/early/2012/01/19/nar.gkr1235.short

65. Liu Y, Huang Y, Fan J, Zhu G-Z. PITX2 associates with PTIP-containing histone H3 lysine 4 methyltransferase complex. Biochemical and Biophysical Research Communications [Internet]. 2014 Jan [cited 2014 Feb 10]; Available from: http://linkinghub.elsevier.com/retrieve/pii/S0006291X14001788

66. Daniel JA, Santos MA, Wang Z, Zang C, Schwab KR, Jankovic M, et al. PTIP Promotes Chromatin Changes Critical for Immunoglobulin Class Switch Recombination. Science. 2010 Jul 29;329(5994):917–23.

67. Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004;11(2-3):377–94.

68. Churbanov A, Winters-Hilt S, Koonin EV, Rogozin IB. Accumulation of GC donor splice signals in mammals. Biology Direct. 2008;3(1):30.

69. Micale L, Augello B, Maffeo C, Selicorni A, Zucchetti F, Fusco C, et al. Molecular analysis, pathogenic mechanisms, and readthrough therapy on a large cohort of Kabuki syndrome patients. Human Mutation. 2014 Mar.

70. Schaefer MH, Wanker EE, Andrade-Navarro MA. Evolution and function of CAG/polyglutamine repeats in protein-protein interaction networks. Nucleic Acids Research. 2012 Jan 28;40(10):4273–87.

71. Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. Wasserman WW, editor. PLoS Computational Biology. 2010 Dec 2;6(12):e1001025.

72. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations (PolyPhen-2). Nature Methods. 2010 Apr;7(4):248–9.

73. Grantham R. Amino acid difference formula to help explain protein evolution. Science. 1974 Sep 6;185(4154):862–4.

74. Aboderin AA. An empirical hydrophobicity scale for α-amino-acids and some of its applications. International Journal of Biochemistry. 1971 Oct;2(11):537–44.

75. O’Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012 Apr 4;485(7397):246–50.

76. Taub MA, Corrada Bravo H, Irizarry RA. Overcoming bias and systematic errors in next generation sequencing data. Genome Medicine. 2010;2(12):87.

77. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research. 2008 Aug 1;36(16):e105–e105.

78. Poptsova MS, Il’icheva IA, Nechipurenko DY, Panchenko LA, Khodikov MV, Oparina NY, et al. Non-random DNA fragmentation in next-generation sequencing. Scientific Reports [Internet]. 2014 Mar 31 [cited 2014 Nov 21];4. Available from: http://www.nature.com/doifinder/10.1038/srep04532

79. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010 Sep 1;20(9):1297–303.

80. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754–60.

81. Mills RE, Pittard WS, Mullaney JM, Farooq U, Creasy TH, Mahurkar AA, et al. Natural genetic variation caused by small insertions and deletions in the human genome. Genome Research. 2011 Apr 1;21(6):830–9.

82. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012 Jun;6(2):80–92.

83. Baker J, Riley G, Romero MR, Haynes AR, Hilton H, Simon M, et al. Identification of a Z-band associated protein complex involving KY, FLNC and IGFN1. Exp Cell Res. 2010 Jul 1;316(11):1856–70.

84. López-Álvarez MR, Jones DC, Jiang W, Traherne JA, Trowsdale J. Copy number and nucleotide variation of the LILR family of myelomonocytic cell activating and inhibitory receptors. Immunogenetics. 2014 Feb;66(2):73–83.

85. Priolo M, Micale L, Augello B, Fusco C, Zucchetti F, Prontera P, et al. Absence of deletion and duplication of MLL2 and KDM6A genes in a large cohort of patients with Kabuki syndrome. Molecular genetics and metabolism. 2012 Jul 6;

86. Dentici ML, Di Pede A, Lepri FR, Gnazzo M, Lombardi MH, Auriti C, et al. Kabuki syndrome: clinical and molecular diagnosis in the first year of life. Archives of Disease in Childhood [Internet]. 2014 Oct 3 [cited 2014 Nov 14]; Available from: http://adc.bmj.com/cgi/doi/10.1136/archdischild-2013-305858

87. Greenfield A, Carrel L, Pennisi D, Philippe C, Quaderi N, Siggers P, et al. The UTX gene escapes X inactivation in mice and humans. Hum Mol Genet. 1998 Apr;7(4):737–42.

88. Welstead GG, Creyghton MP, Bilodeau S, Cheng AW, Markoulaki S, Young RA, et al. X-linked H3K27me3 demethylase Utx is required for embryonic development in a sex-specific manner. Proceedings of the National Academy of Sciences of the United States of America [Internet]. 2012 Jul 23 [cited 2012 Jul 25]; Available from: http://www.ncbi.nlm.nih.gov/pubmed/22826230

89. Smith E, Lin C, Shilatifard A. The super elongation complex (SEC) and MLL in development and disease. Genes & Development. 2011 Apr 1;25(7):661–72.

90. Mahajan MA, Samuels HH. Nuclear Coregulator: Role in Hormone Action, Metabolism, Growth, and Development. Endocrine Reviews. 2005 Jun;26(4):583–97.

91. Treangen TJ, Salzberg SL. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics [Internet]. 2011 Nov 29 [cited 2014 Nov 15]; Available from: http://www.nature.com/doifinder/10.1038/nrg3117

92. Guo C, Chen LH, Huang Y, Chang C-C, Wang P, Pirozzi CJ, et al. KMT2D maintains neoplastic cell proliferation and global histone H3 lysine 4 monomethylation. Oncotarget. 2013 Nov 3;

93. Kim J-H, Sharma A, Dhar SS, Lee S-H, Gu B, Chan C-H, et al. UTX and MLL4 Coordinately Regulate Transcriptional Programs for Cell Proliferation and Invasiveness in Breast Cancer Cells. Cancer Research [Internet]. 2014 Feb 3 [cited 2014 Feb 10]; Available from: http://cancerres.aacrjournals.org/cgi/doi/10.1158/0008-5472.CAN-13-1896

94. Van Haaften G, Dalgliesh GL, Davies H, Chen L, Bignell G, Greenman C, et al. Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer. Nature Genetics. 2009 May;41(5):521–3.

95. Mansour AA, Gafni O, Weinberger L, Zviran A, Ayyash M, Rais Y, et al. The H3K27 demethylase Utx regulates somatic and germ cell epigenetic reprogramming. Nature [Internet]. 2012 Jul 8 [cited 2012 Sep 7]; Available from: http://www.nature.com/doifinder/10.1038/nature11272

96. Lee S, Lee JW, Lee S-K. UTX, a Histone H3-Lysine 27 Demethylase, Acts as a Critical Switch to Activate the Cardiac Developmental Program. Developmental Cell. 2012 Jan;22(1):25–37.

97. Patten SA, Jacobs-McDaniels NL, Zaouter C, Drapeau P, Albertson RC, Moldovan F. Role of Chd7 in Zebrafish: A Model for CHARGE Syndrome. Winkler C, editor. PLoS ONE. 2012 Feb 20;7(2):e31650.

98. Doyle AJ, Doyle JJ, Bessling SL, Maragh S, Lindsay ME, Schepers D, et al. Mutations in the TGF-β repressor SKI cause Shprintzen-Goldberg syndrome with aortic aneurysm. Nature Genetics. 2012 Sep 30;44(11):1249–54.

99. Beunders G, Voorhoeve E, Golzio C, Pardo LM, Rosenfeld JA, Talkowski ME, et al. Exonic deletions in AUTS2 cause a syndromic form of intellectual disability and suggest a critical role for the C terminus. Am J Hum Genet. 2013 Feb 7;92(2):210–20.

100. Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013 Apr 17;496(7446):498–503.

101. Meyer A, Van de Peer Y. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 2005 Sep;27(9):937–45.

102. Summerton J. Morpholino, siRNA, and S-DNA Compared: Impact of Structure and Mechanism of Action on Off-Target Effects and Sequence Specificity. Current Topics in Medicinal Chemistry. 2007 Apr 1;7(7):651–60.

103. Summerton J. Morpholino antisense oligomers: the case for an RNase H-independent structural type. Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression. 1999 Dec;1489(1):141–58.

104. Draper BW, Morcos PA, Kimmel CB. Inhibition of zebrafish fgf8 pre-mRNA splicing with morpholino oligos: A quantifiable method for gene knockdown. genesis. 2001 Jul;30(3):154–6.

105. Morcos PA. Achieving targeted and quantifiable alteration of mRNA splicing with Morpholino oligos. Biochemical and Biophysical Research Communications. 2007 Jun;358(2):521–7.

106. Robu ME, Larson JD, Nasevicius A, Beiraghi S, Brenner C, Farber SA, et al. p53 Activation by Knockdown Technologies. PLoS Genetics. 2007;3(5):e78.

107. Pattanayak V, Ramirez CL, Joung JK, Liu DR. Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nature Methods. 2011 Aug 7;8(9):765–70.

108. Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, et al. RNA-Guided Human Genome Engineering via Cas9. Science. 2013 Jan 3;339(6121):823–6.

109. Fu Y, Foden JA, Khayter C, Maeder ML, Reyon D, Joung JK, et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nature Biotechnology [Internet]. 2013 Jun 23 [cited 2013 Jul 26]; Available from: http://www.nature.com/doifinder/10.1038/nbt.2623

110. Kam RKT, Deng Y, Chen Y, Zhao H. Retinoic acid synthesis and functions in early embryonic development. Cell & Bioscience. 2012;2(1):11.

111. Reijntjes S, Rodaway A, Maden M. The retinoic acid metabolising gene, CYP26B1, patterns the cartilaginous cranial neural crest in zebrafish. The International Journal of Developmental Biology. 2007;51(5):351–60.

112. Jiang X, Choudhary B, Merki E, Chien KR, Maxson RE, Sucov HM. Normal fate and altered function of the cardiac neural crest cell lineage in mutant embryos. Mech Dev. 2002 Sep;117(1-2):115–22.

113. Kudoh T, Wilson SW, Dawid IB. Distinct roles for Fgf, Wnt and retinoic acid in posteriorizing the neural ectoderm. Development. 2002 Sep;129(18):4335–46.

114. Westerfield M. The zebrafish book. A guide for the laboratory use of zebrafish (Danio rerio). 4th ed. Eugene, Oregon: Univ. of Oregon Press; 2000.

115. Berghmans S. tp53 mutant zebrafish develop malignant peripheral nerve sheath tumors. Proceedings of the National Academy of Sciences. 2005 Jan 11;102(2):407–12.

116. Iklé JM, Artinger KB, Clouthier DE. Identification and characterization of the zebrafish pharyngeal arch-specific enhancer for the basic helix-loop-helix transcription factor Hand2. Developmental Biology [Internet]. 2012 May [cited 2012 Jun 26]; Available from: http://linkinghub.elsevier.com/retrieve/pii/S0012160612002497

117. Bill BR, Petzold AM, Clark KJ, Schimmenti LA, Ekker SC. A Primer for Morpholino Use in Zebrafish. Zebrafish. 2009 Mar;6(1):69–77.

118. Walker M, Kimmel C. A two-color acid-free cartilage and bone stain for zebrafish larvae. Biotechnic & Histochemistry. 2007 Jan;82(1):23–8.

119. Chernyavskaya Y, Ebert AM, Milligan E, Garrity DM. Voltage-gated calcium channel CACNB2 (β2.1) protein is required in the heart for control of cell proliferation and heart tube integrity. Developmental Dynamics. 2012 Apr;241(4):648–62.

120. Sander JD, Zaback P, Joung JK, Voytas DF, Dobbs D. Zinc Finger Targeter (ZiFiT): an engineered zinc finger/target site design tool. Nucleic Acids Research. 2007 May 8;35(Web Server):W599–605.

121. Stewart RA, Arduini BL, Berghmans S, George RE, Kanki JP, Henion PD, et al. Zebrafish foxd3 is selectively required for neural crest specification, migration and survival. Developmental Biology. 2006 Apr;292(1):174–88.

122. Gaete M, Muñoz R, Sánchez N, Tampe R, Moreno M, Contreras EG, et al. Spinal cord regeneration in Xenopus tadpoles proceeds through activation of Sox2-positive cells. Neural Dev. 2012;7:13.

123. Park H-C, Kim C-H, Bae Y-K, Yeo S-Y, Kim S-H, Hong S-K, et al. Analysis of Upstream Elements in the HuC Promoter Leads to the Establishment of Transgenic Zebrafish with Fluorescent Neurons. Developmental Biology. 2000 Nov;227(2):279–93.

124. Ellis P, Fagan BM, Magness ST, Hutton S, Taranova O, Hayashi S, et al. SOX2, a Persistent Marker for Multipotential Neural Stem Cells Derived from Embryonic Stem Cells, the Embryo or the Adult. Developmental Neuroscience. 2004;26(2-4):148–65.

125. Besson WT, Kirby ML, Van Mierop LH, Teabeaut JR. Effects of the size of lesions of the cardiac neural crest at various embryonic ages on incidence and type of cardiac defects. Circulation. 1986 Feb 1;73(2):360–4.

126. LaBonne C, Bronner-Fraser M. Molecular mechanisms of neural crest formation. Annu Rev Cell Dev Biol. 1999;15:81–112.

127. Le Douarin NM, Brito JM, Creuzet S. Role of the neural crest in face and brain development. Brain Res Rev. 2007 Oct;55(2):237–47.

128. Kuo BR, Erickson CA. Regional differences in neural crest morphogenesis. Cell Adhesion & Migration. 2010 Oct;4(4):567–85.

129. Ding H-L, Clouthier DE, Artinger KB. Redundant roles of PRDM family members in zebrafish craniofacial development. Developmental Dynamics. 2013 Jan;242(1):67–79.

130. Barkovich AJ, Guerrini R, Kuzniecky RI, Jackson GD, Dobyns WB. A developmental and genetic classification for malformations of cortical development: update 2012. Brain. 2012 May 1;135(5):1348–69.

131. Shahhoseini M, Taghizadeh Z, Hatami M, Baharvand H. Retinoic acid dependent histone 3 demethylation of the clustered HOX genes during neural differentiation of human embryonic stem cells. Biochemistry and Cell Biology. 2013 Apr;91(2):116–22.

132. McCaffery PJ, Adams J, Maden M, Rosa-Molinar E. Too much of a good thing: retinoic acid as an endogenous regulator of neural differentiation and exogenous teratogen. European Journal of Neuroscience. 2003 Aug;18(3):457–72.

133. Begemann G, Meyer A. Hindbrain patterning revisited: timing and effects of retinoic acid signalling. BioEssays. 2001 Nov;23(11):981–6.

134. Blumberg B. An essential role for retinoid signaling in anteroposterior neural specification and neuronal differentiation. Seminars in Cell & Developmental Biology. 1997 Aug;8(4):417–28.

135. Xu J, Deng X, Watkins R, Disteche CM. Sex-Specific Differences in Expression of Histone Demethylases Utx and Uty in Mouse Brain and Neurons. Journal of Neuroscience. 2008 Apr 23;28(17):4521–7.

136. Shpargel KB, Sengoku T, Yokoyama S, Magnuson T. UTX and UTY Demonstrate Histone Demethylase-Independent Function in Mouse Embryonic Development. Wysocka J, editor. PLoS Genetics. 2012 Sep 27;8(9):e1002964.

137. Jones WD, Dafou D, McEntagart M, Woollard WJ, Elmslie FV, Holder-Espinasse M, et al. De Novo Mutations in MLL Cause Wiedemann-Steiner Syndrome. The American Journal of Human Genetics. 2012 Aug;91(2):358–64.

138. Bolande R. The neurocristopathies: A unifying concept of disease arising in neural crest maldevelopment. Human Pathology. 1974 Jul;5(4):409–29.

139. Sanlaville D, Verloes A. CHARGE syndrome: an update. Eur J Hum Genet. 2007 Apr;15(4):389–99.

140. Birkholz DA, Killian ECO, George KM, Artinger KB. Prdm1a is necessary for posterior pharyngeal arch development in zebrafish. Developmental Dynamics. 2009 Oct;238(10):2575–87.

141. Sperber SM, Dawid IB. barx1 is necessary for ectomesenchyme proliferation and osteochondroprogenitor condensation in the zebrafish pharyngeal arches. Developmental Biology. 2008 Sep;321(1):101–10.

142. Van den Hoff M. Cardiac neural crest: the holy grail of cardiac abnormalities? Cardiovascular Research. 2000 Aug;47(2):212–6.

143. Sauer B, Henderson N. Site-specific DNA recombination in mammalian cells by the Cre recombinase of bacteriophage P1. Proc Natl Acad Sci USA. 1988 Jul;85(14):5166–70.

144. Akamatsu W, Okano HJ, Osumi N, Inoue T, Nakamura S, Sakakibara S, et al. Mammalian ELAV-like neuronal RNA-binding proteins HuB and HuC promote neuronal development in both the central and the peripheral nervous systems. Proc Natl Acad Sci USA. 1999 Aug 17;96(17):9885–90.

145. Ori-McKenney KM, Vallee RB. Neuronal migration defects in the Loa dynein mutant mouse. Neural Development. 2011;6(1):26.

146. Hevner RF. Layer-specific markers as probes for neuron type identity in human neocortex and malformations of cortical development. J Neuropathol Exp Neurol. 2007 Feb;66(2):101–9.

147. Sanford SD, Gatlin JC, Hökfelt T, Pfenninger KH. Growth cone responses to growth and chemotropic factors. European Journal of Neuroscience. 2008 Jul;28(2):268–78.

148. Jan Y-N, Jan LY. Branching out: mechanisms of dendritic arborization. Nature Reviews Neuroscience. 2010 May;11(5):316–28.

APPENDIX A

CLINICAL MANIFESTATIONS OF KABUKI SYNDROME FROM 14 STUDIES

Clinical findings                 Sum of patients   %

Gender
  Male                            135 (251)         54

Craniofacial abnormality
  Characteristic face             115 (115)         100
  Microcephaly                    47 (179)          26
  Long palpebral fissure          135 (136)         99
  Epicanthus                      63 (138)          46
  Lower palpebral eversion        132 (143)         92
  Ptosis                          26 (52)           50
  Strabismus                      54 (152)          36
  Blue sclerae                    38 (124)          31
  Short nasal septum              72 (78)           92
  Arched eyebrow                  165 (193)         85
  Prominent ear                   145 (172)         84
  Preauricular dimple/fistula     40 (180)          22
  Depressed nasal tip             106 (128)         83
  Malformed ear                   87 (100)          87
  Abnormal dentition              116 (171)         68
  High-arched palate              64 (89)           72
  Micrognathia                    37 (93)           40
  Cleft palate/lip and            68 (196)          35
  Lower lip pit                   4 (15)            27
  Low posterior hair line         38 (67)           57

Bone abnormality
  Skeletal abnormality            142 (162)         88
  Short finger (V)                135 (170)         79
  Clinodactyly (V)                56 (112)          50
  Short middle phalanx (V)        60 (76)           80
  Short metacarpal                18 (51)           35
  Cone-shaped epiphysis           6 (47)            13
  Coarse carpal bone              8 (48)            17
  Deformed vertebra/rib           32 (101)          32
  Scoliosis                       58 (168)          35
  Sagittal cleft of vertebral     20 (55)           36
  Rib anomaly                     10 (55)           18
  Spina bifida occulta            11 (59)           19
  Pilonidal sinus                 5 (6)             83
  Hip dislocation                 32 (178)          18
  Foot deformity                  13 (55)           24

Adapted with permission from (30).

A CONT.

Clinical findings                 Sum of patients   %

Dermatoglyphic finding
  Abnormal dermatoglyphic         76 (79)           96
  Presence of fingertip pad       170 (190)         89

Neurological abnormality
  Mental retardation              157 (188)         84
  Hypotonia                       32 (47)           68
  Neonatal hypotonicity           23 (81)           28
  Seizure                         33 (194)          17
  Brain atrophy                   2 (51)            4
  Retinal pigmentation            1 (29)            3

Stature
  Short stature (<−2.0 SD)        75 (136)          55

Visceral abnormality
  Hyperpigmented nevus            12 (55)           22
  Generalized hirsutism           7 (61)            11
  Cardiovascular anomaly          103 (247)         42
  Umbilical hernia                6 (67)            9
  Kidney/urinary tract            41 (145)          28
  Undescended testis              18 (75)           24
  Small penis                     3 (29)            10
  Malrotation of colon            2 (33)            6
  Anal atresia/rectovaginal       4 (74)            5
  Inguinal hernia                 5 (68)            7

Other
  Joint laxity                    58 (78)           74
  Recurrent otitis media          73 (116)          63
  Hearing loss                    48 (180)          27
  Early breast development        13 (46)           28
  Neonatal                        14 (71)           20
  Obesity                         11 (58)           19
  Anemia                          5 (56)            9
  Polycythemia                    2 (56)            4
  Neonatal hypoglycemia           4 (58)            7
  Autoimmune hemolytic            1 (58)            2
  GH deficiency                   1 (58)            2
  TBG deficiency                  1 (58)            2
  Cystic fibrosis                 1 (58)            2
  Primary ovarian cyst            1 (58)

APPENDIX B

KDM6A AMPLIFICATION AND SEQUENCING PRIMERS

Exon   Forward Primer                  Reverse Primer                  Size (bp)
1      AACCAGCACAACCTAACAGGAAGCTCCCTC  GAAAAGAGCGATTTCGCAAGGGAGCAAGCA  1276
3      GGGGAGGAAATGTATTTTAGGG          AAGCACAAATAATTACAATCCAACAC      237
4      GTGGGTGTGGTGGGAATC              GGCAATAATCTGCCCAAAAC            200
5      CCTAAAAATCTTTTTCCCTTCC          TTTACTGAAAATGACTGAATTACTGG      130
6      TTCATGCACGTGTTAAAAAGTTAAG       TTCCAACATTTCTCAATGCTC           266
7      GATGCTTTTGTGTGACTCTAAATTG       ATTTGCCATGAAGTGATCGG            170
8      ATTTGCCCCAAATTCTGCTG            TGATTTCAGTAAAGCATGATAGGAG       171
9      CTCAGTGTTTCTAGGCAGACAGATA       TCTGGCACAAACGAAGTATTCAATC       743
10     TTGGTTTGTTTTCTGCTTCG            TTGTTTAACTTTCAGGAATCTTGG        362
11-12  TTTCAGCCGATTAATTTGTTTC          TCTGCCTGAGTGTTCTGGG             594
13     CTTTCACTCTTGCTTGGAGTTTCTT       AAATGCACAAAGGTTCTAAGTTGGT       594
14     TCATTTGGCCTCCTCTAACC            AACTTTAACACAGGACGCGG            357
15     TCATTTCAGGGAAAGGTTGC            CAAAATGCTAACAAATTTCAGGG         229
16     TTCCCTATAGATTACACAACCAGC        GCCAACAAGGAAGCTAGTCC            477
17A    TTTGATAACTTTAGGACTTGGGTC        GCTGAAGATGGTGAAGAGGC            683
17B    CAGATGCTGTTTGCAGTCC             TGATCATGCCAGTAAGTCAAAC          490
18     GGATCCACATCCCACATCTC            CGCCCCCAAACTATTTAATC            399
19     GCTCTGTTTTCCTGAGATCTAACC        TCTGCCAGTGCTGGAAAAG             224
20     GGATACAGTGCCGTAAAATGC           GTGTCCTTTCAAAACTCCAAAG          389
21-22  GAACACTAAACTAGACTGCTTTTTGC      AACTCTTTGCATCAGTTCACAG          295
23     TGGTAAACTTCCACAGGTATTTG         TGGCTGTCTTTGCATGTTTC            284
24     GCTAACCAATTGCACCACTG            TCCAGCTTGGTAAGTTGTCG            243
25     TCTTAATGTAGTTGATCCATTTGC        CGAATTACAATTCTATGCAAGGAG        366
26     TCTTTTGGTACTTTGGGTTGC           CTCCCCAAAAAGAGAGGGAC            234
27     TGCTGGTCACAAATAATTTCTCC         GAAACCAACAGTGGAGAGGG            223
28     GATCACTGTCCACAATTTCATTC         AGAGCAAACACTGCTGCTTC            339
29     GCAGACTATATGTTTGTAGCCATGAG      GCAGAACTGGGTTATTTCCTG           166

APPENDIX C

KMT2D AMPLIFICATION AND SEQUENCING PRIMERS

Exons  Primer Name    Forward Primer         Reverse Primer          Size (bp)
1-2    Ex1_2_PCR_1    GATGCCTTCTTCCCAGGATT   TTCCCCAACACTCATTTTCC    626
3-5    Ex3_5_PCR_1    GTTTGAGGGCACATGAGGAT   CCTGGTGCTCACAAAGTTCA    1063
3-5    Ex3_5_Seq_1    CTGGTGGGCTTCTGAGAGTC   CCTCAGTGTCAGCCAGCTCT
6-9    Ex6_9_PCR_1    GCAATGTGCTGAGGCTTACA   ACAGAAAGTGTGGGGTCTGG    1231
6-9    Ex6_9_Seq_1    CCCTGATTCTGCCCTATTGT   GCATTGGTCAGACAGCAAAG
10     Ex10_PCR_1     CCCTGAAATTCATCCCCTTT   TGTGCCATGAAGAGTTACAGC   1715
10     Ex10_Seq_1     AAGAGTCACCCCCATCTCCT   AAATGGTGGGAACAGACGAG
10     Ex10_Seq_2     CCTGAGGACTCACCTGCTTC   GGACAGATGTGGTCCCTCAG
11     Ex11_PCR_2     GCTGTAACTCTTCATGGCACA  AGCTCTAGCCCAAACCCATT    1463
11     Ex11_Seq_1     CAGCCTTGGAACCCAGTG     GCACAGGGGAGCCTTTAAGT
12-14  Ex12_14_PCR_1  AGTGGGACTCCTGGGCTTAT   CCACCGTTGAGTTCCAAAGT    1552
12-14  Ex12_14_Seq_1  TGACTCTGGTCGCAAATCAG   TCCAGTTTTCCCATCTATCCTC
15-18  Ex15_18_PCR_1  CTGGGGAACAAGAGCAAAAC   AAGCTAGGGGGTTGGAGCTA    1049
15-18  Ex15_18_Seq_1  TGACAGAGGCTGGGTTTAGG   CAGAGCTTTAGCACCCAACC
19-21  Ex19_21_PCR_1  GGTTGAAACTTGCAGTTCTGG  GTCAGACTCGGGTTGAGAGC    1019
19-21  Ex19_21_Seq_1  AGTGGCTCTGAGGCAAGGTA   TGTCATCCTGCCACTGAGAG
22-25  Ex22_25_PCR_1  CTCATTGAAAGGGCCAAGAG   AGGACTCCCCACCAGAGAAG    1161
22-25  Ex22_25_Seq_1  TGGGAGTGAGTGGTGTGAGA   ATCTGATGCCCAGAACAGGT
26-27  Ex26_27_PCR_1  CTTCTCTGGTGGGGAGTCCT   CCCAAAAGAGGAGGGTCACT    568
28-30  Ex28_30_PCR_1  TCCCCATTCCCTTGTTAGTG   AGACCAGGCATAGGGCAGT     910
28-30  Ex28_30_Seq_1  ATGGATTAGCGTGGGAACTG   CACTCCCTACCCAGAAGCAG
31     Ex31_PCR_3     CCCTAAGGCTGTGTCCCATA   GCAGCTGTTTCCTTCTCCTG    2193
31     Ex31_Seq_1     GCAGGACCCCTTTGGACT     CAGGTGGGGTAGTGTGGAAT
31     Ex31_Seq_2     CTCGGGCATCTCAGGTAGAG   GAGCACAGCAGCTCTCAGG
31     Ex31_Seq_3     TTCACTTTCCCTCAGGCAGT   GTTTGTGCTTTGAGGCTTGC
32-33  Ex32_33_PCR_1  CCCCCTATATCGCTCCTGTC   GCAGTGAGGGAGAAAAGGAA    620
34     Ex34_PCR_2     TCCTTCCTCACTGCCCTAAG   TCTAGCCTCAGTGCCCATTT    2019
34     Ex34_Seq_1     TCAGAGACCCCGTTTTTCAC   GTGGGGTGTTGGATGAAGAC
34     Ex34_Seq_2     GAGACCAATGACCCCCACTT   CAAGGGTCCTGGCTCCAC
34     Ex34_Seq_3     TGCTCATTGAGGACCTGTTG   CTCATGTGGCAAAGACATGG
35-38  Ex35_38_PCR_1  GCACGGTGCAAGTAAAAACA   AGGGTCGGAGAGGTCAGG      1167
35-38  Ex35_38_Seq_1  GTGGTCAGGTGGGAGTAGGA   TGCAATGAGAGAGGCTGCTA
39     Ex39_PCR_1     ACTTCAGCCTAGCACCCAGA   TTGGACAAGCAGGAGTTGTG    1585
39     Ex39_Seq_2     CTTCTTCCCTGGCAACCTT    GGATTGCCACCTGTCCTAGA
39     Ex39_Seq_3     GGACACAGGCTGGTCACAG    GCTGCTGAAGCTGCTGTAAA
39     Ex39_PCR_3     TAGACCCAGCCGTTTCTTCA   ACCCAGGCTCACTCATTCTG    1521
39     Ex39_Seq_4     TTAAGTCCTCAGCAGCAGCA   TGTCTGTGGTCCAGGGAAG
39     Ex39_Seq_5     CAGAAACCCAGAAGCCAGAG   GCCTCCCTCTTCACTGACTG
40-42  Ex40_42_PCR_1  AGCCTGGGTCAGACAGAAGA   ACCTCAGGTGCCCTGTTATG    960
40-42  Ex40_42_Seq_1  GAAGTTCTTGGGAAGGTGAGG  GCAAGATGGCATAGGGAGAC
43-45  Ex43_45_PCR_1  CAAACTGGTAGGTGGGAGGA   TCTAGCCCAGGCTTTCACAT    919
43-45  Ex43_45_Seq_1  GAAATGGGGATGAGGAACAA   CAGACTCCCCTCCCAAATCT
46-47  Ex46_47_PCR_1  CCCACCCAGCTGGTAGTAGA   CTCCCAAAGCACTGGGATTA    510
48     Ex48_PCR_2     GAGGCTGTCTAGGGCAAAGA   GGGAAGGAGGATCATTCACA    1375
48     Ex48_Seq_2     TTCTGTCATGAGGAGGGTGA   CAGGTCCAGGTTCAGCAGAC
48     Exon48_Seq_1   TGCCCCAATGTCTACCATTT   ACCTCGTCCCGCTCAATGTA
49-50  Ex49_50_PCR_1  GCAGTTCTGGATTGGGGTTA   GACCAGAGGATCCCTGTCAA    512
51-54  Ex51_54_PCR_1  CAGAGGAGGTGGGTGGTATG   CTGGCTGCTACCTCTCTTCC    1253
51-54  Ex51_54_Seq_1  CTCCTACCTGATCCCACAGC   GTCAGGGATGTCAGGCAACT

APPENDIX D

ANALYSIS PIPELINE WITH COMMANDS

MAPPING

Index fasta reference file
    bwa index -a bwtsw ./grch37/human_g1k_v37.fasta

If Fastq is in Illumina 1.5+ format, convert to Sanger Fastq
    perl ./Perl/perl_scripts/illumina_to_Fasta_downloaded.pl sample_raw.fastq sample.fastq
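The conversion script itself is not reproduced in this appendix. As a minimal sketch of what such a conversion involves, assuming the input uses Phred+64 quality encoding (Illumina 1.5+) and strict four-line FASTQ records, the hypothetical shell function below shifts every quality character down by 31 so that the output is Phred+33 (Sanger). The name `illumina_to_sanger` is illustrative only and is not the Perl script called above.

```shell
# Sketch: re-encode Phred+64 qualities as Phred+33 (Sanger).
# Assumes strict four-line FASTQ records on stdin; writes to stdout.
illumina_to_sanger() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
    if [ $((n % 4)) -eq 0 ]; then
      # Quality line: map ASCII 64..126 ('@'..'~') onto 33..95 ('!'..'_').
      printf '%s\n' "$line" | LC_ALL=C tr '@-~' '!-_'
    else
      # Header, sequence and separator lines pass through unchanged.
      printf '%s\n' "$line"
    fi
  done
}
```

A call such as `illumina_to_sanger < sample_raw.fastq > sample.fastq` would mirror the step above. The line-by-line loop is slow on full exome data and is meant only to make the encoding shift explicit.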

Align reads to reference
    bwa aln -t 2 -q 15 ./grch37/human_g1k_v37.fasta sample.fastq > sample.sai

Convert temp sai file to SAM

If single end:
    bwa samse ./grch37/human_g1k_v37.fasta sample.sai sample.fastq > sample_aligned.sam

If paired-end:
    bwa sampe -P ./grch37/human_g1k_v37.fasta sample_pe1.sai sample_pe2.sai sample_pe1.fastq sample_pe2.fastq > sample_aligned.sam
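The single-end and paired-end branches can be folded into one helper. The sketch below is illustrative rather than part of the original pipeline: the function name `sai_to_sam_cmd` is hypothetical, it assumes each `.sai` file sits next to its FASTQ file with the same base name, and it only prints the `bwa` command so that it can be inspected before anything is run.

```shell
# Sketch: print the bwa command that converts .sai alignments to SAM.
# One FASTQ argument selects samse; two FASTQ arguments select sampe.
REF=./grch37/human_g1k_v37.fasta

sai_to_sam_cmd() {
  case $# in
    1) echo "bwa samse $REF ${1%.fastq}.sai $1" ;;
    2) echo "bwa sampe -P $REF ${1%.fastq}.sai ${2%.fastq}.sai $1 $2" ;;
    *) echo "usage: sai_to_sam_cmd reads.fastq [mates.fastq]" >&2; return 1 ;;
  esac
}
```

For example, `sai_to_sam_cmd sample_pe1.fastq sample_pe2.fastq | sh > sample_aligned.sam` would run the paired-end form shown above.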

Convert SAM to BAM
    java -Xmx4g -jar ./Picard/picard-tools-1.64/SamFormatConverter.jar INPUT=sample_aligned.sam OUTPUT=sample_aligned.bam VALIDATION_STRINGENCY=LENIENT

Fix headers with Picard (if necessary)
java -Xmx4g -jar ./Picard/picard-tools-1.64/AddOrReplaceReadGroups.jar \
    INPUT=sample_aligned.bam OUTPUT=sample_reheadered.bam \
    VALIDATION_STRINGENCY=LENIENT \
    RGID=1 RGLB=1 RGPL=Illumina RGPU=Barcode-clipped RGSM=Sample_Code RGCN=BGI

Read group fields: RGID = read group ID; RGLB = library ID; RGPL = platform; RGPU = barcode (if necessary); RGSM = sample name; RGCN = sequencing center.

Fix Mate Information (for paired-end reads)
java -Xmx4g -jar ./Picard/picard-tools-1.64/FixMateInformation.jar \
    INPUT=sample_reheadered.bam OUTPUT=sample_mate_fixed.bam \
    SORT_ORDER=coordinate MAX_RECORDS_IN_RAM=2500000 \
    VALIDATION_STRINGENCY=LENIENT

Mark Duplicates
java -Xmx4g -jar ./Picard/picard-tools-1.64/MarkDuplicates.jar \
    INPUT=sample_mate_fixed.bam OUTPUT=sample_deduped.bam \
    METRICS_FILE=sample_deduped.metrics REMOVE_DUPLICATES=TRUE \
    MAX_RECORDS_IN_RAM=2500000 VALIDATION_STRINGENCY=LENIENT

Index BAM file
java -Xmx4g -jar ./Picard/picard-tools-1.64/BuildBamIndex.jar \
    INPUT=sample_deduped.bam MAX_RECORDS_IN_RAM=2500000 \
    VALIDATION_STRINGENCY=LENIENT

Get Depth of Coverage Statistics
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -T DepthOfCoverage -R ./grch37/human_g1k_v37.fasta \
    -I sample_recal.bam -o coverage.txt \
    -L ./grch37/SeqCap_EZ_Exome_v2.intervals

Get info about alignment
java -Xmx4g -jar ./Picard/picard-tools-1.64/CollectAlignmentSummaryMetrics.jar \
    INPUT=sample_deduped.bam OUTPUT=sample_deduped.metrics \
    REFERENCE_SEQUENCE=./grch37/human_g1k_v37.fasta \
    MAX_RECORDS_IN_RAM=2500000 VALIDATION_STRINGENCY=LENIENT

LOCAL REALIGNMENT

Determining intervals in need of realignment
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator -R ./grch37/human_g1k_v37.fasta \
    -o sample_deduped.intervals -I 4_deduped.bam \
    --known ./grch37/Mills_and_1000G_gold_standard.indels.b37.vcf \
    -known ./grch37/ -nt 2

Local realignment around indels (create the temp directory before running)
mkdir /tmp/gatk.tmp
java -Xmx4g -Djava.io.tmpdir=/tmp/gatk.tmp -jar ./GATK/GenomeAnalysisTK/GenomeAnalysisTK.jar \
    -I sample_deduped.bam -R ./hg19_files/ucsc.hg19.fasta \
    -T IndelRealigner -targetIntervals sample.intervals \
    -o sample_realn.bam \
    -known ./hg19_files/1000G_biallelic.indels.hg19.vcf \
    --sortInCoordinateOrderEvenThoughItIsHighlyUnsafe

(Include --sortInCoordinateOrderEvenThoughItIsHighlyUnsafe only if it is not a paired-end run.)

BASE QUALITY RECALIBRATION

Count Covariates before recalibration
java -Xmx16g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -R ./hg19_files/ucsc.hg19.fasta \
    -knownSites ./hg19_files/dbsnp_132.hg19.vcf \
    -I sample_realn.bam -T CountCovariates --standard_covs \
    -recalFile sample_recal_data1.csv -nt 8

Analyze Covariates before recalibration
mkdir before
java -Xmx16g -jar ./GATK/GenomeAnalysisTK/AnalyzeCovariates.jar \
    -outputDir ./before/ -recalFile sample_recal_data1.csv -ignoreQ 5

Table Recalibration
java -Xmx16g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -R ./hg19_files/ucsc.hg19.fasta -I sample_realn.bam \
    -T TableRecalibration -o sample_recal.bam \
    -recalFile sample_recal_data1.csv

Count Covariates after recalibration
java -Xmx4g -jar ./GATK/GenomeAnalysisTK/GenomeAnalysisTK.jar \
    -R ./grch37/human_g1k_v37.fasta \
    -knownSites ./grch37/dbsnp_135.b37.vcf \
    -I sample_recal.bam -T CountCovariates --standard_covs \
    -recalFile sample_recal_data1.csv -nt 2

Analyze Covariates after recalibration
mkdir after
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/AnalyzeCovariates.jar \
    -outputDir ./after/ -recalFile sample_recal_data1.csv -ignoreQ 5

Using UnifiedGenotyper to call SNPs
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -R ./grch37/human_g1k_v37.fasta -T UnifiedGenotyper \
    -I sample1_recal.bam -I sample2_recal.bam -I sample3_recal.bam \
    -I sample4_SNP_recal.bam -I sample5_SNP_recal.bam \
    --dbsnp ./grch37/dbsnp_135.b37.vcf -o combined_SNP_calls.vcf \
    -stand_call_conf 20.0 -stand_emit_conf 10.0 -dcov 500 \
    --intervals ./grch37/SeqCap_EZ_Exome_v2.intervals \
    -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest -A FisherStrand -A RMSMappingQuality \
    -glm SNP

Run snpEff on SNP calls
java -Xmx4g -jar ./snpEff/snpEff_v2_0_5d/snpEff.jar eff \
    -c ./snpEff/snpEff_v2_0_5d/snpEff.config -v GRCh37.64 \
    -s snpEFF_summary_SNPs.html -onlyCoding true \
    -i combined_SNP_calls.vcf -o vcf > combined_SNP_effects.vcf

Annotate SNP metrics with VariantAnnotator
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -R ./grch37/human_g1k_v37.fasta -T VariantAnnotator \
    -I sample1_recal.bam -I sample2_recal.bam -I sample3_recal.bam \
    -I sample4_SNP_recal.bam -I sample5_SNP_recal.bam \
    -o combined_SNP_annotated.vcf --variant combined_SNP_calls.vcf \
    --dbsnp ./grch37/dbsnp_135.b37.vcf \
    --snpEffFile sample_SNP_effects.vcf

Variant quality score recalibrator: GenerateVariantClusters for SNPs
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -T VariantRecalibrator -R ./grch37/human_g1k_v37.fasta \
    -input combined_SNP_annotated.vcf \
    -resource:hapmap,VCF,known=false,training=true,truth=true,prior=15.0 ./grch37/hapmap_3.3.b37.sites.vcf \
    -resource:omni,VCF,known=false,training=true,truth=false,prior=12.0 ./grch37/1000G_omni2.5.b37.sites.vcf \
    -resource:dbsnp,VCF,known=true,training=false,truth=false,prior=8.0 ./grch37/dbsnp_135.b37.vcf \
    -recalFile ./combined_SNP.recal -tranchesFile ./combined_SNP.tranches \
    -rscriptFile ./combined_SNP.plots.R \
    -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum \
    -an FS -an MQ -an DP \
    -mode SNP --maxGaussians 4 -percentBad 0.05

Apply Recalibration to filter out calls for SNPs
java -Xmx12g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -T ApplyRecalibration -R ./hg19_files/ucsc.hg19.fasta \
    -input combined_SNP_annotated.vcf --ts_filter_level 99.0 \
    -tranchesFile ./combined_SNP.tranches -recalFile ./combined_SNP.recal \
    -o ./combined_filtered_SNP_calls.vcf --mode SNP
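ApplyRecalibration records its decision in the FILTER column of the output VCF: calls that meet the chosen truth-sensitivity level are marked PASS, and the remaining calls carry a filter label instead. As a quick check on how many calls survive, the FILTER values can be tallied. The following Python is a minimal sketch that relies only on the tab-delimited VCF layout (FILTER is the seventh column); the function name is hypothetical, and the filter label in the usage note is a placeholder rather than actual GATK output.

```python
from collections import Counter
from typing import Iterable


def count_filter_values(vcf_lines: Iterable[str]) -> Counter:
    """Tally the FILTER column (7th field) over all variant records."""
    counts: Counter = Counter()
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1
    return counts
```

For example, `count_filter_values(open("combined_filtered_SNP_calls.vcf"))` would return a mapping such as `PASS` to the number of retained calls and a label like `SomeTrancheFilter` (placeholder) to the number of filtered calls.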

INDEL SPECIFIC CALLS

Using UnifiedGenotyper to call indels
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -R ./grch37/human_g1k_v37.fasta -T UnifiedGenotyper \
    -I ./sample1/sample1_recal.bam -I ./sample2/sample2_recal.bam \
    -I ./sample3/sample3_recal.bam -I ./sample4/sample4_SNP_recal.bam \
    -I ./sample5/sample5_SNP_recal.bam \
    --dbsnp ./grch37/dbsnp_135.b37.vcf -o combined_indel_calls.vcf \
    -stand_call_conf 30.0 -stand_emit_conf 10.0 -dcov 500 \
    --intervals ./grch37/SeqCap_EZ_Exome_v2.intervals \
    -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest -A FisherStrand -A RMSMappingQuality \
    -glm INDEL

Run snpEff on indel calls
java -Xmx4g -jar ./snpEff/snpEff_v2_0_5d/snpEff.jar eff \
    -c ./snpEff/snpEff_v2_0_5d/snpEff.config -v \
    -s combined_indels_snpEFF_summary.html -i vcf -o vcf \
    GRCh37.64 combined_indel_calls.vcf > combined_indel_effects.vcf

Annotate indels with VariantAnnotator
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -R ./grch37/human_g1k_v37.fasta -T VariantAnnotator \
    -I combined_indel_effects.vcf -o combined_indels_annotated.vcf \
    --variant combined_indel_calls.vcf --dbsnp ./grch37/dbsnp_135.b37.vcf \
    -A DepthOfCoverage -A MappingQualityZero -A QualByDepth \
    -A AlleleBalance -A RMSMappingQuality -A HaplotypeScore \
    -A FisherStrand -A HomopolymerRun -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest -A BaseQualityRankSumTest -A SnpEff \
    -A HomopolymerRun --snpEffFile combined_indel_effects.vcf

GenerateVariantClusters for indels
java -Xmx4g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -T VariantRecalibrator -R ./grch37/human_g1k_v37.fasta \
    -input,VCF 12_INDELs_annotated.vcf \
    -resource:mills,VCF,known=true,training=true,truth=true,prior=12.0 ./grch37/Mills_and_1000G_gold_standard.indels.b37.vcf \
    -recalFile ./12_INDELs.recal -tranchesFile ./12_INDELs.tranches \
    -rscriptFile ./12_INDELs.plots.R \
    -an QD -an FS -an HaplotypeScore -an ReadPosRankSum \
    --maxGaussians 4 -percentBad 0.05 -mode INDEL

Apply Recalibration to filter out calls for indels
java -Xmx12g -jar ./GATK/GenomeAnalysisTK-1.5/GenomeAnalysisTK.jar \
    -T ApplyRecalibration -R ./grch37/human_g1k_v37.fasta \
    -input 12_INDELs_annotated.vcf --ts_filter_level 90.0 \
    -tranchesFile 12_INDELs.tranches -recalFile 12_INDELs.recal \
    -o 12_INDELs.recalibrated.filtered.vcf --mode INDEL

APPENDIX E

ZINC-FINGER NUCLEASE INSERT SEQUENCES

kmt2d ZFN Insert (+): GAAAAAAATCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTTTCGTTG CGTACCTCTTTGGTTCGTCATACCCGTACTCATACCGGTGAAAAACCGTTTCAGTGTCGGATCTGT ATGCGAAATTTCTCCGACTCTTCTGTTTTGCGTCGTCATCTACGTACGCACACCGGCGAGAAGCCA TTCCAATGCCGAATATGCATGCGCAACTTCAGTCAGGGTCGTTCTTTGCGTGCACACCTAAAAAC CCACCTGAGGGGATCCAAGAAGGA

kmt2d ZFN Insert (-): GAAAAAAATCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTTTCGCGT CATCAGCATTTGAAATTGCATACCCGTACTCATACCGGTGAAAAACCGTTTCAGTGTCGGATCTG TATGCGAAATTTCTCCCGTCAGGACAACTTGGGTCGTCATCTACGTACGCACACCGGCGAGAAGC CATTCCAATGCCGAATATGCATGCGCAACTTCAGTGTTAAACATGGTTTGGGTCGTCACCTAAAA ACCCACCTGAGGGGATCCAAGAAGGA

kdm6a ZFN Insert (+): GAAAAAAATCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTTTCGCAT AACGGTACCTTGAAACGTCATACCCGTACTCATACCGGTGAAAAACCGTTTCAGTGTCGGATCTG TATGCGAAATTTCTCCCAGCGTTCTTCTTTGGTTCGTCATCTACGTACGCACACCGGCGAGAAGCC ATTCCAATGCCGAATATGCATGCGCAACTTCAGTCATGGTCATCGTTTGAAAACCCACCTAAAAA CCCACCTGAGGGGATCCAAGAAGGA

kdm6a ZFN Insert (-): GAAAAAAATCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTTTCGCGT CGTACCCATTTGCGTGTTCATACCCGTACTCATACCGGTGAAAAACCGTTTCAGTGTCGGATCTGT ATGCGAAATTTCTCCGACCATTCTTCTTTGAAACGTCATCTACGTACGCACACCGGCGAGAAGCC ATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTACCGACTTGTTGCGTCGTCACCTAAAAA CCCACCTGAGGGGATCCAAGAAGGA

kdm6al ZFN Insert (+): GAAAAAAATCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTTTCGCGT CGTCAGGCATTGGAATATCATACCCGTACTCATACCGGTGAAAAACCGTTTCAGTGTCGGATCTG TATGCGAAATTTCTCCCGTCGTGAAGTTTTGGAAAACCATCTACGTACGCACACCGGCGAGAAGC CATTCCAATGCCGAATATGCATGCGCAACTTCAGTCGTCGTGACCATTTGTCTTTGCACCTAAAA ACCCACCTGAGGGGATCCAAGAAGGA

kdm6al ZFN Insert (-): GAAAAAAATCTAGACCCGGGGAGCGCCCCTTCCAGTGTCGCATTTGCATGCGGAACTTTTCGTCT AAACAGGCATTGGCAGTTCATACCCGTACTCATACCGGTGAAAAACCGTTTCAGTGTCGGATCTG TATGCGAAATTTCTCCCAGTCTACCACCTTGAAACGTCATCTACGTACGCACACCGGCGAGAAGCC ATTCCAATGCCGAATATGCATGCGCAACTTCAGTTTGAAACATGACTTGCGTCGTCACCTAAAAA CCCACCTGAGGGGATCCAAGAAGGA
