
FUNCTIONAL GENOMIC ANALYSIS OF NOVEL MICRODELETIONS AND

MICRODUPLICATIONS ASSOCIATED WITH INTELLECTUAL DISABILITY

by

Chansonette Badduke

M.Sc., Andrews University, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Pathology and Laboratory Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

December 2015

© Chansonette Badduke, 2015

Abstract

Intellectual disability (ID) is a diagnosis given to persons who have life-long cognitive and adaptive impairments that begin early in life. ID affects about 1-3% of the population.

Extremely small genomic losses and gains, called microdeletions and microduplications respectively (collectively termed copy number variants, CNVs), are the cause of ID in ~15% of cases, and their identification has helped to pinpoint genomic regions that contain ID-related genes.

The objective of my PhD research was to search for ID candidate genes in subjects with ID, focusing on the functional genomic analysis of genes from CNVs and in the rest of the genome. I studied individuals with unique de novo pathogenic CNVs at chromosomal position 2p15-16.1 or with familial CNVs at chromosomal position 1q21.1. I used a multi-faceted approach that included the study of candidate genes’ 1) expression, 2) sequence variants, 3) knock-down consequences in C. elegans and 4) imprinting potential.

My results showed that the best candidate genes from the 2p15-16.1 CNV are XPO1, USP34 and REL because their expression is reduced in individuals with deletions. In the case of the 1q21.1 CNV, I identified two candidate genes (CHD1L and PRKAB2) from the CNV that had altered expression and cellular function. I also identified a pathogenic sequence change in ATF6 in individuals with a familial 1q21.1 duplication. ATF6 is located outside the 1q21.1 CNV and is part of the endoplasmic reticulum (ER) stress response pathway, which may contribute to the phenotypic variability in this family. Finally, I identified 3 CNVs in children with ID that overlap putative imprinted regions.

The results of my study therefore led to the identification of genes which could contribute to ID as their function is altered in patients with the CNV or their characteristics suggest that they can be sensitive to copy number changes. This work contributes to an improved understanding of how CNVs and additional genetic changes in the rest of the genome can lead to ID.


Preface

All studies involving subjects were approved by the University of British Columbia Clinical Research Ethics Board (C01-0509).

Chapter 2 contains the results of multiple studies done by me and in collaboration with others. Extraction of clinical information from publications and charts for individuals with 2p15-16.1 deletions was initially done by Dr. Ying Qiao (research associate) and was checked by me, Dr. Evica Rajcan-Separovic, and Dr. Suzanne Lewis. Chromosomal Microarrays (CMAs) for our 2p15-16.1 cases were run in house by Sally Martell (technician), Ying Qiao or by me, or were provided by clinical cytogenetics services including Royal Columbian Hospital’s Cytogenetics Laboratory. Breakpoint comparison, genomic overlap, generation of frequencies and addition of haploinsufficiency scores were done by me. Exome sequencing for one subject was done by Otogenetics. I performed the initial analysis of variants in the 2p15-16.1 region based on Otogenetics data and subsequently supervised data analysis using Golden Helix software (performed by my co-op student Flamingo Tang). The Illumina Expression BeadChip array (HumanRef-8 v3.0) was run by the CFRI core facility. I generated the background-corrected intensity values for each probe using GenomeStudio software (Illumina). Subsequent normalization and expression/copy number correlation analysis were done in collaboration with Dr. Paul Pavlidis (UBC). After normalization, I generated the expression ratios and calculated the fold changes for all genes. Lymphoblasts were transformed on a service basis, and the resulting lymphoblastoid cell lines (LBCs) were grown and maintained by me. RNA and protein extractions for downstream analysis were done by me. QMPSF confirmation of small CNVs in Case No. 2 was done by Sally Martell with my advice on where to position primers. Microsatellites were designed and parent of origin experiments were run by me.


Immunohistochemistry using anti-XPO1 and anti-USP34 was done by the histochemistry lab (Department of Pathology and Laboratory Medicine, UBC). Image analysis was done with the help of a neuropathologist, Dr. Chris Dunham. Real time qPCR for XPO1 and USP34 was done by me, as were the Western blots for XPO1. Western blots for c-REL and USP34 were performed by Dr. Jiadi Wen (research associate) and in Dr. Mark O’Driscoll’s lab, respectively. The animal model experiments in C. elegans were done in Dr. Harald Hutter’s lab. The RNAi experiments and design of the constructs for the transgenic strains were performed by me. Jessie Jie Pan engineered the working constructs and did the microinjections; Dr. Harald Hutter did the image capture and supervised my image analysis. Lastly, all bioinformatics analyses in this chapter, including the use of WebGestalt for functional enrichment analysis, were done by me. The manuscript deriving from this work, and a subsequent extension of it using additional patients and a knock-out model, is in preparation for publication, and I share co-first authorship with Hani Bagheri.

The majority of work from Chapter 3 was published in my first-author publication (Harvard et al., 2011). Published results included genomic and clinical assessment of 3 families with 1q21.1 CNVs and whole genome expression experiments for 2 families. The Illumina Expression BeadChip array (HumanRef-8 v3.0) was run by the CFRI core facility. I performed the data analysis in collaboration with Dr. Paul Pavlidis (UBC), who performed the gene expression/copy number correlation analysis, and Mr. Eloi Mercer, who ran the statistical tests for the over-representation analysis. Functional analysis for the two candidate genes from the 1q21.1 CNV with the highest expression/copy number correlation in patient cells (CHD1L and PRKAB2) was performed in the laboratory of our collaborator Dr. Mark O’Driscoll (University of Sussex, UK). Subsequent to the paper, whole exome data for members of 2 families carrying a 1q21.1 CNV were generated at BGI, and I played a major role in data processing and analysis that resulted in the identification of variants in two genes (ATF6 and DARS1) that warranted follow-up, including confirmation by Sanger sequencing and real time qPCR evaluation of gene expression. I supervised the Sanger confirmation of variants (done by my co-op student, Flamingo Tang) and did the qPCR experiments to confirm gene expression. Western blotting for ATF6 was conducted in Dr. Alan Volchuk’s lab at the University of Toronto. Finally, I performed an extensive literature analysis in order to identify genes involved in ER stress and used these lists to do a global analysis of gene expression for genes from the ER stress response pathway using my whole genome expression data in members of 2 families (A & C). I also assessed the response of patient lymphoblast cells to ER stress induced by two chemicals (tunicamycin and thapsigargin).

Chapter 4 contains the results of a FISH-RT study that was designed and performed by me. I extracted known and predicted imprinted genes from multiple online databases and compared their positions with CNVs from individuals in our ID cohort in order to identify candidate imprinted regions for FISH-RT studies. Dr. Evica Rajcan-Separovic and I both counted replication timing patterns for each FISH-RT experiment. Putative imprinted differentially methylated regions (DMRs) were provided by Dr. Courtney Hanna based on her work in Dr. Wendy Robinson’s Lab, and I performed CNV/DMR overlap analysis to identify CNVs that contain putative imprinted DMRs. Finally, a comparison analysis of the DMR fraction in putatively pathogenic CNVs from individuals with ID (de novo and familial) and in randomly generated genomic regions was run in collaboration with Mr. Eloi Mercer.


Table of Contents

Abstract ...... ii

Preface ...... iv

Table of Contents ...... vii

List of Tables ...... xiv

List of Figures ...... xv

List of Abbreviations ...... xvii

Acknowledgements ...... xviii

Dedication ...... xix

Chapter 1: Introduction ...... 1

1.1 Intellectual Disability ...... 1

1.2 Genomic Abnormalities and ID ...... 2

1.2.1 Chromosomal Abnormalities ...... 3

1.2.2 DNA Copy Number Variations (CNVs) ...... 5

1.2.2.1 CNV Categories ...... 8

1.2.2.2 Clinical Relevance of CNVs in ID...... 10

1.2.2.3 Mechanisms by Which CNVs Cause Abnormal Phenotypes ...... 13

1.2.2.3.1 CNVs Affect Gene Expression ...... 13

1.2.2.3.2 CNVs Affect Imprinted Genes ...... 14

1.2.2.3.3 Use of Animal Models to Study CNVs Dosage Effects...... 17

1.3 Single Gene and ID ...... 18

1.3.1 Next Generation Sequencing (NGS) ...... 18

1.3.2 Clinical Relevance of WES in ID ...... 20


1.4 Research Goal, Hypothesis, Objectives, and Overall Significance ...... 20

Chapter 2: Investigations of the 2p15-16.1 Microdeletion ...... 22

2.1 Background ...... 22

2.1.1 Clinical Findings, Published Cases ...... 23

2.1.2 Genomic Findings, Published Cases ...... 23

2.2 Chapter Goals...... 27

2.3 Materials and Methods ...... 27

2.3.1 Clinical and Genomic Data Acquisition ...... 27

2.3.2 DNA, RNA and Protein Extractions ...... 28

2.3.3 Chromosomal Microarrays (CMA) ...... 32

2.3.4 Confirmation of Small CNVs by QMPSF ...... 32

2.3.5 Whole Genome Expression Analysis...... 33

2.3.6 Candidate Gene Selection and Functional Analysis ...... 34

2.3.6.1 Quantitative Real Time PCR (qPCR) ...... 35

2.3.6.2 Western Blotting ...... 37

2.3.6.3 Immunohistochemistry ...... 39

2.3.6.4 Study of Candidate Genes in C. elegans ...... 39

2.3.7 Whole Exome Sequencing ...... 41

2.3.8 CNV Parent of Origin Analysis Using Microsatellites ...... 42

2.3.9 Bioinformatic Data Acquisition and Analysis ...... 43

2.3.9.1 Repeating Elements by RepeatMasker (RMSK elements) ...... 43

2.3.9.2 ENCODE Regulatory Elements ...... 43

2.3.9.3 VISTA Enhancer Elements ...... 44


2.3.9.4 WebGestalt Functional Enrichment Analysis ...... 44

2.3.9.4.1 Pathway Commons Enrichment Analysis in WebGestalt ...... 45

2.3.9.4.2 Protein-Protein Interaction Enrichment Analysis in WebGestalt ...... 45

2.3.9.4.3 Hierarchical Human Protein Interaction Network Modules ...... 45

2.4 Results ...... 46

2.4.1 Clinical Findings (Summary of New and Published 2p15-16.1 Deletion Cases) ..... 46

2.4.2 Genomic Findings (Summary of New and Published 2p15-16.1 Deletion Cases) ... 47

2.4.3 Whole Genome Expression Analysis in Individuals with 2p15-16.1 Microdeletions ...... 59

2.4.4 Candidate Gene Selection for Functional Analysis ...... 60

2.4.4.1 Functional Studies of 2p15-16.1 Candidate Genes in Human Cells ...... 60

2.4.4.2 Functional Studies of 2p15-16.1 Candidate Genes in C. elegans ...... 69

2.4.4.3 Additional Laboratory Investigations ...... 70

2.4.5 Bioinformatic Investigations ...... 72

2.5 Discussion ...... 78

2.5.1 Summary of Genomic and Phenotypic Findings ...... 78

2.5.2 Phenotype-Genotype Correlations ...... 79

2.5.3 Reasons for Phenotypic Similarities in 2p15-16.1 Deletion Carriers ...... 81

2.5.3.1 Multiple Haploinsufficient Genes are Located in the 2p15-16.1 Region ...... 81

2.5.3.2 Contribution of Regulatory Elements in the 2p15-16.1 Region ...... 82

2.5.3.3 Shared Pathways & Protein-Protein Interactions ...... 85

2.6 Conclusion ...... 86


Chapter 3: Investigations of 1q21.1 Copy Number Variations (CNVs) ...... 87

3.1 Background ...... 87

3.2 The 1q21.1 CNV ...... 87

3.2.1 Contributors to the Phenotypic Variability in 1q21.1 CNV carriers ...... 89

3.2.1.1 Number of DUF1220 Copies in the 1q21.1 CNV...... 89

3.2.1.2 Presence of Dosage Sensitive Genes within the 1q21.1 CNV Region ...... 89

3.2.1.3 Single Nucleotide Changes within 1q21.1 CNVs and in the Rest of the Genome ...... 90

3.3 Chapter Goals...... 92

3.4 Materials and Methods ...... 92

3.4.1 Subjects ...... 93

3.4.2 Chromosomal Microarrays (CMA) ...... 94

3.4.3 RNA Extraction ...... 94

3.4.4 Whole Genome Expression...... 95

3.4.5 In silico Functional Analysis of the Top 100 Genes ...... 96

3.4.6 Functional Studies for CHD1L and PRKAB2 ...... 96

3.4.7 Whole Exome Sequencing ...... 96

3.4.7.1 Variant Filtering Strategy for 1q21.1 Families ...... 97

3.4.7.2 Confirmation of Selected Variants by Sanger Sequencing ...... 97

3.4.7.3 Gene Expression Studies for Selected Variants ...... 98

3.4.7.4 ATF6 Functional Follow-up ...... 98

3.5 Results ...... 101

3.5.1 Genomic and Clinical Features of 1q21.1 CNV Carriers ...... 101


3.5.2 Whole Genome Expression Analysis in 1q21.1 CNV Carriers ...... 103

3.5.3 Functional Analysis of Candidate Genes from Whole Genome Expression ...... 106

3.5.3.1 Functional Assays for CHD1L/ALC1 ...... 106

3.5.3.2 Functional Assays for AMPKβ2 ...... 107

3.5.4 Detection and Selection of Variants Using Whole Exome Sequencing ...... 108

3.5.5 ATF6 Functional Analysis ...... 113

3.5.5.1 Expression of ATF6 Downstream Genes ...... 114

3.5.5.2 ER Stress Response in LBCs of ATF6 Carriers ...... 116

3.5.5.3 Search for Variants in ER Stress Response Genes with Relaxed Filters in Family A ...... 117

3.6 Discussion ...... 117

3.7 Conclusion ...... 123

Chapter 4: Imprinting Potential of CNVs and their Integral Genes ...... 124

4.1 Replication Timing as a Marker of Imprinting ...... 125

4.1.1 Background ...... 125

4.1.2 Assessment of Replication Timing Using FISH ...... 126

4.1.3 Replication Timing and Genomic Changes ...... 128

4.2 Differentially Methylated Regions as Markers of Imprinting ...... 129

4.3 Chapter Goals...... 129

4.4 Materials and Methods ...... 132

4.4.1 Lymphoblast Cultures ...... 132

4.4.2 BrDU Labeling and Detection ...... 132

4.4.3 Cell Harvest and Fixation ...... 133


4.4.4 FISH ...... 134

4.4.5 Detection of Replication Timing ...... 134

4.4.6 Scoring FISH Signals ...... 135

4.4.7 CNVs and Control Regions Selected for Replication Timing Experiments ...... 136

4.4.8 Overlap of de novo and Familial CNVs with Differentially Methylated Regions . 138

4.5 Results ...... 139

4.5.1 FISH Results for Selected BAC Probes in Control Individuals ...... 139

4.5.2 Commercial FISH Probe Results for SNRPN (Synchronous) and 15qter (Asynchronous) in Patient and Control Cells ...... 141

4.5.3 Overlap of de novo and Familial CNVs with Putatively Imprinted DMRs ...... 145

4.6 Discussion ...... 147

4.6.1 FISH Replication Timing Assay ...... 147

4.6.2 Overlap of de novo and Familial CNVs in Individuals with ID with Putative Novel DMRs ...... 150

4.7 Conclusion ...... 151

Chapter 5: Discussion ...... 152

5.1 Overview ...... 152

5.2 Summary and Significance ...... 152

5.3 Strengths and Limitations ...... 155

5.4 Future Directions ...... 156

5.4.1 Future Studies for Candidate Genes from 2p15-16.1 and 1q21.1 CNVs ...... 156

5.4.2 Future Studies for 2p15-16.1 Deletions ...... 157

5.4.3 Future Studies for 1q21.1 CNVs ...... 157


5.4.4 Future Studies for CNVs Overlapping Novel Putative Imprinted Regions ...... 158

5.5 Conclusion ...... 158

Works Cited ...... 160

Appendices ...... 185

Appendix A Supplementary Tables and Figures for Chapter 2 ...... 185

Appendix B Supplementary Tables and Figures for Chapter 3 ...... 199

Appendix C Supplementary Tables and Figures for Chapter 4 ...... 217


List of Tables

Table 1.1 Select Methods Used to Detect Genomic Abnormalities in Individuals with ID ...... 3

Table 2.1 Primer Sequences for QMPSF Confirmation of Small CNVs in New Case No. 2 ...... 33

Table 2.2 Primers Used for Gene Expression Confirmation ...... 37

Table 2.3 Extension of an Intergenic CNV from Case No. 2 into VISTA Enhancer Element hs1142 ...... 49

Table 2.4 Gene Content Overlap for Published and Newly Recruited 2p15-16.1 Microdeletion Cases (hg19) ...... 54

Table 2.5 Function of Genes Found in >60% of 2p15-16.1 Microdeletions ...... 58

Table 2.6 Parental Allele Identification Using Microsatellites ...... 72

Table 2.7 Positive VISTA Enhancer Elements in the 2p15-16.1 Deletion Region ...... 74

Table 2.8 Enriched Pathways for the 13 Most Commonly Deleted Genes in the 2p15-16.1 Microdeletion Region ...... 75

Table 3.1 Summary of Tests Run for 1q21.1 CNV Carriers ...... 93

Table 3.2 Genes from the 1q21.1 CNV Affected by Copy Number ...... 105

Table 3.3 Summary of UPR Genes Over and Under-Expressed in Subjects with 1q21.1 CNVs ...... 115

Table 3.4 ER Stress Induced Response Target Genes in ATF6 -/+ LBCs ...... 116

Table 4.1 Summary of FISH-RT Assay Counts for Selected BAC Probes in Control Individuals ...... 140

Table 4.2 Summary of FISH Assay for Commercial Control Probes on Chromosome 15 in Controls and Subjects with CNVs at 1q21.1, Xq21.1 and 4p14 ...... 143


List of Figures

Figure 2.1 Genomic Overlap of Published 2p15-16.1 Microdeletion Cases ...... 26

Figure 2.2 xpo-1 Orthologue in C. elegans ...... 40

Figure 2.3 New Case No. 2 Additional CNVs at 2p16.1 ...... 50

Figure 2.4 Genomic Overlap of 15 Published and 8 Newly Recruited 2p15-16.1 Microdeletion Cases ...... 55

Figure 2.5 mRNA Expression of Candidate Genes USP34 & XPO1 ...... 62

Figure 2.6 Investigation of XPO1 Protein Expression in Cell Lines with 2p15-16.1 Deletions .. 64

Figure 2.7 Investigation of USP34 Protein Expression in Cell Lines with 2p15-16.1 Deletions . 65

Figure 2.8 Investigation of c-REL Protein Expression in Cell Lines with 2p15-16.1 Deletions . 66

Figure 2.9 XPO1 Expression in Human Fetal and Mouse Brain ...... 67

Figure 2.10 USP34 Expression in Human Fetal Brain (A-F) ...... 68

Figure 2.11 XPO1 Expression in Transgenic XPO-1::GFP C. elegans ...... 70

Figure 2.12 Hierarchical Protein-Protein Interaction Network Modules ...... 77

Figure 3.1 ER Stress Response Pathway ...... 100

Figure 3.2 Comparison of Genomic Overlap for 1q21.1 CNVs ...... 102

Figure 3.3 Correlation of Expression and Copy Number for Probes from ...... 104

Figure 3.4 Familial Transmission of the 1q21.1 CNV ...... 109

Figure 3.5 Number of Variants Remaining After Each Filtering Step for 1q21.1 CNV Carriers ...... 110

Figure 3.6 Variant Detection, Confirmation and Gene Expression Study for a Prioritized Variant in Family C ...... 113


Figure 3.7 Genes from the UPR Pathway with Expression Changes on the Whole Genome Expression Array ...... 115

Figure 3.8 Fold Change of Target Genes During ER Stress Response ...... 117

Figure 4.1 Replication of DNA During the S Phase of the Cell Cycle ...... 125

Figure 4.2 Consequences of a CNV Affecting an Imprinted Gene, Expressed from the Maternal Chromosome ...... 129

Figure 4.3 Proposed Uses of Replication Timing to Assess Epigenetic Characteristics of CNVs ...... 131

Figure 4.4 Identification of Proliferating Cells Using BrDU ...... 133

Figure 4.5 FISH Probe Hybridization Signals ...... 135

Figure 4.6 Replication Timing Patterns for Selected BAC Probes in Control Individuals ...... 141

Figure 4.7 Replication Timing Patterns for Control Probes on Chromosome 15 in Controls and Subjects with CNVs at 1q21.1, Xq21.1 and 4p14 ...... 144

Figure 4.8 Putative Imprinted DMRs Found in de novo and Familial CNVs...... 147


List of Abbreviations

ASD Autism Spectrum Disorder

CMA Chromosomal Microarray

CNV Copy Number Variants

DDR DNA Damage Response

DGV Database of Genomic Variants

DMR Differentially Methylated Region

ER Endoplasmic Reticulum

FISH Fluorescence in situ Hybridization

ID Intellectual Disability

LBC Lymphoblastoid Cell Lines

NAHR Non-Allelic Homologous Recombination

NDD Neurodevelopmental Disorders

QTL Quantitative Trait Loci

qPCR Quantitative Real Time PCR

SCZ Schizophrenia

SNP Single Nucleotide Polymorphism

SNV Single Nucleotide Variant

UPR Unfolded Protein Response

WGE Whole Genome Expression

WES Whole Exome Sequencing


Acknowledgements

I am grateful to the faculty, staff and fellow students from the Department of Pathology and Laboratory Medicine at UBC who have laughed with me, cried with me, and given me their unwavering support during my studies. I am particularly grateful for the support of my supervisor, Dr. Evica Rajcan-Separovic, and the members of my committee, Dr. Bruce Verchere, Dr. Wendy Robinson, Dr. Wan Lam and Dr. Cheryl Wellington, who have spent considerable time and effort making sure I stayed the course.

I would also like to acknowledge all the members of the Rajcan-Separovic Lab, past and present, who have offered their friendship and support. In particular, I would like to thank Dr. Ying Qiao, Dr. Jiadi Wen, Sally Martell, Emma Strong and Flamingo Tang for their help with my project. In addition, I offer my sincerest thanks to all of my collaborators who have added depth to my project in so many ways.

I am thankful for the financial support from the Department of Pathology and Laboratory Medicine (Four Year Fellowship for PhD Students) and from external sources (Effie I Lefeaux Scholarship in Mental Retardation, Pacific Century Graduate Scholarship) that allowed me to pursue a PhD degree. I would also like to thank the multiple funding agencies (Wellcome Trust, CFRI Trainee Travel Grant, IHDCYH Child and Youth Health Travel Award) that provided me with the opportunity to travel and present my work.

Finally, I would like to thank my parents, siblings, and many friends who have supported and encouraged me. You have made all the difference.


Dedication

I would like to dedicate my dissertation to the individuals with ID and their families who have made my research possible. Thank you so much for your contribution!


Chapter 1: Introduction

1.1 Intellectual Disability

Intellectual disability (ID) is defined by the American Association on Intellectual and Developmental Disabilities (AAIDD) as significant limitations in both intellectual functioning and adaptive behavior originating before the age of 18 (Luckasson et al., 2002). Intellectual function (learning, reasoning, and problem solving), as measured by IQ tests, generally results in test scores near or below 70-75 in persons with ID (American Association on Intellectual and Developmental Disabilities, 2010; American Psychiatric Association and American Psychiatric Association DSM-5 Task Force, 2013). Persons with ID also experience limitations in adaptive behavior, which includes social (interpersonal skills, social responsibility), practical (daily living, healthcare) and conceptual skills (literacy, money, time) (American Association on Intellectual and Developmental Disabilities, 2010). In Canada, the AAIDD definition is used for diagnosing and providing services to individuals with ID (Ouellette-Kuntz et al., 2015).

The World Health Organization (WHO) estimates that 1-3% of the general population has some form of developmental disability, with ID being the most common form (World Health Organization, 1995). Of the approximately 154 million people worldwide living with ID in 2013, 61.5% have ID of unknown origin (idiopathic) (Global Burden of Disease Study, 2015). The global incidence of idiopathic ID is predicted to be ~1.25%, although the reported incidence (0.394% to 1.12%) varies between studies (Global Burden of Disease Study, 2015; Maulik et al., 2011; World Health Organization, 2013) and the incidence is reported to be higher (1.11% to 2.17%) in lower income countries (Maulik et al., 2011). While the majority of individuals with idiopathic ID (~50%) have borderline and mild forms of ID, a smaller percentage are moderately (9.61%), severely (3.86%) and profoundly (1.38%) impaired (IQs 35-49, 20-34, and <20 respectively) (Global Burden of Disease Study, 2015; World Health Organization, 2013). In Canada, the prevalence of individuals with intellectual and developmental disabilities is estimated to be 0.6% to 0.78% (Lin et al., 2014; Statistics Canada, 2013). ID is a heterogeneous disorder caused by a variety of genetic and non-genetic factors (e.g. environment) that interfere with normal development or function of the brain (Harris, 2005; van Bokhoven, 2011). The most commonly known non-genetic factors include oxygen deprivation and infections (pre- and postnatal), poor nutrition, exposure to drugs (e.g. fetal alcohol syndrome) or neurotoxic compounds, metabolic and endocrine abnormalities, premature birth, and trauma (Harris, 2005; van Bokhoven, 2011).

1.2 Genomic Abnormalities and ID

Genetic causes of ID, estimated to be the underlying cause of 50-65% of more severe forms of ID, range from large to small (or submicroscopic) genomic imbalances and include single-gene, or monogenic, disorders (Gilissen et al., 2014; Kaminsky et al., 2011; Ropers, 2008; van Bokhoven, 2011). Large genomic imbalances are microscopically visible (>5-10 Mb) while small imbalances, called microdeletions and microduplications, cannot be detected by conventional karyotyping (<5 Mb) (Ropers, 2008; Shaffer et al., 2007a). Selected methods used to detect genomic abnormalities causing ID are listed in Table 1.1.


Method | Resolution | Benefits | Limitations
Conventional Karyotyping | 5-10 Mb | detects numerical and gross structural chromosomal aberrations (e.g. aneuploidy, inversions and translocations) | resolution is low
FISH | ~50 kb - 400 kb | detects small CNVs and mosaicism, rapid and inexpensive | region specific localization
CGH array | 10-50 kb | detects copy number changes over entire genome | limited to regions with probe coverage, requires multiple probes for high confidence calls, differentiation between benign and pathogenic CNVs can be challenging
SNP array | 10-50 kb | detects SNPs and copy number changes over the entire genome, is useful for detecting regions of homozygosity | limited to regions with probe coverage, requires multiple probes for high confidence calls, differentiation between benign and pathogenic CNVs can be challenging
NGS | 1-10 bp | detects single nucleotide variants and indels (small duplications and deletions) | detects many non-pathogenic variants, not yet practical for large scale CNV analysis

Table 1.1 Select Methods Used to Detect Genomic Abnormalities in Individuals with ID
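As a rough illustration of how the resolution limits in Table 1.1 determine which methods could, in principle, detect an imbalance of a given size, the following Python sketch compares a size against the lower bound of each quoted range. The names `METHOD_MIN_SIZE_BP` and `methods_resolving` are hypothetical and introduced only for this example. Treating the lower bound of each range as a hard cutoff is a simplifying assumption that ignores the practical limitations listed in the table (e.g. probe coverage, the region-specific nature of FISH, or the difficulty of large scale CNV analysis by NGS).

```python
# Approximate lower resolution limits from Table 1.1, in base pairs.
# Assumption: the lower bound of each quoted range is used as a hard cutoff.
METHOD_MIN_SIZE_BP = {
    "Conventional karyotyping": 5_000_000,  # 5-10 Mb
    "FISH": 50_000,                         # ~50 kb - 400 kb (region specific)
    "CGH array": 10_000,                    # 10-50 kb
    "SNP array": 10_000,                    # 10-50 kb
    "NGS": 1,                               # 1-10 bp
}


def methods_resolving(size_bp):
    """Return the methods whose nominal resolution limit is at or below size_bp."""
    return [
        method
        for method, min_size_bp in METHOD_MIN_SIZE_BP.items()
        if size_bp >= min_size_bp
    ]


# A 2 Mb microdeletion falls below the karyotyping limit but within reach
# of the other listed methods.
print(methods_resolving(2_000_000))
```

In this simplified view, a 2 Mb imbalance is missed by conventional karyotyping, which matches the definition of microdeletions and microduplications given above.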

1.2.1 Chromosomal Abnormalities

Large chromosome abnormalities (i.e. aneuploidies and unbalanced chromosomal rearrangements) resulting in microscopically visible (>5 Mb) gains or losses of chromosomal material can be detected in 3-5% of individuals with ID (Miller et al., 2010; Stankiewicz and Lupski, 2010). The first chromosomal imbalance shown to cause ID was the presence of an extra chromosome 21, called trisomy 21 or Down’s syndrome (MIM 190685). Trisomy 21 was discovered in 1959 and remains the most common chromosomal abnormality known to cause ID (Lejeune et al., 1959; Metcalfe et al., 2014). Additional aneuploidies (e.g. 45,X or Turner syndrome) and trisomies (e.g. mosaic trisomy 21 or trisomies for other chromosomes not compatible with life) are also associated with ID (Metcalfe et al., 2014; Poduri et al., 2013).


Small chromosomal gains or losses (<5 Mb) are the underlying cause of a number of syndromic forms of ID (e.g. Prader-Willi/Angelman syndrome, Giedion-Langer syndrome, Williams-Beuren syndrome, and DiGeorge/velocardiofacial syndrome) (Carvill and Mefford, 2013; Schinzel, 1988). Syndromic forms of ID involving chromosomal deletions were described as early as the 1960s (e.g. cri du chat), with several more added in the 1980s and 1990s (Lejeune, 1963; Schinzel, 1988). Chromosomal rearrangements (e.g. translocations) that had similar breakpoints or shared missing chromosome segments helped determine chromosomal regions for many syndromes (de la Chapelle et al., 1981; Greenberg et al., 1984; Ledbetter et al., 1982). Increased resolution of chromosomal banding and the development of fluorescent in situ hybridization (FISH), a technique used to detect gains or losses of very small specific chromosomal regions, helped to refine the submicroscopic chromosomal regions responsible for different syndromes (e.g. Prader-Willi/Angelman syndrome [15q11.2], Giedion-Langer syndrome [8q24.1], Williams-Beuren syndrome [7q11.23], and DiGeorge/velocardiofacial syndrome [22q11.2]) (Ledbetter et al., 1982; Pfeiffer, 1980; Schinzel, 1988; Shaffer, 2001). In addition, gains or losses of small chromosomal regions just below the telomeres, called subtelomeric rearrangements, have also been associated with ID (Joyce et al., 2001; Lese and Ledbetter, 2001). Overall, small genomic imbalances (<5 Mb) overlapping chromosomal regions that cause recognizable syndromes or in subtelomeric regions are detected in ~10% of individuals with ID (3-7% and 2.5-4.4% respectively) (Curry et al., 1997; Ravnan et al., 2006; Shao et al., 2008).

Unbiased detection of microdeletions and microduplications that occur over the entire genome became possible in the early 2000s with the development and widespread application of chromosomal microarray (CMA) technology (Brown and Botstein, 1999; Pollack et al., 1999; Snijders et al., 2001). CMA is a technique in which fluorescently labeled DNA samples are hybridized to an array of small fragments of DNA (probes) from the whole genome printed on a glass slide. Prior to hybridization, patient and reference DNA samples are labeled with different fluorophores so that a patient’s sample can be compared to a reference sample (Carter, 2007). The types of microarrays used for detection of copy number changes differ in the types of probes included on the array. Array comparative genomic hybridization (CGH) is based on the use of Bacterial Artificial Chromosome (BAC) probes (~200 kb) and longer oligonucleotide probes (~60 mers) to detect copy number changes. SNP arrays are based on the use of shorter oligonucleotide probes (~22-50 mers) to detect both copy number and single nucleotide polymorphisms (SNPs). Because the resolution of a particular microarray is dependent on the size and number of probes in a given region, microarrays vary in their ability to detect chromosome aberrations of different sizes. Regardless of the level of resolution, CMA has led to the discovery of a large number of novel submicroscopic chromosome gains and losses, collectively called DNA copy number variants (CNVs), that occur genome-wide in the general population and in individuals with ID (Conrad et al., 2010; Cooper et al., 2011; Redon et al., 2006; Slavotinek, 2008).
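The comparison of patient and reference signals in two-colour array CGH is conventionally expressed as a log2 ratio. The short Python sketch below works through the idealized arithmetic for a heterozygous deletion and a duplication in a diploid genome. The function names and the calling cutoffs (`loss_cutoff`, `gain_cutoff`) are hypothetical values chosen for illustration only; they are not the thresholds used in this thesis, and real data are noisier than these idealized ratios.

```python
import math


def expected_log2_ratio(patient_copies, reference_copies=2):
    """Idealized log2(patient/reference) intensity ratio for one probe."""
    if patient_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log2 ratio")
    return math.log2(patient_copies / reference_copies)


def call_copy_state(log2_ratio, loss_cutoff=-0.5, gain_cutoff=0.4):
    """Classify a probe as loss, gain or normal (cutoffs are illustrative)."""
    if log2_ratio <= loss_cutoff:
        return "loss"
    if log2_ratio >= gain_cutoff:
        return "gain"
    return "normal"


# Heterozygous deletion: 1 copy vs 2 reference copies, log2(1/2) = -1
deletion = expected_log2_ratio(1)
# Duplication: 3 copies vs 2 reference copies, log2(3/2) is about 0.585
duplication = expected_log2_ratio(3)

print(deletion, call_copy_state(deletion))
print(round(duplication, 3), call_copy_state(duplication))
```

The asymmetry between the two values (-1 for a single-copy loss versus roughly 0.585 for a single-copy gain) is one reason duplications can be harder to call than deletions, and why calls are usually based on several adjacent probes rather than one.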

1.2.2 DNA Copy Number Variations (CNVs)

DNA copy number variations (CNVs) are small segments of DNA (>1 kb) that have a variable copy number (i.e. are gained or lost) (Feuk et al., 2006; Redon et al., 2006). CNVs can be polymorphic in nature (variation seen in normal individuals) or pathogenic, meaning that they are implicated in a disease phenotype. In the general population, CNVs are reported to occur over 13-16% of the human genome (Itsara et al., 2009; Stankiewicz and Lupski, 2010). However, common CNVs found in more than 1% of individuals only occur over 0.9% of the genome (Itsara et al., 2009). Large CNVs found in the general population are individually rare but collectively frequent (i.e. more than 1% of individuals carry a CNV that is >1 Mb in size) (Itsara et al., 2009).

CNVs are often found in regions that have underlying genomic architecture that facilitates rearrangements (Hastings et al., 2009b). For example, repetitive elements called low copy repeats (LCRs) or segmental duplications (SDs), and long and short interspersed nuclear elements (LINEs and SINEs), are associated with both recurrent and non-recurrent CNVs (Hastings et al., 2009b). Recurrent CNVs have almost identical breakpoints and are similar in size, while non-recurrent CNVs have variable breakpoints and are different sizes.

Recurrent CNVs can arise through non-allelic homologous recombination (NAHR) that occurs between low copy repeats (LCRs) or segmental duplications (SDs) (Bailey et al., 2001). LCRs or SDs, especially those >10 kb in length with >97% sequence identity separated by 50 kb to 10 Mb of additional genomic sequence, can cause chromosomes to be misaligned during replication, creating unequal crossover events that delete or duplicate intervening genomic sequences and result in CNVs with similar breakpoints (Stankiewicz and Lupski, 2002). Such repeat regions are found in approximately 5% of the human genome (Samonte and Eichler, 2002). Rearrangement hotspots (130 sites) predicted solely on genomic architecture (i.e. genomic sequences >10 kb in length with 95% or higher sequence identity) were identified in the human genome, highlighting regions where recurrent genomic rearrangements are likely to occur (Sharp et al., 2006). Active hotspots, or regions where CNVs have actually been observed, typically occur in areas where flanking LCRs are found in direct orientation (Cooper et al., 2011). NAHR mediated CNVs can also occur in smaller hotspots, termed micro-hotspots and mini-hotspots, flanked by smaller repeat elements (1-10 kb and 100 bp respectively), and in regions enriched for Alu repeats (Girirajan et al., 2013).
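The architectural thresholds quoted above for NAHR-prone repeat pairs (>10 kb in length, >97% sequence identity, 50 kb to 10 Mb apart) can be written as a simple filter. The following Python sketch is illustrative only: the `RepeatPair` record, its field names, the optional orientation check and the example values are assumptions made for this illustration and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatPair:
    """A pair of paralogous repeats (LCRs/SDs) on one chromosome (hypothetical record)."""
    length_bp: int          # length of the shorter repeat copy
    identity: float         # fractional sequence identity between the copies (0-1)
    spacing_bp: int         # genomic sequence separating the two copies
    direct: bool            # True if both copies are in direct orientation


def is_nahr_candidate(pair, min_length=10_000, min_identity=0.97,
                      min_spacing=50_000, max_spacing=10_000_000,
                      require_direct=False):
    """Return True if the pair meets the size, identity and spacing thresholds.

    require_direct optionally restricts the result to directly oriented pairs,
    the arrangement the text associates with active hotspots.
    """
    meets_thresholds = (pair.length_bp > min_length
                        and pair.identity > min_identity
                        and min_spacing <= pair.spacing_bp <= max_spacing)
    return meets_thresholds and (pair.direct or not require_direct)


# Made-up example pairs
pairs = [
    RepeatPair(25_000, 0.985, 1_500_000, True),    # meets all thresholds
    RepeatPair(4_000, 0.990, 1_500_000, True),     # too short
    RepeatPair(25_000, 0.950, 1_500_000, False),   # identity too low
]
candidates = [p for p in pairs if is_nahr_candidate(p)]
```

A filter of this kind only flags architecture that is permissive for NAHR; it says nothing about whether a rearrangement has actually been observed at the site.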

In contrast to recurrent CNVs, the majority of CNVs are individually rare and are not generally mediated by LCRs, suggesting that mechanisms other than NAHR play a major role in CNV formation (Hastings et al., 2009a; Kaminsky et al., 2011). Most non-recurrent CNVs have little to no homology near their breakpoints and occur in chromosomal regions with complex genomic architecture (i.e. are often found in regions containing LCRs in direct and inverted orientation or in regions with LINEs and SINEs) (Hastings et al., 2009a). The presence of non-recurrent CNVs near complex genomic architecture suggests that secondary structures formed during DNA replication (mediated by LCRs) and/or increased frequency of DNA breaks (seen in regions with LINEs and SINEs) may be involved in CNV formation (Hastings et al., 2009b). Several mechanisms such as non-homologous end joining (NHEJ) and microhomology mediated repair mechanisms are proposed to contribute to the formation of non-recurrent CNVs (Hastings et al., 2009a; Kaminsky et al., 2011; Lee and Scherer, 2010).

NHEJ is thought to be involved in CNV formation when no homology is detected at CNV breakpoints while microhomology mediated repair mechanisms are implicated when limited homology is observed at the CNV breakpoints. In NHEJ, double stranded DNA breaks are repaired by rejoining non-homologous ends of the broken DNA strand leaving a small molecular scar (1-4 nucleotides) (Gu et al., 2008; Weterings and van Gent, 2004). Microhomology-mediated break-induced replication (MMBIR) and fork stalling and template switching (FoSTeS) are two mechanisms proposed to cause CNVs in regions with limited homology (2-14 bp) (Hastings et al., 2009a; Zhang et al., 2009). In MMBIR, CNVs (duplications or deletions) form when a break in the leading or lagging strand from a collapsed or broken replication fork enters a new replication fork (Hastings et al., 2009a). In FoSTeS, CNVs with complex structure (deletions/duplications interrupted by segments of different copy number, such as balanced or triplicated segments) form when the lagging strand disengages from the template of a stalled replication fork, anneals to another replication fork in close proximity, and restarts DNA synthesis (Stankiewicz and Lupski, 2010; Zhang et al., 2009). While it is not always possible to determine which DNA elements are responsible for increased genomic instability or how they facilitate CNV formation, it is clear that non-recurrent genomic rearrangements are stimulated by local genomic architecture (Hastings et al., 2009a; Stankiewicz and Lupski, 2010).

1.2.2.1 CNV Categories

CNVs occur in both phenotypically normal individuals and individuals with ID or other neurodevelopmental disorders (NDDs). Databases for cataloguing CNVs that occur in the general population or those found in individuals with ID, developmental delay and/or congenital anomalies help to distinguish normal from disease-causing (pathogenic) variants. The Database of Genomic Variants (DGV; http://dgv.tcag.ca/dgv/app/home) catalogues variants that occur in the general population while the Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources (DECIPHER; http://decipher.sanger.ac.uk/) and the European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations (ECARUCA; http://umcecaruca01.extern.umcn.nl:8080/ecaruca/ecaruca.jsp) are both databases that collect rare chromosomal aberrations from genetic centers world-wide (Feenstra et al., 2006; Firth et al., 2009; MacDonald et al., 2014). Guidelines published recently recommend classifying CNVs into three categories based on their clinical relevance: benign, pathogenic, and variants of uncertain significance (VOUS) (Kearney et al., 2011; Miller et al., 2010). A brief definition of each category is provided below.

Benign CNVs occur in phenotypically normal individuals. They cover between 0.02% and 3.7% of the genome (i.e. 540 kb to 112 Mb of DNA can be affected by benign CNVs) (Conrad et al., 2010; Itsara et al., 2009). Thus, due to CNVs, 0.78% of the genome (~24 Mb) can vary between two individuals (Conrad et al., 2010). While the presence of a CNV in the general population may indicate that it is benign, only those that occur in the general population at a frequency greater than 1% (common polymorphism) and are reported in multiple peer reviewed publications should be considered benign (Kearney et al., 2011). In addition, it is important to note the size and dosage (e.g. duplication or deletion) of the CNV reported in the general population when interpreting CNVs found in a patient (Kearney et al., 2011). Benign CNVs show a bias away from enhancers and ultra-conserved elements and contain few (if any) known genes or regulatory elements associated with disease (Conrad et al., 2010; Itsara et al., 2009).
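The correspondence between the base-pair figures and the genome percentages quoted above can be checked with a short calculation. This sketch assumes a haploid genome size of approximately 3.0 Gb, a value that is not stated in the text and is used here only to show that the quoted figures are mutually consistent after rounding.

```python
GENOME_BP = 3.0e9  # assumed approximate haploid human genome size (not from the text)


def percent_of_genome(bases, genome_bp=GENOME_BP):
    """Express a number of bases as a percentage of the assumed genome size."""
    return 100.0 * bases / genome_bp


low = percent_of_genome(540e3)       # lower bound of benign CNV coverage
high = percent_of_genome(112e6)      # upper bound of benign CNV coverage
pairwise = percent_of_genome(24e6)   # variation between two individuals

print(f"{low:.3f}% {high:.2f}% {pairwise:.2f}%")
```

Under this assumption, 540 kb and 112 Mb correspond to roughly 0.02% and 3.7% of the genome, and 24 Mb to roughly 0.8%, in line with the published estimates.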

Pathogenic CNVs are those considered to be causative for a disease or condition. They include CNVs that overlap established clinical syndromes and/or are reported to be present in similarly affected individuals in multiple publications (Kearney et al., 2011; Miller et al., 2010). A CNV is more likely to be pathogenic if it is large (>400 kb), de novo (not seen in either parent), a deletion, and/or gene rich, increasing the probability that it contains known dosage sensitive or disease-causing genes (Cooper et al., 2011; Kaminsky et al., 2011; Vissers et al., 2010b). However, smaller CNVs with relevant gene content, and/or proximity to regions that have been associated with the reported phenotype should also be considered causative (Kearney et al., 2011; Miller et al., 2010). Unique familial CNVs (CNVs inherited from a normal parent in a single family), not seen in the general population, can also be considered putatively pathogenic as they can cause an abnormal phenotype due to variable expressivity, imprinting, or because they can contain recessive genes which may be mutated on the second allele in the affected proband, but not the unaffected parent (Kearney et al., 2011).

Variants of uncertain clinical significance (VOUS) include CNVs that cannot be classified as either benign or pathogenic because they occur at low frequencies in the general population or are not yet associated with human disease, respectively (Kearney et al., 2011; Miller et al., 2010). VOUS are reported in 9.3% of individuals with DD/ID, ASD, and/or multiple congenital anomalies (Kaminsky et al., 2011). Duplications are more frequently reported as VOUS than deletions and 72% of identified VOUS are parental in origin (Girirajan et al., 2012; Kaminsky et al., 2011). A large CNV that is found in an individual with ID, contains genes not yet associated with human disease, and is reported in only a few individuals is an example of a VOUS because it cannot be classified owing to the lack of strong evidence of association with human disease.

1.2.2.2 Clinical Relevance of CNVs in ID

Over the past 10 years, a large number of CNV studies in individuals with ID have contributed to our knowledge of CNVs associated with ID. Meta-analysis of more than 30 studies showed that pathogenic genomic imbalances were detected by CMA in 12.2% (21,698 individuals) to 14.7% (15,749 individuals) of individuals with ID (Kaminsky et al., 2011; Miller et al., 2010). The widespread use of CMA, now a first-line clinical test for individuals with ID, and cataloguing of CNVs in the databases mentioned above, has led to the identification of more than 20 new genomic disorders that cause ID (Carvill and Mefford, 2013; Shaffer et al., 2007b). Some of the new microdeletion/microduplication disorders are due to recurrent pathogenic CNVs while others are caused by non-recurrent pathogenic CNVs that map to the same chromosomal region (Girirajan et al., 2013; Sharp et al., 2006; Vissers and Stankiewicz, 2012). In addition to pathogenic CNVs associated with syndromic ID, a number of CNVs are considered to be predisposing to disease as they occur at higher frequencies in individuals with NDDs and congenital anomalies than in the general population (Coe et al., 2012; Girirajan and Eichler, 2010; Kaminsky et al., 2011).

One example of a newly discovered ID syndrome, Koolen-de Vries syndrome (MIM 610443), is caused by recurrent deletions of a 480 kb genomic segment at chromosome position 17q21.31 (Cooper et al., 2011; Koolen et al., 2008; Koolen et al., 2006).

Individuals with 17q21.31 deletions have consistent clinical features that include developmental delay, hypotonia, distinct facial dysmorphisms (long face, pear-shaped nose, bulbous nasal tip), friendly behavior, , heart defects, and kidney/urologic abnormalities (Cooper et al.,

2011; Koolen et al., 2008). Small, atypical deletions of 17q21.31 narrowed down the critical region to a genomic segment containing 3 genes, MAPT, STH and KANSL1 (Cooper et al., 2011). Mutations in KANSL1, found in individuals with similar phenotypes who did not have 17q21.31 deletions, confirmed that the syndrome is a monogenic disorder caused by haploinsufficiency of KANSL1 (Koolen et al., 2012; Koolen et al., 2015; Moreno-Igoa et al., 2015). The prevalence of the 17q21.31 deletion in individuals with unexplained ID is estimated to be 0.64% (95% confidence interval 0.35% to 0.93%) (Koolen et al., 2008). Overall, no recurrent pathogenic CNVs have been found at frequencies greater than 1% in individuals with ID (Carvill and Mefford, 2013; Itsara et al., 2009; Ropers, 2008; Sharp et al., 2006).

Pathogenic CNVs of variable size can also cause distinct phenotypes when they are found within the same genomic region. For example, our lab described a new syndrome (2p15-16.1 microdeletion syndrome) where partially overlapping deletions at 2p15-16.1 caused similar phenotypic consequences in two unrelated individuals with ID (Rajcan-Separovic et al., 2007). Additional cases reported in the literature, all with different breakpoints and variable genomic size, suggest that the presence of a CNV at chromosomal position 2p15-16.1 can lead to shared phenotypic features in multiple affected individuals (Chabchoub et al., 2008; de Leeuw et al., 2008; Fannemel et al., 2014; Felix et al., 2010; Florisson et al., 2013; Hancarova et al., 2013; Hucthagowder et al., 2012; Liang et al., 2009; Peter et al., 2014; Piccione et al., 2012; Prontera et al., 2011). A review of the genomic and clinical findings for 2p15-16.1 microdeletion carriers (n=23) is provided in Chapter 2.

Finally, some recurrent CNVs are considered to be predisposing to disease if they are associated with variable clinical consequences, including a normal phenotype. Predisposing CNVs can be unique to an individual (de novo) or familial (inherited), although some are also found in the general population. Predisposing CNVs are classified as putatively pathogenic if they occur at significantly higher frequencies in individuals with ID or other NDDs compared to the general population (Girirajan et al., 2013; Kaminsky et al., 2011). The cause of clinical variability associated with predisposing CNVs is unknown. However, in ~20% of cases with predisposing CNVs, a second large CNV (>500 kb) was found (Girirajan and Eichler, 2010; Girirajan et al., 2012). Secondary CNVs are more likely to occur in individuals with ID and increase the burden of copy number affected genes, which can increase the severity of observed phenotypes (Girirajan and Eichler, 2010; Girirajan et al., 2012). Predisposing CNVs transmitted from a normal parent may contain imprinted genes that have parent of origin specific expression (see below) or unmask a mutation on the second allele that, when combined with a CNV, renders the gene non-functional (Albers et al., 2012; Kearney et al., 2011). Predisposing CNVs (both deletions and duplications) mapping to chromosomal region 1q21.1 were one area of focus during my PhD research project. Details about the 1q21.1 CNV are provided in Chapter 3.
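The statement that a predisposing CNV occurs at a significantly higher frequency in affected individuals than in the general population implies a case-control comparison of carrier counts. One common way to make such a comparison is a one-sided Fisher exact test; the sketch below implements it with the Python standard library. The choice of test and the carrier counts are assumptions for illustration and do not reproduce the analysis of any cited study.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher exact p-value.

    Probability, under the hypergeometric null, of seeing at least
    `case_carriers` carriers among the cases given the table margins.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    p_value = 0.0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        # k carriers fall among the cases; the remaining cases are non-carriers
        p_value += comb(carriers, k) * comb(total - carriers, case_total - k) / denominator
    return p_value


# Hypothetical counts: 20 carriers in 5,000 cases vs 4 carriers in 5,000 controls
p = fisher_exact_greater(20, 5_000, 4, 5_000)
```

A small p-value would support enrichment in cases, although in practice the result also depends on how well cases and controls are matched for array platform and ancestry.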

1.2.2.3 Mechanisms by Which CNVs Cause Abnormal Phenotypes

1.2.2.3.1 CNVs Affect Gene Expression

Several studies have shown that CNVs cause changes in gene expression not only for genes integral to a CNV and for genes in close proximity to a CNV (“flanking” genes, within 2-7 Mb) but also genome wide (Freeman et al., 2006; Henrichsen et al., 2011; Luo et al., 2012; Stranger et al., 2007). Stranger et al. (Stranger et al., 2007) estimated that CNVs found in the general population alter gene expression for ~18% of genes within or in close proximity (~2 Mb) to their boundaries. In a more recent study, Luo et al. (Luo et al., 2012) found that genes within CNVs are more likely to have abnormal expression, with ~11% of CNVs studied containing at least one gene with significantly altered expression. For the majority of genes with altered expression (>90%), the observed expression change was in the same direction as the copy number (Luo et al., 2012). The same authors noted that a significantly higher number of genes with altered expression were observed in rare de novo CNVs found in individuals with ASD, supporting the idea that de novo CNVs are more likely to be pathogenic. In contrast, inherited CNVs in individuals with ASD contained significantly fewer genes with altered expression (Luo et al., 2012). Finally, deletions contained more genes with altered expression compared to duplications, supporting the idea that it is easier to compensate for increased dosage vs decreased dosage (Lee and Scherer, 2010; Luo et al., 2012).

Studies of a well-known recurrent pathogenic CNV, a deletion of 7q11.23 leading to Williams-Beuren Syndrome (WBS), provide additional evidence that altered gene expression is likely a feature of genomic syndromes (Merla et al., 2006; Reymond et al., 2007). The WBS deletion has been shown to affect the expression levels of the majority of genes within the hemizygous deletion region and of a handful of genes flanking the CNV region, and also to cause global dysregulation of >800 genes (Henrichsen et al., 2011; Merla et al., 2006).

Changes in gene expression seen near the boundary of a CNV or in the rest of the genome may be caused by a variety of factors including CNV position effects and/or disruption of relevant regulatory regions (i.e. long range enhancer and repressor elements) (Reymond et al., 2007; Stranger et al., 2007). In addition, the presence of a haplosensitive transcription factor within the CNV (e.g. c-REL) may also lead to abnormal gene expression of target genes (Grigoriadis et al., 2011). Furthermore, Gheldof et al. (Gheldof et al., 2013) propose that CNVs cause conformational changes that disrupt long-range intrachromosomal gene interactions and change gene accessibility, explaining how the effects of a CNV can extend past its borders.

In Chapters 2 and 3 of my dissertation I describe the use of whole genome expression for determining the effect of two CNVs (2p15-16.1 and 1q21.1) on the expression of the genes they contain, as well as genome-wide.

1.2.2.3.2 CNVs Affect Imprinted Genes

Imprinted genes are epigenetically regulated so that gene expression preferentially occurs from the maternally or paternally inherited chromosome (Weaver et al., 2009). Many imprinted genes have important functions in growth and early development (Weaver et al., 2009). However, the role of imprinted genes has largely been studied in mouse, and the full extent of imprinting in humans is less well characterized. To date, approximately 150 imprinted genes have been identified in mouse while only half that many (~80) have been found in humans (Catalogue of Parent of Origin Effects., 2011; Morison et al., 2005; Williamson C.M. et al., 2013). In the case of familial CNVs, the occurrence of phenotypic abnormalities in the child may depend on which parent transmits the CNV (Sharp et al., 2008). For example, when an imprinted gene that is predominantly expressed from the maternal gene copy is disrupted by a maternal CNV, transmission of the CNV to the child can cause loss of gene function (LOF). This may lead to an abnormal phenotype in the child if the gene is critical to development. Several well-known genomic syndromes are caused by generally large CNVs (~2 Mb deletions) that overlap with imprinted regions (e.g. Prader-Willi [caused by paternal deletion]/Angelman syndrome [caused by maternal deletion] of chromosome 15q11-13) (Knoll et al., 1994; White et al., 1996).

Genomic regions that have allele-specific methylation patterns are called differentially methylated regions (DMRs). Parent of origin dependent DMRs are associated with imprinted genes and are thought to regulate parent of origin gene expression (Calaway et al., 2012; Lawson et al., 2013; Weaver et al., 2009). Determining if a CNV involves a novel imprinted gene or DMR is challenging as the expression of imprinted genes can be tissue specific and/or development stage-dependent (Albrecht et al., 1997; Reik and Walter, 2001), so parent of origin expression analysis in the non-affected tissues (e.g. blood in case of a patient with neurodevelopmental delay) is not always reliable. Nonetheless, most primary DMRs regulating the expression of imprinted genes are typically maintained in multiple tissues even when the gene is not expressed and can be identified through comparisons of methylation patterns found in tissues with unbalanced parental genomic contributions. For example, this approach was used to identify DMRs in digynic (two copies of maternal and one of paternal genome) and diandric (two copies of the paternal and one of maternal genome) triploid placental tissues, or in reciprocal uni-parental disomies, UPDs (Hanna et al., 2015; Nakabayashi et al., 2011; Yuen et al., 2011).

As an alternative to direct methylation and allelic expression analysis of potential imprinted genes in the CNV, it is possible to study chromatin conformation of specific regions on the maternal and paternal chromosome (i.e. whether the region is more or less compact) as a reflection of the chromatin “capacity” to facilitate gene expression. For example, in the FISH Replication Timing (FISH-RT) assay, the chromatin structure is assessed using probes located within a specific chromosomal region (Selig et al., 1992). Regions that are less compact will have two FISH signals for the probe (because DNA can be replicated earlier) while regions that are more compact will have one FISH signal, indicating that the DNA replicates later (Selig et al., 1992). For many imprinted genes, one parental allele replicates earlier and has an open chromatin structure, reflecting expression in comparison to the other parental allele (Kitsberg et al., 1993; Knoll et al., 1994). Therefore, FISH-RT in imprinted regions results in a large number of cells with discrepant FISH signals (i.e. 2:1) for the chromosome where the replication timing, and therefore chromatin condensation, is different for the maternal or paternal chromosome.
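The FISH-RT readout described above reduces to counting, for each scored cell, whether the two homologues show the same number of signals (1:1 or 2:2) or a discrepant pattern (2:1). A minimal Python sketch of that tally follows. The function names and the example scores are hypothetical, and no cut-off for calling a region asynchronous is implied, since none is given in this section.

```python
from collections import Counter


def classify_cell(signals_hom_a, signals_hom_b):
    """Classify one cell from the signal count (1 = single, 2 = double) on each homologue."""
    if {signals_hom_a, signals_hom_b} - {1, 2}:
        raise ValueError("each homologue must show 1 or 2 signals")
    if signals_hom_a == signals_hom_b:
        return "synchronous"    # 1:1 or 2:2 pattern
    return "asynchronous"       # discrepant 2:1 pattern


def asynchronous_fraction(cells):
    """Fraction of scored cells that show the discrepant 2:1 pattern."""
    counts = Counter(classify_cell(a, b) for a, b in cells)
    total = sum(counts.values())
    return counts["asynchronous"] / total if total else 0.0


# Made-up scores for five cells
cells = [(1, 1), (2, 1), (1, 2), (2, 2), (2, 1)]
fraction = asynchronous_fraction(cells)
```

In an actual experiment the fraction for a test probe would be interpreted relative to control probes from known imprinted and non-imprinted regions.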

In my PhD research project I used FISH-RT in selected CNV regions as a proxy for chromatin condensation and parental allele specific gene expression to assess the possibility that the CNV contained novel imprinted genes. My results are described in Chapter 4. In addition, I compared the familial CNVs and de novo CNVs detected in our ID cohort with high confidence differentially methylated regions (DMRs) detected in Dr. Robinson’s laboratory (Hanna et al., 2015), in order to determine whether there is an overlap of our ID CNVs with possible imprinted regions.
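Comparing CNVs with DMRs amounts to an interval overlap test on genomic coordinates. The sketch below shows one simple way to do this in Python. The identifiers and coordinates are invented for illustration, and the sketch assumes that both data sets use the same genome build and closed (inclusive) coordinates; it is not the procedure used in Chapter 4.

```python
def intervals_overlap(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def cnvs_overlapping_dmrs(cnvs, dmrs):
    """Map each CNV identifier to the sorted list of DMR identifiers it overlaps."""
    hits = {}
    for cnv_id, cnv in cnvs.items():
        matched = sorted(dmr_id for dmr_id, dmr in dmrs.items()
                         if intervals_overlap(cnv, dmr))
        if matched:
            hits[cnv_id] = matched
    return hits


# Hypothetical coordinates
cnvs = {
    "cnv_1": ("chr1", 1_000_000, 1_500_000),
    "cnv_2": ("chr2", 200_000, 300_000),
}
dmrs = {
    "dmr_a": ("chr1", 1_450_000, 1_452_000),
    "dmr_b": ("chr3", 10_000, 12_000),
}
hits = cnvs_overlapping_dmrs(cnvs, dmrs)
```

For large data sets an interval tree or a sorted sweep would be more efficient than this pairwise comparison, but the overlap criterion is the same.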

1.2.2.3.3 Use of Animal Models to Study CNVs Dosage Effects

CNVs may occur in genomic regions containing known or novel ID candidate gene(s) that are sensitive to copy number changes (dosage sensitive). For example, a recurrent CNV (microdeletion) on chromosome 8q12 in patients with CHARGE syndrome led to the identification of the dosage sensitive gene, chromodomain helicase DNA-binding protein 7 (CHD7), as the cause of the syndrome (Vissers et al., 2004). Functional studies in knock-out mice confirmed that knockdown of CHD7 or overexpression of an inactive form of CHD7 reproduces the major phenotypes seen in CHARGE syndrome (Bajpai et al., 2010). Similarly, the CNV from

17q21.3 results in reduced expression of the gene, KANSL1, in LBCs from individuals with the

CNV and learning deficits were found in with KANSL1 mutations confirming its causative role in the Koolen-de Vries syndrome (Koolen et al., 2012).

The use of other animal models (e.g. zebrafish, C. elegans) is also gaining popularity to confirm dosage sensitivity of genes integral to CNVs. For example, Golzio et al. (Golzio et al., 2012) used a zebrafish model to show that gene expression changes of KCTD13, a gene from the 16p11.2 CNV, resulted in a mirror phenotype (causing microcephaly when KCTD13 was overexpressed and macrocephaly when it was underexpressed) comparable to head size phenotypes found in individuals with 16p11.2 CNVs. Carvalho et al. (Carvalho et al., 2014) also used a zebrafish model to identify 4 genes that affect neurodevelopment and cause phenotypes comparable to those seen in individuals with the CNV by knocking down all integral genes from the 17p13.1 CNV. Similarly, C. elegans was used to study the effect of knocking down a CNV gene (CNTNAP4) implicated in longevity (Iakoubov et al., 2013). During the course of my PhD research project I used C. elegans to study the phenotypic consequences of dosage changes for one of the genes from the 2p15-16.1 CNV. Results of this study are described in Chapter 2.

1.3 Single Gene Mutations and ID

In addition to genomic imbalances detected by chromosome or microarray-based analysis, mutations in single genes can cause ID; for review see Bamshad et al. (Bamshad et al., 2011). Traditionally, genes responsible for ID were identified using linkage analysis in families with multiple affected members and/or through searching for genes affected by breakpoints of large chromosome abnormalities (e.g. translocations) (Lander and Botstein, 1987; Schinzel, 1988). Currently, advances in next generation sequencing (NGS) have made it possible to identify mutations in known and novel candidate genes genome-wide in a single experiment. The use of NGS has increased the diagnostic yield for ID to ~27% (Wright et al., 2015). In my PhD research project, I used NGS to complement the analysis of CNVs, looking for mutations in the CNV region and in the rest of the genome that could contribute to or alter phenotype(s). My results are described in Chapters 2 and 3. Below, I briefly describe the principle and use of NGS in general and in ID diagnosis.

1.3.1 Next Generation Sequencing (NGS)

Next generation sequencing (NGS) allows rapid high-throughput sequencing of short DNA reads covering millions of bases in a single experiment (Mardis, 2008; Shendure and Ji, 2008). Briefly, in NGS, randomly fragmented DNA is attached to a platform with specific adaptors, is clonally amplified, and then sequenced in massive parallel reactions resulting in multiple reads from each fragment in a single experiment (Mardis, 2008; Valencia, 2013). The reads, typically 90-120 bp in length, are aligned to a reference genome sequence (Metzker, 2010; Shendure and Ji, 2008). NGS can be used to sequence an entire genome (whole genome) or the protein coding part of the genome (whole exome).

The human exome encompasses ~2% of the total genome sequence and is predicted to contain a high percentage (~85%) of mutations that cause deleterious functional consequences (Bamshad et al., 2011; Botstein and Risch, 2003; Foo et al., 2012; Kryukov et al., 2007). The exome is also reported to have a higher mutation rate (~29%) than the rest of the genome (Rauch et al., 2012). Whole exome sequencing (WES) therefore allows for the identification of variants enriched for those most likely to cause phenotypic consequences and can lead to the discovery of unanticipated genes and/or pathways in smaller sample sizes than previously required (Bamshad et al., 2011; Topper et al., 2011). WES is used as a tool to look for deleterious variants in both Chapters 2 and 3 of my dissertation.

The sheer number of variants detected by WES per genome can be overwhelming and obscure the identification of causative variants (Gilissen et al., 2012). Multiple strategies, reviewed in Bamshad et al. (Bamshad et al., 2011), Gilissen et al. (Gilissen et al., 2012), Robinson et al. (Robinson et al., 2011), and Topper et al. (Topper et al., 2011), have been developed to prioritize variants most likely to be responsible for disease phenotypes. In most cases, prioritization consisted of filtering out variants most likely to be false positives based on quality scores (e.g. read depth for an entire region as well as number of reads showing the variant), and removing variants that occur in the general population, are synonymous, or are located outside of known coding regions. Variant prioritization leaves a more manageable amount of data and identifies mutations with a large effect by focusing on unique/rare variants in the coding region of the genome with predicted consequences for a protein (Bamshad et al., 2011; Ng et al., 2010; O'Roak et al., 2011). Additional strategies such as de novo variant identification and family based strategies that make use of inheritance patterns (dominant or recessive) can be applied to further reduce the number of candidate variants in order to facilitate disease-gene identification (de Ligt et al., 2012; Gilissen et al., 2012; Musante and Ropers, 2014).
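The prioritization steps outlined above (quality filtering, removal of variants seen in the general population, removal of synonymous and non-coding variants, and an optional de novo filter) can be sketched as a chain of simple predicates. In the Python sketch below, the `Variant` record, its field names, the gene labels, the consequence categories and all thresholds are hypothetical choices made for illustration; they are not the settings used in Chapters 2 and 3 or in the cited reviews.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Minimal variant record for a proband (hypothetical fields)."""
    gene: str
    depth: int               # total reads covering the position
    alt_reads: int           # reads supporting the variant allele
    population_freq: float   # frequency in a general-population data set
    consequence: str         # e.g. "missense", "synonymous", "intronic"
    in_mother: bool
    in_father: bool


# Illustrative set of consequence labels treated as protein-altering
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def passes_quality(v, min_depth=10, min_alt_reads=3):
    """Drop likely false positives using illustrative read-depth thresholds."""
    return v.depth >= min_depth and v.alt_reads >= min_alt_reads


def is_rare(v, max_freq=0.01):
    """Keep variants below an illustrative population-frequency cut-off."""
    return v.population_freq < max_freq


def is_protein_altering(v):
    """Keep coding variants with a predicted consequence for the protein."""
    return v.consequence in PROTEIN_ALTERING


def is_de_novo(v):
    """A variant absent from both parents is treated as de novo."""
    return not v.in_mother and not v.in_father


def prioritize(variants):
    """Apply the quality, frequency and consequence filters in turn."""
    return [v for v in variants
            if passes_quality(v) and is_rare(v) and is_protein_altering(v)]


variants = [
    Variant("GENE_A", 40, 18, 0.0, "missense", False, False),
    Variant("GENE_B", 5, 2, 0.0, "missense", False, False),      # low depth
    Variant("GENE_C", 40, 20, 0.05, "missense", True, False),    # common
    Variant("GENE_D", 40, 20, 0.0, "synonymous", False, False),  # synonymous
    Variant("GENE_E", 40, 20, 0.001, "nonsense", True, False),   # inherited
]
candidates = prioritize(variants)
de_novo_candidates = [v for v in candidates if is_de_novo(v)]
```

Keeping each filter as a separate function makes it straightforward to report how many variants are removed at each step, which is how such pipelines are usually summarized.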

1.3.2 Clinical Relevance of WES in ID

The use of WES to discover disease genes was first reported by two research groups in 2009 (Choi et al., 2009; Ng et al., 2009). Subsequent exome studies were successful in identifying unknown causes of Mendelian disorders (e.g. Miller Syndrome (Ng et al., 2010)) including those responsible for syndromic forms of ID (e.g. Kabuki syndrome (Ng et al., 2010) and Schinzel-Giedion syndrome (Hoischen et al., 2010)). More recent WES studies have shown that rare de novo mutations, including single nucleotide variants (SNVs) and small insertions and deletions (indels) in known or novel ID candidate genes, play a role in the etiology of ID in ~30-50% of cases (de Ligt et al., 2012; Gilissen et al., 2014; Gilissen et al., 2012; O'Roak et al., 2012; Rauch et al., 2012; Vissers et al., 2010a). Results from these studies indicate that a large number of genes (400 to 1000), many of which are still not identified, may be associated with ID (de Ligt et al., 2012; van Bokhoven, 2011). WES has helped identify mutations within a CNV and in the rest of the genome as causes of phenotypic variability in ID cases with the same CNV (Albers et al., 2012; McDonald-McGinn et al., 2013). I describe several examples in detail in Chapter 3.

1.4 Research Goal, Hypothesis, Objectives, and Overall Significance

The overall goal of my PhD research project was to identify novel genes that cause ID through the characterization of genes from CNVs found in individuals with ID.

I hypothesized that CNVs cause changes in the function of genes or regulatory elements located within the CNV; if these genes or regulatory elements play a role in nervous system development and/or function their defect will cause ID. I also hypothesized that changes in genes outside CNVs (modifiers) may cause ID alone or when combined with a CNV.

My objectives were a) to perform functional genomic analysis of de novo pathogenic CNVs at chromosomal position 2p15-16.1 and familial CNVs at chromosomal position 1q21.1 in order to identify candidate genes that cause ID and b) to look for changes in the rest of the genome (modifiers) that contribute to ID in individuals with the above CNVs. I used a multi-faceted approach that included the study of:

1) Gene expression for genes integral to a CNV and genome wide,

2) Knock down effect of a gene from a CNV in C. elegans,

3) Sequence variants in genes from CNV regions and genome wide using NGS,

4) Bioinformatics analyses of CNV genes, and

5) Imprinting potential of CNVs and their integral genes.

I present my work in 3 chapters:

Chapter 2 describes my analysis of individuals with 2p15-16.1 CNVs,

Chapter 3 describes my analysis of individuals with 1q21.1 CNVs, and

Chapter 4 describes my studies of the imprinting potential of familial CNVs and their associated genes detected in patients with ID.

Overall, this body of work will contribute to a more advanced understanding of how CNVs affect the function of their integral genes and contribute to the identification of ID candidate genes and pathways.

Chapter 2: Investigations of the 2p15-16.1 Microdeletion

2.1 Background

The 2p15-16.1 microdeletion syndrome (MIM 612513) is a recently recognized genomic disorder. It was first described in two phenotypically similar individuals with idiopathic ID studied by CMA (Rajcan-Separovic et al., 2007). The two individuals, one male and one female, shared a large deleted region (~6 Mb) that contained 25 genes (18 coding and 7 non-coding). They had strikingly similar phenotypes affecting cognition (ID, DD, autism, speech and language delays) and a number of physical features (microcephaly, cortical dysplasia, optic nerve hypoplasia, renal anomalies, and camptodactyly) which included distinct facial dysmorphisms (ptosis, long and straight eyelashes, shortened palpebral fissures, a broad/high nasal root and tip, smooth and long philtrum, and large ears). Since the initial report of two cases in 2007 (Rajcan-Separovic et al., 2007), detailed clinical phenotypes for thirteen additional individuals with 2p15-16.1 deletions were published by the end of my data collection (April 2014) (Chabchoub et al., 2008; de Leeuw et al., 2008; Fannemel et al., 2014; Felix et al., 2010; Florisson et al., 2013; Hancarova et al., 2013; Hucthagowder et al., 2012; Liang et al., 2009; Peter et al., 2014; Piccione et al., 2012; Prontera et al., 2011). Within my data collection period, several individuals with 2p15-16.1 microdeletions (>10) were also reported in databases that catalogue CNVs in individuals with ID, congenital anomalies, and/or ASD (DECIPHER, ISCA, UNIQUE). These cases were not included in my analysis because they were either duplicates of published cases or because the clinical findings for these cases were not detailed enough to compare to cases reported in the literature. Below I summarize the clinical and genomic findings for the published cases. The addition of new cases has helped to refine the common syndromic and genomic features and to highlight candidate genes.

2.1.1 Clinical Findings, Published Cases

Delayed neurocognitive development, which includes mild to severe intellectual disability (ID) and developmental delay (DD), is present in all reported individuals (15) with 2p15-16.1 microdeletions. Additionally, delayed language skills were noted in all individuals old enough to be assessed (93%) (Supplementary Table 2.1, part A; Appendix A). The next most frequently reported phenotypic finding was small head size (OCF <10th percentile): 9/15 individuals have congenital microcephaly (OCF <3rd percentile) and 4/15 individuals have head sizes well below average (OCF 5-10th percentile). Only two individuals, one with head size reported to be normal (Chabchoub et al., 2008) and the other without any mention of head size (Peter et al., 2014), are considered to have normal head size. Other phenotypes commonly reported (>50%) in individuals with 2p15-16.1 microdeletions include feeding problems along with craniofacial and digital abnormalities. Commonly reported craniofacial features (>50%) include dysmorphisms of the head (most commonly bitemporal narrowing but also brachycephaly, scaphocephaly, or short forehead), eyes (telecanthus, epicanthal folds, ptosis, hypertelorism, short palpebral fissures), mouth (smooth and long philtrum, everted lower lip, high narrow palate or other palate abnormalities), and nose (broad/high nasal root). Digital abnormalities (camptodactyly and/or metatarsus abductus) were also reported in ~50% of individuals (Supplementary Table 2.1, Part A; Appendix A).

2.1.2 Genomic Findings, Published Cases

Reported microdeletions of chromosome 2p15-16.1 span ~9.8 Mb (55.58 Mb – 65.44 Mb, hg19) (Florisson et al., 2013) and are extremely variable in size, ranging from 203 kb (Peter et al., 2014) to 7.89 Mb (Rajcan-Separovic et al., 2007) with average and median deletion sizes of 3.20 and 3.14 Mb respectively (Supplementary Table 2.2, Part A; Appendix A). All 15 2p15-16.1 microdeletions are de novo in origin and, for the 5 cases where inheritance was studied, occurred on the paternal chromosome (Supplementary Table 2.2, Part A; Appendix A).

Breakpoints reported for 2p15-16.1 microdeletions are highly variable and occur in genomic regions where there is little , although a few deletion breakpoints cluster around segmental duplications and/or LINE-1 elements (Chabchoub et al., 2008; Liang et al., 2009; Liu et al., 2011). In silico analysis by Liu et al. (Liu et al., 2011) showed that LCR sequences were absent from the regions flanking the 2p15-16.1 deletions in the first two reported cases (Rajcan-Separovic et al., 2007), although a large number of repetitive elements (108) were found within the regions surrounding the breakpoints. In one case (Rajcan-Separovic et al. 2007 [2]), a LINE-1 repeat element (2,960 bp in length; 87.9% homology) was found at both ends of the deletion while in the other case (Rajcan-Separovic et al. 2007 [1]) sequence homology at the breakpoints was limited (>200 bp; ≤60%) (Liu et al., 2011). NAHR between the LINE-1 repeat elements may have caused the 2p15-16.1 deletion in one individual, but the absence of large stretches of sequence homology at the breakpoints of the other deletion points to additional CNV-causing mechanisms (Liu et al., 2011). It is possible that the underlying genomic architecture in the region increases susceptibility to DNA breaks and/or replication fork stalling and that microhomology-mediated repair mechanisms (e.g. FoSTeS and MMBIR) are responsible for the variable breakpoints seen for 2p15-16.1 microdeletions (Verdin et al., 2013).

Several attempts have been made to define a critical region for the 2p15-16.1 microdeletion syndrome based on the smallest region of genomic overlap in reported cases. For example, Liang et al. (Liang et al., 2009) proposed a minimal critical region of ~2.5 Mb (59.24 to 61.79 Mb, hg19) for the 2p15-16.1 microdeletion syndrome based on comparing the genomic overlap for the first 5 reported cases (Chabchoub et al., 2008; de Leeuw et al., 2008; Liang et al., 2009; Rajcan-Separovic et al., 2007) (Figure 2.1). More recently, Hucthagowder et al. (Hucthagowder et al., 2012) suggested that the critical region be redefined to a genomic segment ~1.1 Mb (60.53 to 61.60 Mb, hg19) in length based on genomic overlap of their proband with 6 previously reported cases. The addition of newer cases, one completely outside of both proposed critical regions (Prontera et al., 2011) and several that partially overlap the redefined 1.1 Mb critical region without overlapping each other (Piccione et al. 2012 [1], Peter et al. 2014, Hancarova et al. 2013, Fannemel et al. 2014), has complicated efforts to delineate a smallest region of overlap for the 2p15-16.1 microdeletion syndrome and pinpoint candidate genes for the phenotypes seen in 2p15-16.1 microdeletion carriers. Nevertheless, several genes were frequently deleted in reported cases (e.g. USP34 & XPO1). Gene content and frequency in deletions for all reported cases included in this chapter are provided together with information on gene content and frequency for new cases in my results section (Table 2.4, Parts A & B). The function of the most frequently deleted genes in reported and new cases is also presented in my results section, Table 2.5.
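The smallest-region-of-overlap reasoning described above can be sketched as an interval intersection. The snippet below is only an illustration: it uses the two proposed critical regions quoted in the text (in Mb, hg19), while the third interval is a hypothetical deletion lying outside both regions and is not the coordinate set of any published case.

```python
def smallest_region_of_overlap(intervals):
    """Return the (start, end) segment shared by all intervals, or None."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None

# Proposed critical regions quoted in the text (Mb, hg19).
liang_2009 = (59.24, 61.79)
hucthagowder_2012 = (60.53, 61.60)

# Hypothetical deletion outside both regions (illustrative values only).
outside_case = (63.00, 64.00)

shared = smallest_region_of_overlap([liang_2009, hucthagowder_2012])
shared_with_outlier = smallest_region_of_overlap(
    [liang_2009, hucthagowder_2012, outside_case]
)
print(shared, shared_with_outlier)
```

A single deletion that does not overlap the others is enough to leave no common segment, which is why non-overlapping cases complicate the definition of a critical region.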


Figure 2.1 Genomic Overlap of Published 2p15-16.1 Microdeletion Cases. Microdeletions (red) for the 15 published cases included in this study and the two proposed critical regions (black) (Liang et al. 2009 and Hucthagowder et al. 2012) are shown in the UCSC genome browser (hg19). Thick red bars indicate the deletion region while thin red bars indicate the possible extension of the deleted region based on the position of the next balanced qPCR probe (Florisson et al. 2013). Additional tracks shown include RefSeq genes, Duplications >1000 bases (Segmental Dups), and CNVs from the Database of Genomic Variants.


2.2 Chapter Goals

My goals in performing the studies in this chapter were to help further characterize the 2p15-16.1 microdeletion syndrome and to search for candidate genes that help explain phenotypic features of the syndrome. Specifically, I set out to:

a) Refine the 2p15-16.1 syndrome by comparing phenotypes of our newly recruited individuals (N=8) to those of previously published cases (N=15),

b) Narrow down the number of candidate genes from the 2p15-16.1 region that may be involved in causing ID and the common phenotypic features of the syndrome,

c) Explore the functional consequence of a deletion on candidate genes identified using a multifaceted approach (RNA and protein expression and gene knock-down), and

d) Explore additional causes of CNV effect such as unmasked mutations within the 2p15-16.1 region and preferential parental origin.

2.3 Materials and Methods

2.3.1 Clinical and Genomic Data Acquisition

Clinical findings for our 2 published cases (Rajcan-Separovic et al., 2007) and 13 additional published cases were taken from case descriptions and tables provided by the authors (Chabchoub et al., 2008; de Leeuw et al., 2008; Fannemel et al., 2014; Felix et al., 2010; Florisson et al., 2013; Hancarova et al., 2013; Hucthagowder et al., 2012; Liang et al., 2009; Peter et al., 2014; Piccione et al., 2012; Prontera et al., 2011).

Clinical findings for new cases were initially extracted from patient charts by our research associate (Dr. Ying Qiao), double checked by me, and confirmed by a clinical cytogeneticist (Dr. Evica Rajcan-Separovic) and a clinical geneticist (Dr. Suzanne Lewis).

Microdeletion breakpoints for all reported cases were taken from the literature and converted to hg19 when necessary using the UCSC LiftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver?hgsid=383167375_z3JqWACKB5caT2Jl4qyyZWoEa92t). Our first two cases (Rajcan-Separovic et al., 2007) and the case reported by Chabchoub et al. (Chabchoub et al., 2008) were re-run using higher resolution arrays to more accurately delineate breakpoints. Individuals newly recruited to our study (New Cases 1-8) were run on a variety of high resolution arrays (described in section 2.3.3) either in house or by commercial companies. A summary of breakpoints, array platforms used, and additional genomic information for 2p15-16.1 microdeletion carriers is provided in Supplementary Table 2.2, Appendix A.

A custom track was used to display the genomic overlap for all cases (N=23) in the UCSC genome browser (hg19). The RefSeq gene track, segmental duplication track, and the track showing the variants reported in the database of genomic variants were turned on for the same display. Gene content for each case was determined by using the UCSC table browser with the following specifications (clade: mammal, genome: Human, assembly: Feb. 2009 (GRCh37/hg19), group: Genes and Gene Predictions, track: RefSeq Genes, table: refGene, region: position [breakpoints]) to return the Gene IDs in each deletion.

2.3.2 DNA, RNA and Protein Extractions

Whole Blood: DNA from individuals with 2p15-16.1 microdeletions and their family members was extracted from peripheral blood collected in EDTA tubes using the ArchivePure DNA Purification kit (5 PRIME). Concentration and purity of DNA samples were measured using a NanoDrop spectrophotometer (ND-1000, software v.3.8.1) and quality of samples, based on high molecular weight, was confirmed on an agarose gel.

RNA from whole blood was extracted from samples collected in Tempus Blood RNA Tubes (Applied Biosystems, Cat. No. 4342792) which were stored at -20°C prior to RNA extraction. Tempus Blood RNA Tubes were allowed to thaw on ice and RNA was isolated using the Tempus Spin RNA Isolation Reagent Kit (Applied Biosystems, Cat. No. 4378926) according to the manufacturer’s protocol with the optional DNase treatment (Applied Biosystems, Cat. No. 4305545) included at step 9.

For all RNA samples, a NanoDrop spectrophotometer (ND-1000, software v.3.8.1) was used to determine an estimate of RNA concentration and purity (260/280 ≥ 2.0, 260/230 ≥ 1.75). RNA samples were diluted so that they were between 200-250 ng/µL and were then aliquoted to maintain RNA integrity. A 4 µL RNA sample was sent to the core facility and analyzed using the Eukaryote Total RNA Nano assay on a Bioanalyzer 2100 (Agilent). RNA samples with acceptable RNA integrity numbers (RIN) were used for whole genome expression arrays and/or real-time quantitative PCR (qPCR) experiments.

Transformed lymphoblasts: In order to transform and maintain cell cultures to obtain RNA and protein the following steps were performed: 2-3 mL of whole blood collected in sodium heparin tubes was separated based on Ficoll/Hypaque and white cells washed twice with PBS. White cells were then cultured at 37°C under 5% carbon dioxide (CO2) in a T25 flask using the following medium: 5 mL RPMI-1640, 15% Fetal calf serum, 10% conditioning medium from an Epstein-Barr virus (EBV) producing cell line, and Cyclosporin A (final concentration of 1 µg/mL). Every 5-7 days, half the medium was removed and replaced with fresh medium containing Cyclosporin A (final concentration of 1 µg/mL). Transformation was assumed to occur in approximately 4 weeks and Cyclosporin A was omitted after this point. Newly transformed lymphoblastoid cell lines (LBCs) were left in medium: 5 mL RPMI-1640, 10-15% Fetal bovine serum (FBS), and 1% Penicillin Streptomycin (Pen Strep) for roughly 1 week after omission of Cyclosporin A after which 2-3 mL of fresh media was added when necessary. Once cells began to grow more quickly, half the media was removed and replaced with fresh media. LBCs were then cultured in two T25 flasks until enough cells were available for further study and to freeze down.

LBCs were grown for further studies under recommended conditions. Specifically, LBCs were cultured at 37°C under 5% carbon dioxide (CO2) in upright T25 flasks with loosened lids using 10-12 mL of medium: RPMI-1640 + L-glut (Invitrogen Cat. No. 11875-093), 10-15% Fetal Bovine Serum, FBS (Gibco Cat. No. 12483), and 1% Penicillin Streptomycin (Gibco Cat. No. 15140). LBCs were split and fed with fresh media every 3-5 days depending on media pH indicator (cultures turned from pink to yellow). Prior to harvest when collecting RNA or protein, LBCs were split into multiple T25 flasks and allowed to grow to log phase.

LBCs in log growth phase were collected by centrifugation (1.8 x 1000 rpm for 5 minutes) in a 15 mL tube and the media removed to 0.2-0.5 mL after which the cell pellet was re-suspended in the remaining media. Freezing media (RPMI-1640 containing 1% DMSO & 20% FBS) was then added to bring the final volume to ~2 mL and the final volume was split between 2 cryo vials. The cryo vials were then placed in a freezing tray filled with isopropanol and placed at -80°C overnight after which they were moved to liquid nitrogen for long term storage.

RNA from LBCs was extracted using the RNeasy Plus Mini Kit (Qiagen, Cat. No. 74134) according to the manufacturer’s protocol. Briefly, LBCs from a single T25 flask were collected by centrifugation (1.8 x 1000 rpm for 5 minutes) in a 15 mL tube and the supernatant discarded. During the centrifugation step, 12 µL of β-ME was added per 1.2 mL RLT Plus buffer (RLT+) and was mixed thoroughly; 1.2 mL of RLT+ was then added to the cell pellet which was re-suspended by vortexing. The lysate was then transferred to an RNase-free 1.5 mL microfuge tube and passed 8-10 times through a 23-gauge needle attached to an RNase-free syringe. The homogenized cell lysate was then split into aliquots (2 x 600 µL) and was stored at -80°C prior to RNA extraction. Frozen cell lysates were thawed on ice prior to completing steps 4-12 of the protocol for the purification of total RNA from Animal Cells (RNeasy Plus Mini Handbook p. 21-23). In addition, the optional on-column DNase digestion was performed using the RNase-free DNase set (Qiagen, Cat. No. 79254) during step 7.

Protein Extraction and quality assessment: LBCs were lysed in RIPA Buffer (Thermo Scientific, Cat. No. 89900) that contained 10 µL/mL Halt Protease Inhibitor Cocktail (Thermo Scientific, Cat. No. 87785) according to manufacturer’s instructions. Specifically, cells in log phase from a single T25 flask were collected by centrifugation (1.8 x 1000 rpm for 5 minutes) in a 15 mL tube and were washed twice with cold sterile phosphate buffered saline (PBS). RIPA containing protease inhibitor was added to the cell pellet and was pipetted several times until the pellet was re-suspended. This mixture was transferred to a 1.5 mL microfuge tube and incubated on ice for 15 minutes on a rocking platform. After incubation, cell debris was removed by centrifugation (13,000 rpm for 15 minutes) and the remaining supernatant was split into aliquots (3 x 300 µL). All steps were done on ice or in a cold room. Homogenized cell lysates containing cytoplasmic, membrane and nuclear proteins were stored at -80°C prior to use.

Protein concentration was determined using the Bio-Rad™ DC Protein Assay (Bio-Rad, Cat. No. 500-0116) which is similar to the Lowry assay (Bradford, 1976). The standard microplate assay protocol from the manufacturer was followed without any modifications and a BSA standard curve was run by diluting BSA [1 mg/mL] in RIPA buffer. Absorbances were read using the EnSpire 2300 Multilabel Reader (Perkin Elmer, EnSpire Manager software v1.00 Rev2).


2.3.3 Chromosomal Microarrays (CMA)

2p15-16.1 deletions were identified by multiple oligo based array-CGH platforms (Affymetrix CytoScan 750K Array (hg19), Affymetrix Genome-Wide Human SNP Array 6.0, Affymetrix Cytogenetics Whole-Genome 2.7M Array, and Signature Genomics SignatureChipOS™) either in house, as described by Qiao et al. (Qiao et al., 2012), in clinical facilities, or by commercial companies. I performed the CMA testing for the first two published cases (Rajcan-Separovic et al., 2007) using the Agilent 105A array. Subsequently, these cases and all new cases recruited to our study were tested using the Affymetrix platform by the Royal Columbian Hospital Cytogenetics laboratory and the data was sent to us for analysis. Data for Affymetrix arrays was collected using either GeneChip® Scanner 3000 7G or GeneChip® Scanner 3000 Dx and CEL files were analyzed using Affymetrix Chromosome Analysis Suite software (ChAS v.1.1). Data was filtered using in house parameters requiring greater than 85% confidence level and greater than 50 kb in size for both duplications and deletions, except for the 2p15-16.1 deletion region where CNVs smaller than 50 kb were also assessed.
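The filtering rule described above can be expressed as a short predicate. This is a minimal sketch rather than the ChAS configuration itself: the `CnvCall` record, the 0-1 confidence scale, and the use of the overall 2p15-16.1 span from section 2.1.2 as the exempt region are all assumptions made for illustration, as is the choice to keep the confidence threshold for small calls inside that region.

```python
from dataclasses import dataclass

# Assumed boundaries of the exempt region: the overall span quoted in
# section 2.1.2 (55.58-65.44 Mb, hg19). The in-house definition may differ.
REGION_2P = ("chr2", 55_580_000, 65_440_000)

@dataclass
class CnvCall:
    chrom: str
    start: int         # bp
    end: int           # bp
    confidence: float  # 0-1 scale (assumed representation)

def overlaps_region(call, region=REGION_2P):
    """True if the call overlaps the region by at least one base."""
    chrom, start, end = region
    return call.chrom == chrom and call.start < end and call.end > start

def passes_filter(call, min_confidence=0.85, min_size=50_000):
    """Keep confident calls larger than min_size, or any confident call in 2p15-16.1."""
    if call.confidence <= min_confidence:
        return False
    size = call.end - call.start
    return size > min_size or overlaps_region(call)

# Hypothetical calls, for illustration only.
calls = [
    CnvCall("chr2", 60_850_000, 60_872_500, 0.90),  # small, inside 2p15-16.1
    CnvCall("chr5", 1_000_000, 1_020_000, 0.90),    # small, elsewhere
    CnvCall("chr5", 1_000_000, 1_100_000, 0.90),    # large, elsewhere
    CnvCall("chr2", 60_000_000, 61_000_000, 0.80),  # low confidence
]
kept = [passes_filter(c) for c in calls]
print(kept)
```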

2.3.4 Confirmation of Small CNVs by QMPSF

Quantitative multiplex PCR of short fluorescent fragments (QMPSF) was performed to confirm array findings for the two small deletions (<50 kb) detected in new Case No. 2 using previously described protocols (Casilli et al., 2002; Charbonnier et al., 2000). Briefly, short fragments were PCR-amplified from genomic DNA using primers labeled with a fluorescent dye (Table 2.1) and fragment sizes analyzed by capillary gel electrophoresis.


Primer Name | Primer sequence | PCR product position
2p_enhancer Hs1142_F | ACATGGCCAGACCTGAAAAC | chr2:60855485-60855688
2p_enhancer Hs1142_R | CAGAAAGGCTGAACCCTGAG |
22.5kb_intergenic_F | TGCATGAGGACATTGGTGAT | chr2:60850362-60850585
22.5kb_intergenic_R | CTCAAGGGAAGGAGCTGTTG |
2p16_F | GAAGGTTCCCACGTTTTGAA | chr2:60835872-60836064
2p16_R | TTATTTGCCCCCAGTGAGAG |
BCL11A_Intron_F | TTCTAGTGCTTTGGGCGAGT | chr2:60710726-60710922
BCL11A_Intron_R | GGAATGCTGCAGTTGTCAGA |

Table 2.1 Primer Sequences for QMPSF Confirmation of Small CNVs in New Case No. 2. The primer sequences, forward (F) and reverse (R), are listed along with the position of the PCR product for QMPSF confirmation in New Case No. 2.

2.3.5 Whole Genome Expression Analysis

RNA from whole blood (Tempus) was used to study gene expression in 5 subjects with overlapping 2p microdeletions (Rajcan-Separovic et al. 2007 [1] and Rajcan-Separovic et al. 2007 [2], Case No. 3, Case No. 7, and Case No. 8) and for 3 normal controls (Male Ref #1, Male Ref #2, Female Ref #1).

Transcript levels were assayed using a commercial whole genome expression array, HumanRef-8 v3.0 Expression BeadChip, using standard protocols (Illumina). Briefly, 2 µL of total RNA was quantified using Quant-iT™ RiboGreen® RNA reagent (Invitrogen) prior to RNA amplification. Five microliters of total RNA (50-500 ng) was then used in the first- and second-strand reverse transcription step followed by a single in vitro transcription (IVT) amplification. Array hybridization, washing, blocking, and streptavidin-Cy3 staining were also done according to standard protocols (Illumina). The BeadChip was then scanned using an Illumina BeadArray Reader to quantitatively detect fluorescence emission by Cy3. Eight arrays were run in parallel on a single BeadChip. Each array contained ~24,500 well-annotated transcripts (NCBI RefSeq database Build 36.2, Release 22), present multiple times on a single array.

Background-corrected intensity values were generated for each probe using GenomeStudio software (Illumina). Subsequent analyses were carried out in R (http://www.R-project.org/) (R Core Team, 2015). The data were quantile normalized and differential expression with respect to copy number was analyzed using a Student’s t-test. Next, to manually compare the expression of genes from the 2p deletion region between deleted and control samples, an expression ratio was generated by dividing the quantile normalized intensity value (log2) for each sample by the averaged quantile normalized intensity values of the 3 controls (log2).
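The normalization and ratio steps described above can be sketched in a few lines. The analysis itself was carried out in R; the Python below is only an illustrative re-expression of the same arithmetic, with hypothetical function names and made-up intensity values, and a simple rank-based quantile normalization that ignores special handling of ties.

```python
from statistics import mean

def quantile_normalize(samples):
    """Quantile normalize a list of equal-length intensity lists (one per array)."""
    n = len(samples[0])
    sorted_cols = [sorted(col) for col in samples]
    # Mean intensity at each rank across all arrays.
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]
    normalized = []
    for col in samples:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def expression_ratio(sample_log2, control_log2_values):
    """Sample log2 intensity divided by the mean log2 intensity of the controls."""
    return sample_log2 / mean(control_log2_values)

# Made-up values for illustration only.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
ratio = expression_ratio(9.0, [10.0, 10.5, 9.5])
print(normalized, ratio)
```

Note that, as described in the text, the ratio is taken between log2 values, so a ratio below 1 indicates lower expression in the deleted sample without being a linear fold change.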

2.3.6 Candidate Gene Selection and Functional Analysis

Genes were selected for follow-up based on the following criteria: 1) the gene(s) were included in the majority of 2p deletions (>65%), 2) the gene(s) showed changes in expression when deleted (whole genome expression data), and/or 3) the gene(s) are predicted/known to be haploinsufficient (HI) (Huang et al., 2010).

Haploinsufficiency is defined as an inability of a gene to function properly when only one copy is present (Huang et al., 2010). HI genes are often associated with developmental disease and can be predicted based on gene length and on the amount of conserved coding sequences and promoters (Huang et al., 2010). Haploinsufficiency scores predict the likelihood that a gene will be HI; genes likely to be haplosufficient have higher scores (50-100%) while genes likely to be HI have lower scores (<10%) (Huang et al., 2010).


2.3.6.1 Quantitative Real Time PCR (qPCR)

Quantitative real time PCR (qPCR) was used to confirm the microarray detected expression of selected genes in individuals with 2p15-16.1 deletions. RNA from each available sample (LBCs and/or whole blood) was reverse-transcribed using the EasyScript™ cDNA Synthesis Kit (Applied Biological Materials Inc., Cat. No. G234) with provided Oligo (dT) according to the manufacturer’s protocol. RNA amounts (1 µg - 1.5 µg) were kept consistent for each independent experiment and completed cDNA reaction volume diluted to 25 ng/µL using TE. For each subsequent qPCR run, cDNA was diluted to 0.67 ng/mL using molecular grade H2O and ~2.68 ng of cDNA was run per reaction. Quantitative PCR (qPCR) runs were performed on an ABI7000 qPCR using the EvaGreen qPCR Mastermix containing ROX (Applied Biological Materials Inc., Cat No. Master Mix-R) to determine the mRNA expression levels of candidate genes.

Prime Time qPCR primers (IDT) overlapping exon-exon boundaries were used for candidate genes, XPO1 and USP34, and for two endogenous control genes, ATF6 and B2M. Two sets of primers were ordered for each candidate gene and one set for each endogenous control. Prime Time IDs and primer sequences are listed in Table 2.2. Prime Time Primers were re-suspended in IDTE buffer (IDTE; 1X TE solution pH 8.0) as recommended to yield a 20X stock. Primers for the endogenous control gene, B2M, were ordered separately and came normalized to 100 μM in IDTE (pH 8.0). Re-suspended or already suspended primer stocks were divided into aliquots to reduce freeze thaw cycles. B2M primers were diluted to a final working solution (1 µM). Primer test runs were done to ensure that efficiencies between reactions were approximately equal (within 5%) and melt curves and agarose gel runs were analyzed for each reaction to confirm the presence of a specific product. After primer validation was complete, each reaction was run on a 96 well plate in triplicate with three separate runs done per gene per sample with the endogenous control gene included in each run.

Relative quantification of mRNA levels was done using the ΔΔCt method (Livak and Schmittgen, 2001). Briefly, the Cts for the gene of interest were adjusted in relation to the endogenous control gene Ct for all samples (test and normals). The resultant ΔΔCt value was then expressed as a fold difference in expression between all samples and the average of the 2-5 normals in each run (2^-ΔΔCt). P-values were calculated using one-way analysis of variance (ANOVA) for independent samples (http://vassarstats.net/) and boxplots were generated in R to visualize the data.
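The ΔΔCt calculation described above can be illustrated with a short sketch. The function name, sample labels, and Ct values below are hypothetical, and the calibrator is assumed to be the mean ΔCt of the normal samples in a run.

```python
from statistics import mean

def fold_changes(target_ct, reference_ct, control_ids):
    """Relative expression (2^-ΔΔCt) of every sample versus the mean of the controls.

    target_ct / reference_ct map sample ID -> Ct for the gene of interest
    and for the endogenous control gene, respectively.
    """
    # ΔCt: gene of interest adjusted to the endogenous control.
    delta_ct = {s: target_ct[s] - reference_ct[s] for s in target_ct}
    # Calibrator: mean ΔCt of the normal samples in this run (assumption).
    calibrator = mean(delta_ct[s] for s in control_ids)
    # ΔΔCt = ΔCt(sample) - calibrator; fold difference = 2^-ΔΔCt.
    return {s: 2 ** -(d - calibrator) for s, d in delta_ct.items()}

# Made-up Ct values: one deletion carrier and two normal controls.
target = {"normal_1": 24.0, "normal_2": 24.0, "deletion_1": 25.0}
reference = {"normal_1": 18.0, "normal_2": 18.0, "deletion_1": 18.0}
result = fold_changes(target, reference, ["normal_1", "normal_2"])
print(result)
```

In this toy example the deletion carrier reaches threshold one cycle later than the controls, which corresponds to half the expression, the pattern expected for a heterozygously deleted gene without dosage compensation.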


Gene, exons | PrimeTime assay ID | Forward primer (F) | Reverse primer (R)
XPO1, exons 2-3 | Hs.PT.51.15292830 | 5’-GTG TCA GTA CTT CTT GAG CCA-3’ | 5’-GAC CAT GCA GCT CGT CAG-3’
XPO1, exons 21-22 | Hs.PT.56a.39723281 | 5’-CAA CAG AAA AGA TAT GCT GGA GAA-3’ | 5’-GGC TTT CAA ACA TAC TAT GAG GAA T-3’
USP34, exons 1-3 | Hs.PT.51.2568294 | 5’-GAC ATT GCC TCT GTG TCC A-3’ | 5’-CGA ACG ATG TGC GAG AAC T-3’
USP34, exons 5-7 | Hs.PT.51.2086875 | 5’-TGT CGT AAC TCC TGA TCC GA-3’ | 5’-AGC ACA TGC GTT TAT TAC AGT TG-3’
ATF6, exons 11-12 | Hs.PT.51.19609651 | 5’-CTT GGT CCT TTC TAC TTC ATG TCT-3’ | 5’-TTG CTT TAC ATT CCT CCA CCT-3’
B2M, exons 2-4 | Hs.PT.58v.18759587 | 5’-ACT GAA TTC ACC CCC ACT GA-3’ | 5’-CCT CCA TGA TGC TGC TTA CA-3’

Table 2.2 Primers Used for Gene Expression Confirmation. PrimeTime® qPCR primer assay IDs along with forward (F) and reverse (R) primer sequences are listed for each gene.

2.3.6.2 Western Blotting

Western blotting was used to determine protein expression for candidates from the 2p15-16.1 region. The following solutions were prepared and used to run Western blots according to standard protocols: 10X running buffer (250 mM Tris base, 1.25 M Glycine, 1% SDS), 10X Transfer buffer (250 mM Tris base, 1.9 M Glycine), 1X Transfer buffer (10X Transfer buffer, Methanol), 10X TBS (200 mM Tris pH 7.6, 1.37 M NaCl), TBST (10X TBS, Tween 20), Blocking solution (TBST + 5% milk). A 4% acrylamide stacking gel on top of a 9-12% Acrylamide running gel (1M Tris-HCl, pH 8.8, 30% acrylamide, 10% SDS, 10% APS, TEMED) was made a day prior to use and was stored at 4°C overnight.

Protein samples were removed from -80°C and thawed on ice. Each sample was divided into aliquots (30, 45, or 50 µg depending on sample concentrations), mixed with RIPA buffer to equalize volumes and an appropriate volume of loading dye, and boiled for 5 minutes. Samples were then cooled on ice for 5-10 minutes, spun briefly to collect at the bottom of the tube, and used immediately or stored at -20°C until used.

Prepared protein samples were loaded into the stacking gel and allowed to run at low voltage (60-70 V) until they reached the running gel. Voltage was then increased to 100 V and samples run for 1-3 hours depending on the size of the protein. Protein was transferred to a membrane at 100 V in transfer buffer. Voltage was maintained using ice packs changed part way through the transfer.

Protein from whole cell-lysates (30 mg) was run at 60-70 V for 45-60 minutes through the stacking gel and then at 100 V on a 9% Acrylamide gel for 1-1.25 hours. Transfer of protein from the gel to the membrane was done for 1-1.5 hours and the membrane blocked at 4°C overnight (TBST + 5% milk) and washed in TBST (3 x 15 minutes). Membranes were incubated with primary and secondary antibodies for 1-1.5 hours at room temperature. For detection of CRM1/XPO1 or β-actin (ACTB; used to standardize the amounts of protein loaded into each well), membranes were first incubated with a primary rabbit polyclonal antibody against human CRM1/XPO1 (Novus Biologicals, NB100-56493) diluted 1:1000 in blocking buffer (TBST + 5% BSA) or a primary rabbit monoclonal antibody against human β-actin (ACTB) (Novus Biologicals, NBP1-33778) diluted 1:3000 in blocking buffer (TBST + 5% milk) followed by incubation with a secondary goat polyclonal antibody against rabbit IgG (Novus Biologicals, NB730-H) diluted 1:2000 in blocking buffer (TBST + 5% milk). Membranes were washed in TBST (3 x 15 minutes) after each incubation step. After the final incubation, membranes were developed using an Amersham ECL Kit (GE Healthcare Life Sciences, Product No. RPN2232) and the resultant films analyzed by UV densitometry (GeneSnap and Gene Tools software). The absorbance values for CRM1/XPO1 were normalized to the corresponding β-actin absorbance values and average values for XPO1 from three independent replicates for subjects and controls were used to generate p-values (two-tailed Student’s t-test).

Western blotting was performed in a collaborator’s laboratory (Dr Marc O’Driscoll, University of Sussex) using antibodies for USP34 and ATR as a loading control. Western blotting using the c-REL antibody was performed by Dr. Jiadi Wen.

2.3.6.3 Immunohistochemistry

Immunohistochemistry using anti-XPO1 and anti-USP34 antibodies was performed on control tissues from adult mouse and human fetal and mature brain to determine the location of XPO1 and USP34 protein expression in normal tissues. Images were captured and analyzed for staining patterns. Immunohistochemistry and image capture were performed on a service basis by the histochemistry lab (Department of Pathology and Laboratory Medicine, UBC). Image analysis was done with the help of neuropathologist Dr. Chris Dunham.

2.3.6.4 Study of Candidate Genes in C. elegans

The study of candidate gene(s) in animal models requires the presence of orthologous gene(s). In C. elegans, no orthologues for USP34 or REL exist. However, for human XPO1 there is an orthologous gene (99.3% shared homology) called xpo-1 in C. elegans, located on chromosome V: 7805045-7814627 (transcript ZK742.1a.1 (1-9583); unspliced + UTR, 9583 bp) (Figure 2.2).


Figure 2.2 xpo-1 Orthologue in C. elegans. C. elegans xpo-1 is located on chromosome V:7805064..7814572 (http://www.wormbase.org/species/c_elegans/gene/WBGene00002078?query=xpo-1#06-9e-3).

In order to study the expression of xpo-1 in the developing worm, I collaborated with Dr. Harald Hutter and Jessie Jie Pan to create transgenic C. elegans strains. In brief, green fluorescent protein (GFP) was engineered into a fosmid (WRM0636dH12) containing the genomic region of XPO-1 (Tursun et al., 2009). The construct was then sequenced to confirm that both GFP and XPO-1 were included. GFP was inserted at the C-terminus of XPO-1 (before xpo-1 start sequence ATG) which means that GFP expression should reflect the localization of endogenous XPO-1 in C. elegans. After sequencing to confirm the insertion of GFP in the fosmid, transgenic xpo-1::GFP expressing C. elegans strains (VH2119 and VH2120) were created by microinjection (Jessie Jie Pan) of the fosmid into the gonads of adult worms. Primers used to test the insertion, designed to complement both sides of the GFP insertion, are listed below:

XPO-1_F_worm 5’-CTCCAGTATGCGCAATCATC-3’

XPO-1_R_worm 5’-TGGGGAAGAAAATGGAATCA-3’


Microinjection of the fosmid creates a semi-stable transgenic animal that is mosaic for the transgene, with some cells losing xpo-1 expression at random. Multiple transgenic worms were therefore studied to determine the overall endogenous expression pattern of xpo-1 in C. elegans.

In addition to the above study, I studied the effect of xpo-1 knockdown in C. elegans using an RNAi probe aligned to xpo-1 to determine the effect of the loss of xpo-1 in C. elegans (Maine, 2008). Worm strains (NW1229, VH616, VH648) designed to have neuronal expression of GFP, provided by Harald Hutter, were used so that abnormalities in neural development for surviving animals could be studied post RNAi feeding (Schmitz et al., 2007). Each strain was studied 3-4 days after transplant of L3-L4 worms (n=5) to an agar plate (NGM plates containing 1 mM IPTG and 25 µg/mL carbenicillin) with bacteria expressing xpo-1 RNAi (200 µL of overnight LB culture with 50 µg/mL Ampicillin). Empty vector plates were used as controls. Dilutions of RNAi with empty vector were used to mimic a 25%, 50%, 75% and 100% knockdown of xpo-1.

2.3.7 Whole Exome Sequencing

DNA extracted from the whole blood of a single individual with a 2p15-16.1 deletion, Rajcan-Separovic et al. 2007 (1), was sent to Otogenetics for whole exome sequencing. Agilent’s SureSelect exome capture kit (50 Mb) was used to generate libraries sequenced on a HiSeq 2000 (guaranteed average 30x coverage). Reads were aligned to genome assembly hg19 and a VCF file containing variant calls was generated using an in house pipeline. The VCF file was then imported into Golden Helix SNP & Variation Suite 7.7.8 for further analysis. Briefly, variant calls were filtered using quality scores of the sequence data as follows: read depth greater than or equal to 10, genotype quality scores greater than or equal to 10 and alternate allele frequency greater than or equal to 25%. Variants were then restricted to the chromosomal region overlapping the 2p deletion (extracted into a shortlist) after which they were filtered to eliminate variants present in publicly available databases, including the 1000 Genomes Project (1kG 2012-04-26) and the NHLBI GO Exome Sequencing Project (NHLBI ESP6500 v2), at frequencies ≥ 1% of the population. Pathogenicity and conservation scores were imported from publicly available databases and were used to evaluate the final list of variants.
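The filtering cascade described above can be summarized as a single predicate. This is a minimal sketch under stated assumptions, not the Golden Helix workflow: the `Variant` record and its field names are hypothetical, the "alternate allele frequency" is interpreted as the fraction of reads supporting the alternate allele, and the deletion coordinates in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    chrom: str
    pos: int
    read_depth: int
    genotype_quality: float
    alt_allele_fraction: float          # fraction of reads with the alternate allele (assumed meaning)
    max_population_af: Optional[float]  # highest frequency in 1kG/ESP, None if absent

def keep_variant(v, deletion, max_population_af=0.01):
    """Apply the quality, region, and population-frequency filters in turn."""
    chrom, start, end = deletion
    passes_quality = (
        v.read_depth >= 10
        and v.genotype_quality >= 10
        and v.alt_allele_fraction >= 0.25
    )
    in_deletion = v.chrom == chrom and start <= v.pos <= end
    is_rare = v.max_population_af is None or v.max_population_af < max_population_af
    return passes_quality and in_deletion and is_rare

# Placeholder deletion interval and made-up variants, for illustration only.
deletion = ("chr2", 59_000_000, 62_000_000)
variants = [
    Variant("chr2", 61_000_000, 30, 50, 0.50, None),   # kept: novel, good quality
    Variant("chr2", 61_000_100, 8, 50, 0.50, None),    # dropped: low read depth
    Variant("chr2", 61_000_200, 30, 50, 0.50, 0.05),   # dropped: common (>= 1%)
    Variant("chr3", 61_000_000, 30, 50, 0.50, None),   # dropped: outside region
    Variant("chr2", 60_500_000, 30, 50, 0.50, 0.005),  # kept: rare (< 1%)
]
kept = [keep_variant(v, deletion) for v in variants]
print(kept)
```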

2.3.8 CNV Parent of Origin Analysis Using Microsatellites

A microsatellite (24xTA, genomic size 48) overlapping 2p15 (chr2:61,755,869-61,755,916, hg19) was selected using the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway). Primers for the forward (F) and reverse (R) reactions were designed using Primer3plus software (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) (Untergasser et al., 2007) and were confirmed using the Primer Design and Search Tool (http://bisearch.enzim.hu/?m=genompsearch) to produce a single product. The sequences for the forward (F) and reverse (R) primers are as follows:

D2S_XPO1_24TA (F) 5’-AGCCAAGATTCCCCAAGAAT-3’

D2S_XPO1_24TA (R) 5’-AAGCCATTGCTTTTTGTCAAT-3’.

DNA for probands and parents was diluted to approximately 50 ng/µL and 100 ng was used per 15 µL PCR reaction. PCR was performed with Hotstart Taq (Qiagen, Cat. No. 203203) according to the manufacturer’s instructions and run on a Bio-Rad T100 Thermal Cycler with the following cycle conditions: DNA was denatured at 95°C for 5 minutes, followed by 25 cycles of denaturation, annealing and extension (94°C for 1 minute, 55°C for 1 minute, 72°C for 1.5 minutes), a final elongation step of 72°C for 7 minutes, and a cooling step (4°C hold).

PCR products were stored at -20°C until used and were diluted when necessary (i.e. if the reaction was overloaded when run on the Genetic Analyzer). PCR products were separated by capillary electrophoresis on an ABI PRISM® 310 Genetic Analyzer, and peak size and area were determined with GeneScan® Analysis Software v.3.1.2 (ABI).

2.3.9 Bioinformatic Data Acquisition and Analysis

2.3.9.1 Repeating Elements by RepeatMasker (RMSK elements)

Repeat elements flanking each deletion were identified using RepeatMasker. To do this, a user-generated BED file containing the 23 2p15-16.1 deletion regions (Database/Build: Human Feb. 2009, GRCh37/hg19) was uploaded into Galaxy v 1.0.0 (https://usegalaxy.org/). Regions flanking the deletions (500 bp upstream and downstream) were generated in Galaxy by selecting “operate on genomic intervals” and “get flanks” from the pull-down menu (Blankenberg et al., 2010; Giardine et al., 2005; Goecks et al., 2010). The following input parameters were entered: Select data: 5: 2p15-16.1 23 cases; Region: Whole feature; Location of the flanking region/s: Both; Offset: 0; Length of the flanking region(s): 500.

The 46 flanking regions were then joined with the Repeating Elements by RepeatMasker dataset taken from UCSC (UCSC Main on Human: rmsk (genome), which contained ~5,400,000 regions in interval format, hg19) using the join function (INNER JOIN) in Galaxy with a minimum overlap of 1 bp.
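The two interval operations above (taking 500 bp flanks on both sides of each deletion, then keeping flank/repeat pairs that overlap by at least 1 bp) can be sketched as follows. This is a minimal stand-in written for illustration, not the Galaxy implementation; the function names are hypothetical and intervals are assumed to be half-open, BED-style tuples.

```python
def get_flanks(chrom, start, end, length=500, offset=0):
    """Return the upstream and downstream flanks of one interval (BED-style, half-open)."""
    upstream = (chrom, max(0, start - offset - length), max(0, start - offset))
    downstream = (chrom, end + offset, end + offset + length)
    return upstream, downstream


def overlap_bp(a, b):
    """Number of overlapping bases between two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def inner_join(flanks, repeats, min_overlap=1):
    """Keep every (flank, repeat) pair that overlaps by at least min_overlap bases.

    repeats are (chrom, start, end, name) tuples, e.g. rows of the rmsk table.
    """
    return [
        (flank, repeat)
        for flank in flanks
        for repeat in repeats
        if overlap_bp(flank, repeat[:3]) >= min_overlap
    ]
```

With 23 deletions, `get_flanks` yields the 46 flanking regions mentioned above. A simple nested loop is adequate for a toy input, but joining against the full rmsk table would call for a sorted or indexed interval structure.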

2.3.9.2 ENCODE Regulatory Elements

ENCODE regulatory elements, identified by correlating DNaseI hypersensitive sites with expression data for 112 human samples (Sheffield et al., 2013), were downloaded from the ENCODE regulatory elements database (http://dnase.genome.duke.edu/index.php). Two region searches were performed to find the predicted regulatory elements: one using the entire 2p15-16.1 microdeletion region (chr2:55580038-65440018) and the other using the genomic segment containing the most commonly deleted genes (chr2:60678302-61765418).


2.3.9.3 VISTA Enhancer Elements

A large number of human and mouse non-coding genomic sequences have proposed gene enhancer activity based on their conservation in multiple vertebrate species or on chromatin immunoprecipitation coupled to massively parallel sequencing (ChIP-Seq), an experiment that allows for the identification of DNA sequences bound to protein complexes (Visel et al., 2009). The VISTA Enhancer Browser (http://enhancer.lbl.gov/), maintained by the Lawrence Berkeley National Laboratory, currently contains experimental data for 2192 elements tested in vivo (as of 1/20/2015) (Visel et al., 2007). Of these, 1154 have gene enhancer activity (positive enhancers) validated in transgenic mice. To be considered positive, a developmental enhancer has to show reproducible expression of the reporter gene (LacZ) in the same region/structure in at least 3 observed embryos. Enhancer elements for the 2p15-16.1 deletion region were downloaded from the VISTA Enhancer Browser using genomic position chr2:55580038-65440018 for human-only enhancers (hg19).

2.3.9.4 WebGestalt Functional Enrichment Analysis

An online bioinformatics tool, the Web-based Gene Set Analysis Toolkit (WebGestalt; http://bioinfo.vanderbilt.edu/webgestalt/), was used for functional analysis of user-provided gene lists (Wang et al., 2013; Zhang et al., 2005). The 13 most frequently deleted genes from the 2p15-16.1 microdeletion region (BCL11A, PAPOLG, LINC01185, REL, PUS10, PEX13, KIAA1841, LOC339803, C2orf74, AHSA2, USP34, SNORA70B, and XPO1) were uploaded into WebGestalt. All user IDs were unambiguously mapped to 13 unique Entrez Gene IDs (i.e. no user IDs were mapped to multiple Entrez Gene IDs or could not be mapped to any Entrez Gene ID). Subsequent enrichment analyses (Pathway Commons and Protein-Protein Interactions) were based on the 13 unique Entrez Gene IDs. Parameters used for all enrichment analysis modules included a commonly used reference gene set (all genes in the genome), the default multiple test adjustment method proposed by Benjamini & Hochberg (Benjamini and Hochberg, 1995), and a “Top 10” option that identifies the 10 most significant categories in each enrichment analysis described below, with the “Minimum number of genes for a category” set to 2.

2.3.9.4.1 Pathway Commons Enrichment Analysis in WebGestalt

WebGestalt uses a web service called the Pathway Commons API (http://www.pathwaycommons.org/pc/webservice.do?cmd=help) to search pathway names in the Pathway Commons website and get the detailed information for the searched pathway. User data and parameters for the Pathway Commons analysis are as follows: User data: textAreaUpload.txt, Organism: hsapiens, Id Type: refseq_dna_all, Ref Set: entrezgene, Significance Level: Top10, Statistics Test: Hypergeometric, MTC: BH, Minimum: 2.
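For readers unfamiliar with the two statistical settings listed above (hypergeometric test, BH multiple test correction), the following sketch shows what each computes. It is a generic textbook formulation using only the Python standard library and is not taken from the WebGestalt code base; the function names and any numbers passed to them are illustrative assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    N: genes in the reference set, K: reference genes in the category,
    n: genes in the user list, k: user-list genes found in the category.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a 13-gene list, `hypergeom_pvalue(k, 13, K, N)` would give the raw enrichment p-value for one category, and `benjamini_hochberg` would then be applied across all categories tested.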

2.3.9.4.2 Protein-Protein Interaction Enrichment Analysis in WebGestalt

WebGestalt uses mined protein-protein interactions in this enrichment analysis. Protein-protein interaction modules in WebGestalt are generated using data from the following public databases: HPRD (11/11/2012), BioGrid (11/11/2012), BOND (11/11/2012), DIP (11/11/2012), IntAct (11/11/2012), MINT (11/11/2012), and Reactome (11/11/2012). Only interactions with publication support were retained. User data and parameters are as follows: User data: textAreaUpload.txt, Organism: hsapiens, Id Type: refseq_dna_all, Ref Set: entrezgene, Significance Level: Top10, Statistics Test: Hypergeometric, MTC: BH, Minimum: 2.

2.3.9.4.3 Hierarchical Human Protein Interaction Network Modules

Interaction network modules downloaded from WebGestalt include protein-protein interactions with at least one supporting publication collected from the seven public databases (above). Reported interactions were combined and redundant entries were removed by WebGestalt to produce the human protein interaction network.

2.4 Results

2.4.1 Clinical Findings (Summary of New and Published 2p15-16.1 Deletion Cases)

A full summary of all clinical features reported in individuals with 2p15-16.1 microdeletions is available in Supplementary Table 2.1 (Appendix A). Phenotype percentages are shown separately for new (N=8) and previously reported (N=15) cases included in this study, along with the total phenotype percentage for the group as a whole (N=23).

Common phenotypes reported for >50% of cases from both cohorts include DD, ID, delayed language skills, postnatal feeding problems, hypotonia, small head size (OFC <3rd or 5th-10th percentile), dysmorphisms of the head (bitemporal narrowing, various head shape abnormalities), eyes (epicanthal folds, telecanthus, ptosis, down-slanting palpebral fissures), nose (broad/high nasal root), mouth (smooth and long philtrum, high narrow palate or other palate abnormalities), and digits (camptodactyly and/or metatarsus abductus) (Supplementary Table 2.1, Part C; Appendix A).

Some phenotypic features were seen at different frequencies in our newly recruited cases when compared to published cases. For example, postnatal feeding problems were seen more frequently in our new cases, as were hypotonia, various reported abnormal behaviours, and frequent ear infections. Phenotypes found at lower frequencies in our newly reported cases included facial features specific to the eye (telecanthus, short palpebral fissures, hypertelorism) and mouth (smooth and long philtrum, everted lower lip). Phenotypes reported in our newly recruited cases but not previously reported for other individuals with 2p15-16.1 microdeletions include bilateral tear duct obstruction or absence, cleft lip/palate, protruding tongue, and skin abnormalities (Supplementary Table 2.1, Part B; Appendix A). The significance of these differences could not be assessed because our new cohort contains only 8 individuals.

2.4.2 Genomic Findings (Summary of New and Published 2p15-16.1 Deletion Cases)

2p15-16.1 CNV breakpoints, array platforms used for their identification and additional genomic changes for all cases are shown in Supplementary Table 2.2 (Appendix A). Additional genomic changes, including genetic abnormalities within the 2p15-16.1 region and in the rest of the genome, were reported in 10 published cases. Genomic changes within the 2p15-16.1 region include a mosaic deletion (de Leeuw et al., 2008) and a large run of homozygosity flanking a deletion (Hancarova et al., 2013). Genomic changes in the rest of the genome include inherited CNVs (Fannemel et al., 2014; Piccione et al., 2012; Rajcan-Separovic et al., 2007), de novo CNVs of unknown significance (Peter et al., 2014; Piccione et al., 2012), and benign variants such as a fragile site (Fannemel et al., 2014) and one apparently balanced translocation (Prontera et al., 2011).

Only one of our 8 new cases, Case No. 2, had multiple CNVs detected by high-resolution CMA. This case had 3 CNVs in the 2p15-16 region: a 3.5 Mb CNV encompassing 2 coding genes (FANCL and VRK2) and two very small CNVs in the BCL11A region (<20 kb). In addition, a de novo 6.5 Mb deletion of chromosome 12p11.21-q11 was detected; the gene content of this CNV is shown in Supplementary Table 2.3 (Appendix A). The coding part of the 12p11.21-q11 CNV was ~1.5 Mb in length (the rest was in the centromere) and included 11 reference genes and 4 OMIM genes. Only one gene, DNM1L, was predicted to be HI. However, a CNV overlapping DNM1L is reported in a proband and his normal parent (Decipher ID 253252), making it an unlikely candidate for the phenotypes seen in Case No. 2. In addition, the 4 OMIM genes were associated with phenotypes not noted in the proband, namely lethal encephalopathy (DNM1L), Charcot-Marie-Tooth disease type 4H (CMT4H and FGD4), arrhythmogenic right ventricular cardiomyopathy (PKP2), and myopathy, lactic acidosis, and sideroblastic anemia (YARS2). A role for genes from the 12p11.21-q11 CNV in the phenotypes seen in Case No. 2 is unlikely but cannot be excluded.

The two small CNVs in Case No. 2 (Figure 2.3) overlapped with intron 2 of BCL11A (chr2:60836064-60863258) and with an intergenic region that contains VISTA enhancer hs1142 (chr2:60836064-60863258); both CNVs were confirmed by QMPSF. The intergenic deletion was further refined by QMPSF and extends into the VISTA enhancer (hs1142), removing at least 1/3 of the enhancer (Table 2.3).

A) Minimum & Maximum CNV sizes for Intergenic Deletion in Case No. 2

Start        Stop         Size (bp)   Result
60,828,557   60,851,109   22,552      cytoscan array deletion (12 probes)
60,835,872   60,836,064   192         bal QMPSF
60,850,362   60,850,585   223         del QMPSF
60,855,485   60,855,688   203         del QMPSF (Hs1142)
60,863,258   60,863,258   -           cytoscan array, next bal SNP probe (rs12474263)
60,850,362   60,855,688   5,326       minimum deletion region/size (bp)
60,836,064   60,863,258   27,194      maximum deletion region/size (bp)

B) Enhancer position

Start        Stop         Size (bp)   Comment
60,855,056   60,856,888   1,832       Hs1142
60,855,485   60,855,688   203         enhancer primer positions
                          632         bp minimum confirmed deletion of Enhancer 1142

Table 2.3 Extension of an Intergenic CNV from Case No. 2 into VISTA Enhancer Element hs1142. Genomic positions (hg19) in A) include the CNV (deletion) detected with the cytoscan array, the primer positions for amplicons tested by QMPSF with their results, and the next balanced SNP probe on the cytoscan array, along with the calculated minimum and maximum sizes of the small CNV that extends into enhancer hs1142. Genomic positions (hg19) in B) include the VISTA enhancer and the amplicon confirmed deleted by QMPSF. Confirmed deleted regions are highlighted in red. At least 632 bp of VISTA enhancer hs1142 is deleted.
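The sizes reported in Table 2.3 follow from simple coordinate arithmetic, sketched below. The coordinates are those given in the table; the helper names are illustrative only, and sizes are taken as stop minus start, which is the convention that reproduces the tabulated values.

```python
def span(interval):
    """Size of a (start, stop) interval, computed as stop - start."""
    start, stop = interval
    return stop - start


def overlap(a, b):
    """Length of the overlap between two (start, stop) intervals (0 if none)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Coordinates (hg19, chr2) taken from Table 2.3.
min_deletion = (60_850_362, 60_855_688)   # first to last amplicon deleted by QMPSF
max_deletion = (60_836_064, 60_863_258)   # bounded by balanced QMPSF amplicon and next balanced SNP probe
enhancer_hs1142 = (60_855_056, 60_856_888)

min_size = span(min_deletion)                                  # 5,326 bp
max_size = span(max_deletion)                                  # 27,194 bp
enhancer_deleted = overlap(min_deletion, enhancer_hs1142)      # 632 bp
fraction_deleted = enhancer_deleted / span(enhancer_hs1142)    # about 0.34
```

The last value, 632 of 1,832 bp, is the basis for the statement that at least one third of the enhancer is removed.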


Figure 2.3 New Case No. 2 Additional CNVs at 2p16.1. Additional CNVs in intron 2 of BCL11A and the intergenic CNV overlapping enhancer element (hs1142) are shown along with the larger 2p CNV that overlaps VRK2 and FANCL (deletions are shown in red; point mutations in BCL11A found in the Deciphering Developmental Disorders (DDD) study are shown in green (Wright et al., 2015)). A zoom view of the 2 additional CNVs is shown at the bottom. Thick red bars indicate the confirmed deleted region for the enhancer CNV (chr2:60850362-60855688) while thin red bars indicate the possible extension of the deleted region (chr2:60836064-60863258) based on the position of the next balanced SNP or QMPSF probe. The schematic representation of the location of the hindbrain staining taken from the VISTA gallery page (http://enhancer.lbl.gov/gallery_n.html) is shown below the enhancer element.


The gene content for all 2p15-16.1 microdeletions included in this chapter is shown below in Table 2.4 and the region containing the most commonly deleted genes in the 2p15-16.1 microdeletion syndrome is shown below in Figure 2.4 (blue highlighted region). My analysis identified 13 genes (10 coding and 3 non-coding) that are deleted at a frequency >60% (at least 14 of the 23 cases) when all 2p15-16.1 microdeletions were combined (Table 2.4, Part C). These include two coding genes, USP34 and XPO1, deleted in 74% (N=17) of cases, and one non-coding gene, small nucleolar RNA, H/ACA box 70B (SNORA70B), deleted in 70% (N=16) of cases, while the remaining 10 genes, 8 coding (PAPOLG, KIAA1841, C2orf74, AHSA2, BCL11A, REL, PUS10, and PEX13) and 2 non-coding (LOC339803 and LINC01185), are deleted in 60-65% of cases. The functions of genes that are deleted in >60% of cases are provided in Table 2.5.
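The frequency cut-off used here can be checked with a short calculation. The sketch below uses the combined deletion counts (out of 23 cases) reported in Table 2.4 for the 13 most frequently deleted genes, plus the three genes just below the threshold; the variable and function names are illustrative.

```python
TOTAL_CASES = 23

# Number of cases (of 23) in which each gene is deleted or disrupted, from Table 2.4, Part C.
deleted_in = {
    "USP34": 17, "XPO1": 17, "SNORA70B": 16,
    "BCL11A": 15, "LINC01185": 15, "REL": 15, "PUS10": 15, "PEX13": 15,
    "PAPOLG": 14, "KIAA1841": 14, "LOC339803": 14, "C2orf74": 14, "AHSA2": 14,
    "FAM161A": 13, "CCT4": 13, "COMMD1": 13,
}


def deletion_frequency(count, total=TOTAL_CASES):
    """Percentage of cases in which a gene is deleted."""
    return 100.0 * count / total


# Genes deleted in more than 60% of cases.
commonly_deleted = sorted(
    gene for gene, count in deleted_in.items() if deletion_frequency(count) > 60.0
)
```

Because 14/23 is about 61% and 13/23 is about 57%, the >60% threshold corresponds to a gene being deleted in at least 14 cases.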


Deletion size (Mb) by case

A) Published cases                      Size (Mb)
Florisson et al. 2013 (1)               6.7461
Rajcan-Separovic et al. 2007 (2)        7.89184
Prontera et al. 2011                    3.52782
Rajcan-Separovic et al. 2007 (1)        6.11217
de Leeuw et al. 2008                    3.45121
Florisson et al. 2013 (2)               6.67773
Felix et al. 2010                       3.34967
Liang et al. 2009                       3.1441
Piccione et al. 2012 (2)                2.505
Piccione et al. 2012 (1)                0.643
Hucthagowder et al. 2012                2.47244
Peter et al. 2014                       0.203
Hancarova et al. 2013                   0.438
Chabchoub et al. 2008                   0.58333
Fannemel et al. 2014                    0.2327

B) New cases                            Size (Mb)
Case No. 1                              9.57444
Case No. 2                              2.01259
Case No. 3                              5.36243
Case No. 4                              0.97104
Case No. 5                              4.59269
Case No. 6                              0.35946
Case No. 7                              2.66722
Case No. 8                              0.79473

Gene content

Gene ID      Type   Freq  Marks (published cases)   A) Published  Marks (new cases)  B) New  C) Total  HI Score
CCDC88A      mRNA         ± ±                       2 (15)                           0 (8)   2 (23)    45.90%
CCDC104      mRNA         + +                       2 (15)        +                  1 (8)   3 (23)    87.80%
SMEK2        mRNA         + +                       2 (15)        +                  1 (8)   3 (23)    5.00%
PNPT1        mRNA         + +                       2 (15)        +                  1 (8)   3 (23)    41.50%
EFEMP1       mRNA         + +                       2 (15)        +                  1 (8)   3 (23)    53.00%
MIR217       ncRNA        + +                       2 (15)        +                  1 (8)   3 (23)
MIR216A      ncRNA        + +                       2 (15)        +                  1 (8)   3 (23)
MIR216B      ncRNA        + +                       2 (15)        +                  1 (8)   3 (23)
CCDC85A      mRNA         + +                       2 (15)        +                  1 (8)   3 (23)    72.90%
VRK2         mRNA         + + + + +                 5 (15)        + +                2 (8)   7 (23)    67.40%
FANCL        mRNA         + + + + +                 5 (15)        + +                2 (8)   7 (23)    35.50%
LINC01122    ncRNA        + + + + + + ± ±           8 (15)        + + ±              3 (8)   11 (23)
AC007131.1   ncRNA        + + + + + + + +           8 (15)        + + +              3 (8)   11 (23)
MIR4432      ncRNA        + + + + + + + + +         9 (15)        + +                2 (8)   11 (23)
BCL11A       mRNA   **    + + + + + + + + + + + ±   12 (15)       + + +              3 (8)   15 (23)   12.40%
PAPOLG       mRNA   *     + + + + + + + + + + +     11 (15)       + + +              3 (8)   14 (23)   22.70%
LINC01185    ncRNA  **    + + + + + + + + + + +     11 (15)       + + + +            4 (8)   15 (23)
REL          mRNA   **    + + + + + + + + + + ±     11 (15)       + + + +            4 (8)   15 (23)   5.50%
PUS10        mRNA   **    + + + + + + + + + + ±     11 (15)       + + + +            4 (8)   15 (23)   36.60%
PEX13        mRNA   **    + + + + + + + + ± + +     11 (15)       + + + +            4 (8)   15 (23)   46.40%
KIAA1841     mRNA   *     + + + + + + + + + +       10 (15)       + + + +            4 (8)   14 (23)   59.10%
LOC339803    ncRNA  *     + + + + + + + + + +       10 (15)       + + + +            4 (8)   14 (23)
C2orf74      mRNA   *     + + + + + + + + + +       10 (15)       + + + +            4 (8)   14 (23)
AHSA2        mRNA   *     + + + + + + + + + +       10 (15)       + + + +            4 (8)   14 (23)   92.60%
USP34        mRNA   ***   + + + + + + + + + + ±     11 (15)       + + ± + ± ±        6 (8)   17 (23)   6.50%
SNORA70B     ncRNA  ***   + + + + + + + + + + +     11 (15)       + + + + +          5 (8)   16 (23)
XPO1         mRNA   ***   + + + (+) + + + + + + ±   11 (15)       + + + + + ±        6 (8)   17 (23)   0.50%
FAM161A      mRNA         + + + + + + + +           8 (15)        + + + + +          5 (8)   13 (23)
CCT4         mRNA         + + + + + + + +           8 (15)        + + + + +          5 (8)   13 (23)   8.10%
COMMD1       mRNA         ± + + + + + + +           8 (15)        + + + + +          5 (8)   13 (23)   74.70%
B3GNT2       mRNA         + + + + + +               6 (15)        + + + + +          5 (8)   11 (23)
MIR5192      ncRNA        + + + + + +               6 (15)        + + + + +          5 (8)   11 (23)
TMEM17       mRNA         + + + + +                 5 (15)        + + + +            4 (8)   9 (23)    37.90%
EHBP1        mRNA         + ± + ±                   4 (15)        + + + +            4 (8)   8 (23)    43.60%
AC009501.4   ncRNA        + +                       2 (15)        + + + +            4 (8)   6 (23)
OTX1         mRNA         + +                       2 (15)        + + + +            4 (8)   6 (23)    63.70%
DBIL5P2      ncRNA        + +                       2 (15)        + + + +            4 (8)   6 (23)
WDPCP        mRNA         + +                       2 (15)        + + + +            4 (8)   6 (23)
MDH1         mRNA         +                         1 (15)        + + + +            4 (8)   5 (23)    53.40%
UGP2         mRNA         +                         1 (15)        + + + +            4 (8)   5 (23)    1.70%
VPS54        mRNA         +                         1 (15)        + + + +            4 (8)   5 (23)    41.60%
PELI1        mRNA         +                         1 (15)        + + +              3 (8)   4 (23)    6.20%
LINC00309    ncRNA        +                         1 (15)        + +                2 (8)   3 (23)
LGALSL       mRNA         +                         1 (15)        + +                2 (8)   3 (23)
AFTPH        mRNA         +                         1 (15)        + +                2 (8)   3 (23)    17.80%
MIR4434      ncRNA        +                         1 (15)        + +                2 (8)   3 (23)
AC007365.1   ncRNA        +                         1 (15)        + +                2 (8)   3 (23)
SERTAD2      mRNA         +                         1 (15)        + +                2 (8)   3 (23)    7.80%
AC007880.1   ncRNA        +                         1 (15)        + +                2 (8)   3 (23)
AC007386.2   ncRNA        +                         1 (15)        + +                2 (8)   3 (23)
SLC1A4       mRNA         +                         1 (15)        + +                2 (8)   3 (23)    69.00%
CEP68        mRNA         +                         1 (15)        +                  1 (8)   2 (23)    92.90%
RAB1A        mRNA         +                         1 (15)        +                  1 (8)   2 (23)    10.70%
ACTR2        mRNA                                   0 (15)        +                  1 (8)   1 (23)    31.10%
SPRED2       mRNA                                   0 (15)        ±                  1 (8)   1 (23)    18.80%

Table 2.4 Gene Content Overlap for Published and Newly Recruited 2p15-16.1 Microdeletion Cases (hg19). RefSeq gene IDs for the entire 2p15-16.1 microdeletion region are listed for the most distal and proximal breakpoints in 15 published and 8 newly recruited cases. Non-coding RNA genes (ncRNA) and protein coding genes (mRNA) are indicated beside the gene ID. Genes included in the deleted region for each case are indicated by a +. Genes disrupted by a deletion (but not fully included) are indicated with a ±. Genes possibly included in the deletion region based on available data are shown in red (+). The number of cases that include each gene is reported for A) cases published to April 2014 (n=15), B) newly recruited cases (n=8), and C) all cases combined (n=23). The most commonly deleted genes are shown in bold font and have stars marking the most frequently deleted genes in the combined cohort (≥ 16 cases***, ≥ 15 cases**, ≥ 14 cases*). A single color scale bar shows the most frequently deleted genes (green). Haploinsufficiency scores (Huang et al. 2010) are also shown in color, with genes not likely to be HI (green), neutral genes (yellow) and genes likely to be HI (red).


Figure 2.4 Genomic Overlap of 15 Published and 8 Newly Recruited 2p15-16.1 Microdeletion Cases. Microdeletions (red) for 15 cases published to April 2014, proposed critical regions (black) (Liang et al. 2009 and Hucthagowder et al. 2012), and our newly recruited cases (Cases 1-8) are shown in the UCSC genome browser (hg19). Thick red bars indicate the deletion region while thin red bars indicate the possible extension of the deleted region based on the position of the next balanced qPCR probe (Florisson et al. 2013). The region containing the most commonly deleted genes (chr2:60,678,302 - 61,765,418) is highlighted in blue. Additional tracks shown include RefSeq genes, Duplications >1000 bases (Segmental Dups), CNVs from the Database of Genomic Variants, and custom tracks for the positive VISTA enhancer elements (18 total; 12 brain related enhancers, dark blue, and 6 other enhancers, light blue) and the ENCODE regulatory elements (1336 total; green).


Gene Name: B-cell CLL/lymphoma 11A (zinc finger protein)
Gene Symbol: BCL11A
Gene Function: BCL11A is a known regulator of the globin gene, silencing fetal hemoglobin (HbF) expression in adult erythroid cells. A common variant found at an erythroid enhancer has been shown to reduce transcription factor binding and affects BCL11A expression in erythroid but not B-lymphoid cells, suggesting that different forms of the gene are produced in different cells. BCL11A is expressed in the brain and has been implicated in the development of the central nervous system. BCL11A is required for morphogenesis of dorsal spinal neurons and for the correct wiring of primary sensory neurons. Finally, BCL11A is implicated in causing a neurodevelopmental phenotype.
References: (Balci et al., 2015; Basak et al., 2015; Bauer et al., 2013; Funnell et al., 2015; John et al., 2012; Sankaran et al., 2008; Sankaran et al., 2010; Wright et al., 2015)

Gene Name: Poly(A) polymerase gamma
Gene Symbol: PAPOLG
Gene Function: PAPOLG is a Poly(A) polymerase enzyme. PAP enzymes catalyze the addition of the poly(A) tail onto the 3’ end of a maturing mRNA, a process needed to complete normal biogenesis of most eukaryotic mRNAs. Misregulation of this step during RNA processing can lead to changes in gene expression and disease. Poly(A) tail extension can cause an mRNA to be recognized and degraded by the exosome in the RNA decay pathway and is thought to be important for mRNA quality control. Unlike other PAPs, PAPOLG is found exclusively in the nucleus, suggesting that it has a specialized role.
References: (Bresson and Conrad, 2013; Krishnakumar and Kraus, 2010; Kyriakopoulou et al., 2001; Yang et al., 2014)

Gene Name: Long intergenic non-protein coding RNA 1185
Gene Symbol: LINC01185
Gene Function: LINC01185 is a novel long intergenic non-coding RNA of unknown function. It has two identified splice variants and is located on the reverse strand of chromosome 2.

Gene Name: V-rel avian reticuloendotheliosis viral oncogene homolog
Gene Symbol: REL
Gene Function: REL encodes the protein, c-Rel, a member of the NF-kappaB family of transcription factors. It has a C-terminal transactivation domain that allows it to activate target gene expression. It is the only NF-kappaB family member considered to be an oncogene. NF-kappaB transcription factors are formed by the hetero- or homodimerization of five subunits: p50 (NF-kappaB1), p52 (NF-kappaB2), p65 (RelA), RelB and c-Rel. Cellular processes regulated by NF-kappaB transcription factors include cell survival and inflammation, although each subunit has distinct biological functions. c-Rel has been shown to play a role in proliferation and survival of hematopoietic cells and has been implicated in autoimmune responses that cause celiac disease and rheumatoid arthritis. In addition, c-Rel is required for the chromatin remodeling of gene promoters of the inflammatory cytokines, Interleukin-2 (IL-2) and Granulocyte macrophage colony stimulating factor (GM-CSF), in a process that unmasks transcription factor binding sites.
References: (Anthony et al., 2014; Bull et al., 1990; Bunting et al., 2007; Gregersen et al., 2009; Pereira and Oakley, 2008; Rao et al., 2003)

Gene Name: Pseudouridylate synthase 10
Gene Symbol: PUS10
Gene Function: Pus10 is a recently identified pseudouridine synthase responsible for the conversion of U54 and U55 of tRNA into pseudouridine. Pseudouridines (Psi), or uridine isomers, are found in all organisms. They are the most common nucleoside modification seen in structurally important regions of non-coding RNAs such as transfer RNA (tRNA) and ribosomal RNA (rRNA), and more recently in mRNA. Pseudouridylation depends on recognition by pseudouridine synthases of target uridines in primary and secondary RNA structures. Disrupting or removing a pseudouridine (Psi) synthase(s) had a negative effect on growth in both E. coli and yeast.
References: (Blaby et al., 2011; Gurha and Gupta, 2008; Hamma and Ferre-D'Amare, 2006; Joardar et al., 2013; King and Lu, 2014; Ofengand et al., 1995; Ofengand et al., 2001; Schwartz et al., 2014)

Gene Name: Peroxisomal biogenesis factor 13
Gene Symbol: PEX13
Gene Function: PEX13 encodes a peroxisomal membrane protein, PEX13, which binds to the peroxisomal targeting signal 1 (PTS1) and allows the import of peroxisomal matrix proteins. Zellweger syndrome, a severe neonatal neurodegenerative disorder, is caused by homozygous loss-of-function mutations in PEX genes that prevent proteins from being imported into the peroxisome, resulting in metabolic dysfunction. Non-functional peroxisomes can lead to increased oxidative stress and neuronal cell death. While severe neurological changes, including disordered lamination in the cerebral cortex, similar to those seen in Zellweger syndrome are found in the brains of mice with homozygous PEX13 mutations, mice that are heterozygous for PEX13 mutations do not display any discernable phenotypes.
References: (Krause et al., 2013; Liu et al., 1999; Maxwell et al., 2003; Muller et al., 2011; Rahim et al., 2014; Suzuki et al., 2001)

Gene Name: Uncharacterized protein KIAA1841
Gene Symbol: KIAA1841
Gene Function: KIAA1841 was isolated from cDNA in human brain tissue. Little is known about the protein other than there are 4 isoforms produced by alternative splicing.
References: (Nagase et al., 2001)

Gene Name: Uncharacterized gene LOC339803
Gene Symbol: LOC339803
Gene Function: LOC339803 produces a non-coding RNA with no known function.

Gene Name: Chromosome 2 open reading frame 74
Gene Symbol: C2orf74
Gene Function: C2orf74 codes for a single-pass transmembrane protein with no currently known function.

Gene Name: Activator of heat shock 90kDa protein ATPase homolog 2
Gene Symbol: AHSA2
Gene Function: AHSA2 codes for the protein AHA1, a co-chaperone that modulates ATPase hydrolysis of HSP90.
References: (Panaretou et al., 2002)

Gene Name: Ubiquitin specific peptidase 34
Gene Symbol: USP34
Gene Function: USP34 is a member of the ubiquitin-specific protease (USP) family of enzymes that cleave ubiquitin from polyubiquitinated proteins. This process is thought to play a role in rescuing proteins from degradation and increasing cellular concentrations of free ubiquitin. Posttranslational modification of ubiquitin can lead to the stability of target proteins, so proteins involved in this process are involved in many cellular processes. USP34 has been implicated in several processes including: positive regulation of the canonical Wnt signaling pathway, control of the mammalian DNA damage response through the ubiquitination of double stranded breaks, nuclear accumulation of proteins that upregulate beta-catenin (CTNNB1)-mediated transcription, and the reduction of R-Smad levels, the central mediators of TGFβ and BMP pathways. In addition, de-ubiquitinating enzymes are necessary for synapse development and function and USP34 is critical to overall neuronal development in Drosophila.
References: (Kowalski and Juo, 2012; Lui et al., 2011; Mu et al., 2007; Quesada et al., 2004; Reyes-Turcu et al., 2009; Tsou et al., 2012)

Gene Name: Small nucleolar RNA, H/ACA box 70B
Gene Symbol: SNORA70B
Gene Function: SNORA70B is a non-coding RNA gene whose sequence is located in the sense orientation within the intron of the protein coding gene, USP34. While not much is known about its function, snoRNAs typically play a role in rRNA modification.
References: (Dieci et al., 2009; Kiss et al., 2002; Reichow et al., 2007)

Gene Name: Exportin 1
Gene Symbol: XPO1
Gene Function: XPO1, also called Chromosome region maintenance 1 (CRM1), mediates the export of a wide variety of cargos, including proteins and mRNA, from the nucleus through the nuclear pore into the cytoplasm. More than 200 proteins with leucine-rich nuclear export signals that bind XPO1, either directly or indirectly through adaptor proteins, have been identified. XPO1 is also involved in mRNA export, which is mediated by multiple adaptor proteins (e.g. HuR, Staufen, and eIF4e). Overexpression of XPO1, seen in multiple types of cancer, is thought to lead to cytoplasmic mislocalization and aberrant activity of XPO1 cargo proteins, leading to cancer progression. In addition to its well-studied role as a nuclear export receptor, XPO1 has also been implicated as a regulator of mitosis (based on its localization with kinetochores and centrosomes) and more recently in the synthesis of rRNA.
References: (Bai et al., 2013; Bollmann et al., 2013; Fornerod et al., 1997; Fukuda et al., 1997; Gravina et al., 2014; Thomas and Kutay, 2003; Wang et al., 2005; Watanabe et al., 1999; Xu et al., 2012; Zheng et al., 2014)

Table 2.5 Function of Genes Found in >60% of 2p15-16.1 Microdeletions


The majority of deletions included in this chapter (21 of 23) partially or completely overlap a ~1.087 Mb region that contains the most commonly deleted genes (chr2:60,678,302 - 61,765,418; hg19) (Figure 2.4, blue highlighted region). Two cases (Prontera et al. 2011 and New Case 2) have deletions that are located completely outside the commonly deleted region (distal) and, although they are different in size (3.528 Mb and 2.013 Mb respectively), they overlap the same 4 genes: 2 coding (VRK2, FANCL) and 2 non-coding (LINC01122, AC007131.1). It should also be noted that multiple deletions in the 2p15-16.1 region do not overlap each other (e.g. Piccione et al. 2012 [1], Peter et al. 2014, and Hancarova et al. 2013 versus Chabchoub et al. 2008, Case No. 6, Fannemel et al. 2014, Case No. 7, and Case No. 8).

2.4.3 Whole Genome Expression Analysis in Individuals with 2p15-16.1 Microdeletions

In order to determine if genes integral to 2p15-16.1 deletions are sensitive to copy number and are therefore good candidates for further functional testing, I performed whole genome expression analysis for 5 subjects with varying 2p15-16.1 microdeletions (Rajcan-Separovic et al. 2007 [1], Rajcan-Separovic et al. 2007 [2], Case No. 3, Case No. 7, and Case No. 8) and 3 controls (2 reference males, 1 reference female). Expression of genes located within the 2p15-16.1 region was determined by using the controls to generate a relative expression ratio for each gene. Because the gene content for each deletion is variable and to minimize variability introduced by using a single sample, gene expression was assessed only for genes deleted in at least 3 of 5 cases, i.e. for BCL11A, PAPOLG, REL, PUS10, PEX13, KIAA1841, AHSA2, USP34, XPO1, FLJ13305, CCT4, COMMD1, B3GNT2, TMEM17, EHBP1, OTX1, and LOC51057. Four genes, REL, AHSA2, USP34 and COMMD1, showed expression changes consistent with 2p15-16.1 copy number, meaning that they had lower expression (~30% to 50%) than normal controls (Supplementary Table 2.4; Appendix A). These results suggest that multiple genes within the 2p15-16.1 microdeletion are dosage-sensitive and that their reduced expression (when deleted) may lead to phenotypic consequences.
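A relative expression ratio of the kind described above can be sketched as the mean expression in deletion carriers divided by the mean expression in controls. The exact normalisation used for the array data is not specified in this section, so the formula below (and the assumption that values are on a linear scale) is illustrative only, and the function names are hypothetical.

```python
from statistics import mean


def relative_expression_ratio(case_values, control_values):
    """Mean expression in deletion carriers relative to the mean in controls.

    Assumes linear-scale expression values; a ratio of 0.5 would be expected
    for a fully dosage-sensitive gene with one of two copies deleted.
    """
    return mean(case_values) / mean(control_values)


def percent_reduction(ratio):
    """Convert a relative expression ratio into a percent reduction."""
    return (1.0 - ratio) * 100.0
```

Under this sketch, a gene with a ratio between 0.5 and 0.7 would fall in the ~30% to 50% reduction range reported for REL, AHSA2, USP34 and COMMD1.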

I then extended my analysis to determine if any genes outside of the 2p15-16.1 deletion region had expression levels that could be correlated with the deletion. Genes were ranked based on the correlation between expression changes and 2p15-16.1 copy number. After multiple test correction, no changes in gene expression for genes outside of the 2p15-16.1 region were significantly correlated with copy number.
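One way to rank genes by how well their expression tracks copy number is sketched below. The correlation measure used in the analysis is not stated in this section, so the choice of Pearson correlation is an assumption, as are the function names and the toy input format (one expression value per sample, with copy number coded as 1 for deletion carriers and 2 for controls).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_genes_by_correlation(expression, copy_number):
    """Rank genes by the absolute correlation of expression with copy number.

    expression: dict mapping gene name -> list of values, one per sample.
    copy_number: list of copy numbers in the same sample order.
    """
    scores = {gene: pearson(copy_number, values) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

In a genome-wide setting, each correlation would also need a p-value and a multiple test correction before any gene could be called significant, as described in the text.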

2.4.4 Candidate Gene Selection for Functional Analysis

Candidate genes were selected based on the following criteria: 1) the gene(s) were included in the majority (>65%) of 2p deletions, 2) the gene(s) showed changes in expression when deleted (whole genome expression data), and/or 3) the gene(s) are predicted/known to be HI. Using the above criteria, I selected three candidate genes for further analysis: USP34 (prevalence, expression and haploinsufficiency score), XPO1 (prevalence and haploinsufficiency score), and REL (prevalence, expression and haploinsufficiency score). SNORA70B, a non-coding RNA gene, was not included as a candidate in my study since it was not on the whole genome expression array and did not have a predicted HI score.

2.4.4.1 Functional Studies of 2p15-16.1 Candidate Genes in Human Cells

Relative quantitative PCR (qPCR) was used to confirm the expression values generated from the whole genome expression array for two top candidate genes, USP34 and XPO1, in all available individuals/samples (whole blood and/or cell lines). Expression for both genes was consistent with the values from the expression array for individuals included on the array: USP34 expression is reduced in whole blood and lymphoblast cell lines (LBCs) of individuals with deletions overlapping the gene, and XPO1 expression does not change regardless of 2p15-16.1 copy number (Figure 2.5). Gene expression for USP34 and XPO1 in one additional individual (Case No. 4), who was not included on the whole genome expression array, followed the expected gene expression pattern.
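This section does not state which quantification model was used for the relative qPCR. For readers unfamiliar with the approach, the sketch below shows the widely used 2^-ΔΔCt calculation as an assumed example; it presumes roughly 100% amplification efficiency, and the Ct values and function name are hypothetical.

```python
def fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of a target gene (case vs control) by 2^-ddCt."""
    delta_case = ct_target_case - ct_ref_case  # normalise case to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalise control the same way
    return 2.0 ** -(delta_case - delta_ctrl)

# Hypothetical Ct values: the target crosses threshold one cycle later in the case.
fold = fold_change_ddct(ct_target_case=25.0, ct_ref_case=18.0,
                        ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
```

A one-cycle delay corresponds to half the starting template, which is the pattern expected when one of two gene copies is deleted.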


Case = gene in 2p deletion, Controls = no 2p deletion, Other = gene not in 2p deletion * P < 0.05, ** P < 0.01, n.s. = nonsignificant

Figure 2.5 mRNA Expression of Candidate Genes USP34 & XPO1 Expression results from qPCR experiments for whole blood and cell lines are shown together with results from the whole genome expression array. Gene expression results from qPCR experiments include two sets of probes, each designed over a different exon-exon boundary (indicated by the number to the right of the gene name). Results are shown separately for each probe tested. A) Boxplots of the expression values for USP34 in whole blood, lymphoblast cell lines (LBCs) and from the whole genome expression array are shown in separate panels. Gene expression values are significantly different between control individuals and deletion carriers for the whole genome expression array (P<0.01) and for the USP34_5 probe set in whole blood. B) Boxplots of the expression values for XPO1 in whole blood, lymphoblast cell lines (LBCs) and from the whole genome expression array are shown in separate panels. No significant gene expression differences are seen between control individuals and deletion carriers in any of the experiments. The numbering beside each gene ID corresponds to the location of the primer (1=exon 1 F primer) and the tissue tested (T=tempus whole blood, L=LBCs, WG = whole genome expression from LBCs). Probe sequences and locations are listed in Table 2.2.

Western blotting (WB) to determine XPO1 protein expression in lymphoblast cells shows decreased protein in probands with 2p deletions (Rajcan-Separovic et al. 2007 [1], Case No. 3, and Case No. 8) overlapping XPO1 compared to two normal controls (Male & Female). The proband whose deletion does not include XPO1 (Case No. 4) had similar protein expression to the normal controls (Figure 2.6). This result is in contrast to the mRNA expression results for XPO1 which show that the expression of XPO1 is not significantly different between deletion carriers and normal controls (Figure 2.5, Part B).

WB to determine USP34 protein expression in lymphoblast cells is shown in Figure 2.7. Decreased protein amounts are seen in probands with 2p deletions overlapping USP34 (Case No. 3 and Case No. 4) compared to cognitively normal controls (normal control, non-carrier female sibling, non-carrier father). The proband whose deletion does not include USP34 (Case No. 8) has similar protein expression to the normal controls (Figure 2.7). This result is consistent with the whole genome expression results for USP34, which show a significant difference in USP34 expression between deletion carriers and normal controls (Figure 2.5, Part A).

WB to determine c-REL protein expression in lymphoblast cells shows decreased protein in probands with 2p deletions overlapping REL (Rajcan-Separovic et al. 2007 [1], Case No. 3, and Case No. 4) compared to normal controls. The proband whose deletion does not include REL (Case No. 8) has similar protein expression to the normal controls (Figure 2.8). This result is consistent with the whole genome expression results for REL for the cases run on the array (Rajcan-Separovic et al. 2007 [1], Case No. 3, and Case No. 8) (Supplementary Table 2.4; Appendix A).


Figure 2.6 Investigation of XPO1 Protein Expression in Cell Lines with 2p15-16.1 Deletions Western blots were performed using protein isolated from lymphoblast cell lines established from probands with 2p deletions (Rajcan-Separovic et al. 2007 [1], Case No. 3, Case No. 4, and Case No. 8) and from cognitively normal controls (Male Ctrl and Female Ctrl). A) Representative blot of XPO1 and β-Actin: ~30 µg of total protein is loaded in each lane. B) Densitometric ratios generated for each experiment are averaged and shown with standard error bars in the graph. Samples significantly different from the normal controls (p<0.01) are marked with an asterisk (*). C) Genomic overlap of 2p15-16.1 microdeletion cases. Deletions fully or partially include XPO1 except for Case No. 4.


Figure 2.7 Investigation of USP34 Protein Expression in Cell Lines with 2p15-16.1 Deletions Western blots were performed using protein isolated from lymphoblast cell lines established from probands with 2p deletions (Case No. 3, Case No. 4, and Case No. 8) and from cognitively normal controls (Normal Control, Non-carrier female Sibling, Non-carrier Father). A) Representative blot of USP34 and ATR: increasing amounts of total protein are loaded in each lane. B) Genomic overlap of 2p15-16.1 microdeletion cases. Deletions fully or partially include USP34 except for Case No. 8.


Figure 2.8 Investigation of c-REL Protein Expression in Cell Lines with 2p15-16.1 Deletions Western blots were performed using protein isolated from lymphoblast cell lines established from probands with 2p deletions (Rajcan-Separovic et al. 2007 [1], Case No. 3, Case No. 4, and Case No. 8) and from cognitively normal controls (Male Ctrl, Female Ctrl, Female Ctrl 2, and Female Ctrl 3). A) Representative blot of c-REL and β-Actin: ~30 µg of total protein is loaded in each lane. B) Densitometric ratios generated for each experiment (N=3) are averaged and shown with standard error bars in the graph. Samples significantly different from the normal controls (p<0.05) are marked with an asterisk (*). C) Genomic overlap of 2p15-16.1 microdeletion cases. Deletions fully or partially include REL except for Case No. 8.
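The averaged densitometric ratios with standard error bars described in the Western blot figure legends can be sketched as follows. This is a minimal illustration, assuming each ratio is the target band intensity divided by the loading-control band intensity from the same lane; the band intensities and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def densitometric_ratio(target_band, loading_band):
    """Target band intensity normalised to the loading control in the same lane."""
    return target_band / loading_band

def mean_and_sem(values):
    """Mean and standard error of the mean across replicate experiments."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical (target, loading control) intensities from N=3 experiments.
experiments = [(1200.0, 2400.0), (1100.0, 2200.0), (900.0, 2000.0)]
ratios = [densitometric_ratio(t, l) for t, l in experiments]
average, sem = mean_and_sem(ratios)
```

Normalising within each lane before averaging keeps differences in loading between experiments from inflating the error bars.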


Protein expression in human and mouse tissue was studied using immunohistochemistry.

In human fetal brain, positivity for XPO1 was seen in a number of regions. Specifically, mild positivity was seen in the immature ependyma, Purkinje cells, inferior olive, substantia nigra and Cajal-Retzius cells in the cortex. Positivity was more intense in some cells of the mitotically active immature ependyma (a pseudostratified epithelium) overlying the germinal matrix. The latter constitutes a dense stem cell population that sequentially gives rise to neuronal and glial precursors that eventually migrate out into the cerebrum. In the mature human brain, mild positivity was seen only in Purkinje cells and inferior olive cells. The staining was predominantly nuclear, but cytoplasmic staining was also noticeable, particularly in cells undergoing mitosis (Figure 2.9 A-C).

In the adult mouse, XPO1 shows wide cytoplasmic expression in the brain (Figure 2.9 D) as well as in the gut, spleen, and lung (data not shown).

Figure 2.9 XPO1 Expression in Human Fetal and Mouse Brain Immunohistochemistry against XPO1 was performed on human fetal brain (A-C) and mouse brain (D). Mild positivity for XPO1 (arrows) is seen in immature ependyma or neuroepithelium (A). In the cerebral cortex (B), positivity is seen in Cajal-Retzius cells (arrows). In cells undergoing mitosis (C) positivity is seen mainly in the cytoplasm (arrows). In mouse brain (D) diffuse positive staining is seen in many neurons and is mainly cytoplasmic (arrows); nuclei (N).


In human fetal brain, moderate positivity for USP34 is seen in grey matter but not in white matter or the external granular layer (Figure 2.10). Moderate positivity is seen in the striatum, tegmentum of the pons (both cytoplasmic and nuclear), hippocampus, caudate, putamen, and thalamus. In the cerebellar cortex the Purkinje cell layer shows strong positivity for USP34 (Figure 2.10 A). In the mature human brain, positivity for USP34 is more widespread and is visible in both the white and grey matter (data not shown). However, there is still strong positivity in the Purkinje layer (Bergmann glia or radial astrocytes) in the cerebellar cortex (data not shown).

Figure 2.10 USP34 Expression in Human Fetal Brain (A-F) Moderate positivity for USP34 (arrows) is seen throughout grey matter (A-D). USP34 is diffusely expressed in neurons (A-D), and can be seen in both the nucleus (N) and cytoplasm (black arrow) for large neurons in the tegmentum of the pons (C). In the cerebellar cortex (A), positive staining is visible in the Purkinje cell layer (arrows). No staining is visible in germinal layers or in white matter (E, F).


2.4.4.2 Functional Studies of 2p15-16.1 Candidate Genes in C. elegans

During these investigations I studied the expression of xpo-1, an orthologous gene to human XPO1 (99.3% shared homology) in transgenic C. elegans and looked at the consequences of xpo-1 gene knockdown using RNAi.

XPO-1 is ubiquitously expressed during all stages of C. elegans development in transgenic animals (Figure 2.11 A-D). Earliest expression of XPO-1 is seen in early gastrulation stage embryos, mainly in cell nuclei (Figure 2.11 A). Ubiquitous expression of XPO-1 continues throughout embryonic and larval development into adult stages (Figure 2.11 B-F). Cellular expression of XPO-1 at all stages is enriched in the nuclei (typically excluding the nucleolus), but can also be seen in the cytoplasm. Expression of XPO-1 is visible in neuronal nuclei and cytoplasm of the adult worm (Figure 2.11 E&F).

Knock down of XPO-1 by RNAi in C. elegans was confirmed to be early embryonic lethal in all strains tested (fewer than 5 surviving worms/plate). Results were similar for all dilution experiments (25%, 50%, 75% and 100%).


Figure 2.11 XPO1 Expression in Transgenic XPO-1::GFP C. elegans GFP Images (left or top) and Nomarski images (right or bottom) are given for each embryonic stage shown. Images are from the strain expressing the XPO-1::GFP fusion protein (GFP at the C-terminus), i.e. the subcellular localization reflects the XPO-1 localization. A) Mid-gastrulation, B) End of gastrulation; blue arrow points to neuroblasts or hypodermal cells, C) 1.5 Fold, Tadpole‐stage; the patches of brighter cells in the head (red arrows) are most likely neurons, D) Larval Stage, E) Head of Adult Worm, F) Tail of an Adult Worm; labeled cells include gut cells (red arrows), hypodermal cells (blue arrows) and tail neurons (yellow arrow).

2.4.4.3 Additional Laboratory Investigations

Exome Sequencing: based on observations that microdeletions can unmask deleterious variants (Albers et al., 2012; McDonald-McGinn et al., 2013), I looked for deleterious variants within the breakpoints of the CNV on chromosome 2 for a single case, Rajcan-Separovic et al. 2007 [1]. No pathogenic sequence variants were detected in any genes in the 2p15-16.1 region for this case. This result further supports the theory that the 2p15-16.1 microdeletion syndrome is a contiguous gene syndrome (i.e. is not caused by a mutation in a single gene unmasked by the microdeletion).

Parent of Origin Study: microsatellite markers, also called short tandem repeats (STRs), are repeated DNA sequence motifs 2-7 base pairs in length that are found over the entire length of the human genome (Ciofi et al., 1998). The number of repeats is highly variable, so at any given microsatellite multiple alleles are often seen in different individuals (polymorphic) (Ciofi et al., 1998). I used this variability to determine which allele is deleted (maternal or paternal) in 5 of our 2p15-16.1 microdeletion cases by targeting a dinucleotide microsatellite (24xTA) that is located at chromosome 2p15 (chr2:61,755,869-61,755,916, hg19) (Supplementary Figure 2.1; Appendix A). In 4 of 5 probands tested I was able to determine which parent the remaining allele was inherited from (Table 2.6) and, by inference, which parent contributed the chromosome with the deletion. My data showed no bias in parental origin for the 2p15-16.1 microdeletion: in two cases the paternal allele was deleted (Rajcan-Separovic et al. 2007 [2], Case No. 3), in two cases the maternal allele was deleted (Case No. 7, Case No. 8), and in one case (Rajcan-Separovic et al. 2007 [1]) inheritance could not be determined using the selected microsatellite (data not informative). These data are in contrast to the parental origin of deletions in cases published to April 2014 (Felix et al., 2010; Hancarova et al., 2013; Liang et al., 2009; Piccione et al., 2012), which, when studied, were found on the paternal chromosome (Supplementary Table 2.2; Appendix A). The equal inheritance seen in my cases suggests that it is unlikely that the 2p15-16.1 microdeletion phenotype is caused by an imprinted gene.


Case                              Relation  Sex  Size            Result
Rajcan-Separovic et al. 2007 (1)  proband   F    255.32          not informative
                                  mom       F    255.26, 286.46
                                  dad       M    255.40
Rajcan-Separovic et al. 2007 (2)  proband   M    274.38          maternal allele
                                  mom       F    255.24, 274.39
                                  dad       M    258.23, 280.46
Case No. 3                        proband   M    258.32          maternal allele
                                  mom       F    258.19
                                  dad       M    278.52, 286.56
Case No. 7                        proband   M    274.50          paternal allele
                                  mom       F    255.37, 276.39
                                  dad       M    258.34, 274.49
Case No. 8                        proband   M    260.21          paternal allele
                                  mom       F    256.23, 278.48
                                  dad       M    260.23, 288.48

Table 2.6 Parental Allele Identification Using Microsatellites Identification of the non-deleted allele is possible when parental alleles have different fragment sizes.
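The inference behind Table 2.6 can be reproduced with a short sketch that matches the proband's single remaining fragment against each parent's alleles. The fragment sizes are those listed in the table; the 0.5 sizing tolerance and the function names are assumptions made for illustration.

```python
TOLERANCE = 0.5  # assumed tolerance for calling two fragment sizes equal

def matches(size, alleles, tol=TOLERANCE):
    """True when `size` is within `tol` of any allele in `alleles`."""
    return any(abs(size - allele) <= tol for allele in alleles)

def retained_allele_origin(proband_size, mom_alleles, dad_alleles):
    """Name the parent the proband's remaining (non-deleted) allele came from."""
    from_mom = matches(proband_size, mom_alleles)
    from_dad = matches(proband_size, dad_alleles)
    if from_mom and not from_dad:
        return "maternal allele"
    if from_dad and not from_mom:
        return "paternal allele"
    # Both parents (or neither) carry a matching fragment size.
    return "not informative"

# (proband size, mother's alleles, father's alleles), from Table 2.6.
trios = {
    "Rajcan-Separovic et al. 2007 (1)": (255.32, [255.26, 286.46], [255.40]),
    "Rajcan-Separovic et al. 2007 (2)": (274.38, [255.24, 274.39], [258.23, 280.46]),
    "Case No. 3": (258.32, [258.19], [278.52, 286.56]),
    "Case No. 7": (274.50, [255.37, 276.39], [258.34, 274.49]),
    "Case No. 8": (260.21, [256.23, 278.48], [260.23, 288.48]),
}
results = {case: retained_allele_origin(*trio) for case, trio in trios.items()}
```

When the retained allele is maternal, the deletion lies on the paternal chromosome, and vice versa, which is how the Result column translates into the parental origin of each deletion.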

2.4.5 Bioinformatic Investigations

Repeating Elements by RepeatMasker (RMSK elements): to find regions of homology flanking each deletion, I analyzed the 500 bp upstream and downstream of each deletion and identified 56 repeat elements (Supplementary Table 2.5; Appendix A). While some cases have similar repeat elements (family and/or class) in the same orientation at both the proximal and distal breakpoints (Case No. 6, Case No. 7, Prontera et al. 2010, and Chabchoub et al. 2008), multiple cases have a variety of repeat elements located on either side of the deletion breakpoint (Rajcan-Separovic et al. 2007 [1], Florisson et al. 2013 [1], de Leeuw et al. 2008, Felix et al. 2010, Piccione et al. 2012 [1], Piccione et al. 2012 [2], Hucthagowder et al. 2012, Hancarova et al. 2012, and Fannemel et al. 2014) that are from different families or classes and/or occur in opposite orientations. In addition, several cases have repeat elements identified only on one side of the deletion (Case No. 1 distal, Case No. 2 proximal, Case No. 3 proximal, Case No. 4 proximal, Case No. 5 distal, Case No. 8 distal, Rajcan-Separovic et al. 2007 [2] proximal, Florisson et al. 2013 [2] distal, Liang et al. 2009 proximal) (Supplementary Table 2.5; Appendix A). Some of the repeat elements I identified likely play a role in mediating the genomic rearrangements that cause 2p15-16.1 deletions.
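The three breakpoint categories described above can be expressed as a simple comparison. This sketch is a simplification that assumes a single representative repeat per breakpoint; the record type, function names and example repeats are hypothetical and do not reproduce Supplementary Table 2.5.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Repeat:
    family: str     # e.g. "Alu"
    rep_class: str  # e.g. "SINE"
    strand: str     # "+" or "-"

def flank(breakpoint: int, size: int = 500) -> Tuple[int, int]:
    """Window of `size` bp on either side of a breakpoint."""
    return max(0, breakpoint - size), breakpoint + size

def classify(proximal: Optional[Repeat], distal: Optional[Repeat]) -> str:
    """Compare the repeats found at the two breakpoints of one deletion."""
    if proximal is None and distal is None:
        return "no repeats"
    if proximal is None or distal is None:
        return "one side only"
    same_type = (proximal.family == distal.family
                 or proximal.rep_class == distal.rep_class)
    if same_type and proximal.strand == distal.strand:
        return "similar, same orientation"
    return "different type or orientation"

# Hypothetical examples of each category.
alu_plus = Repeat("Alu", "SINE", "+")
mir_plus = Repeat("MIR", "SINE", "+")
l1_minus = Repeat("L1", "LINE", "-")
```

Only the first category, similar repeats in the same orientation at both breakpoints, is the configuration usually expected to support homology-mediated rearrangement.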

ENCODE regulatory elements: a region search using the entire 2p region (chr2:55,580,038-65,440,018; hg19) showed that it contains a total of 1,336 DNaseI hypersensitive sites (DHSs). A region search using the genomic segment containing the most frequently deleted genes (chr2:60,678,302-61,765,418; hg19) returned a total of 257 DHSs. ENCODE regulatory elements for the 2p15-16.1 deletion region are shown in Figure 2.4.
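A region search of this kind amounts to counting annotated intervals that overlap a query interval. The sketch below shows the idea; the two regions are the hg19 coordinates given above, while the DHS intervals and function names are hypothetical stand-ins for the ENCODE track.

```python
def overlaps(start, end, region_start, region_end):
    """True when [start, end] and [region_start, region_end] share any base."""
    return start <= region_end and end >= region_start

def count_in_region(sites, region):
    """Count the sites that overlap the region."""
    region_start, region_end = region
    return sum(overlaps(s, e, region_start, region_end) for s, e in sites)

WHOLE_REGION = (55_580_038, 65_440_018)    # chr2, hg19, from the text
COMMON_SEGMENT = (60_678_302, 61_765_418)  # chr2, hg19, from the text

segment_length = COMMON_SEGMENT[1] - COMMON_SEGMENT[0]  # ~1.087 Mb

# Hypothetical DHS intervals used only to exercise the functions.
toy_sites = [
    (55_600_000, 55_600_150),
    (60_700_000, 60_700_200),
    (61_765_400, 61_765_600),  # straddles the end of the common segment
    (66_000_000, 66_000_100),  # outside both regions
]
whole_count = count_in_region(toy_sites, WHOLE_REGION)
common_count = count_in_region(toy_sites, COMMON_SEGMENT)
```

Whether a site that only partly overlaps a region boundary is counted is a design choice; this sketch counts any overlap.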

VISTA enhancer elements: a total of 49 human enhancer elements were downloaded from the VISTA Enhancer Browser using genomic positions for the entire 2p region (chr2:55,580,038-65,440,018; hg19). Of these, 18 elements had positive enhancer activities (i.e. those with experimental evidence showing that they drive reporter gene expression in mouse embryos) (Visel et al., 2007). Positive enhancer elements, listed in Table 2.7, were used to make a custom track in the UCSC genome browser (Figure 2.4).


Chromosome Position      Element #  Expression Pattern Location [observed embryos/total embryos]
chr2:58748340-58750140   hs1174     dorsal root ganglion[6/6]
chr2:58799729-58800607   hs1071     ear[4/10]
chr2:58859997-58861674   hs1152     limb[4/5]
chr2:58975738-58977115   hs1067     dorsal root ganglion[3/7], limb[5/7]
chr2:59102071-59103380   hs1199     other[3/6]
chr2:59178992-59180242   hs1181     heart[3/8]
chr2:59198905-59200529   hs393      eye[6/12]
chr2:59304974-59306893   hs975      midbrain (mesencephalon)[4/7]
chr2:59476604-59477955   hs1119     neural tube[6/6], hindbrain (rhombencephalon)[5/6]
chr2:59540640-59541937   hs836      facial mesenchyme[5/12]
chr2:59746377-59746992   hs394      midbrain (mesencephalon)[4/11]
chr2:60352514-60353602   hs779      midbrain (mesencephalon)[8/9], forebrain[5/9]
chr2:60441495-60442515   hs399      forebrain[5/7]
chr2:60498057-60502013   hs1535     hindbrain (rhombencephalon)[4/5]
chr2:60761404-60763073   hs957      forebrain[4/4]
chr2:60855056-60856888   hs1142     hindbrain (rhombencephalon)[3/3]
chr2:63193855-63194929   hs690      midbrain (mesencephalon)[7/11]
chr2:63275695-63277103   hs1066     hindbrain (rhombencephalon)[5/5], midbrain (mesencephalon)[5/5], forebrain[5/5]

Table 2.7 Positive VISTA Enhancer Elements in the 2p15-16.1 Deletion Region This table contains the 18 positive enhancer elements found in the 2p region and includes their chromosomal position (chr:start-stop), element number, and observed expression pattern location in mouse embryos. The elements are listed in order of their position in the 2p15-16.1 deletion region from distal (telomeric) to proximal (centromeric).


WebGestalt functional enrichment analysis: the top 10 pathways from the Pathway Commons enrichment analysis include multiple signaling pathways, which are shown below in Table 2.8. The top 10 pathways share two genes, XPO1 & REL, from the most commonly deleted genes. Based on the pathways identified, both genes are involved in multiple cellular processes.

Pathway Name                                         #Gene  Gene ID    Statistics
IL2 signaling events mediated by PI3K                2      XPO1, REL  C=67;O=2;E=0.02;R=99.03;rawP=0.0002;adjP=0.0017
Aurora A signaling                                   2      XPO1, REL  C=64;O=2;E=0.02;R=103.67;rawP=0.0002;adjP=0.0017
CD40/CD40L signaling                                 2      XPO1, REL  C=58;O=2;E=0.02;R=114.40;rawP=0.0001;adjP=0.0017
IL23-mediated signaling events                       2      XPO1, REL  C=66;O=2;E=0.02;R=100.53;rawP=0.0002;adjP=0.0017
Canonical NF-kappaB pathway                          2      XPO1, REL  C=35;O=2;E=0.01;R=189.57;rawP=4.96e-05;adjP=0.0017
Endogenous TLR signaling                             2      XPO1, REL  C=57;O=2;E=0.02;R=116.40;rawP=0.0001;adjP=0.0017
Signaling events regulated by Ret tyrosine kinase    2      XPO1, REL  C=69;O=2;E=0.02;R=96.16;rawP=0.0002;adjP=0.0017
IL12-mediated signaling events                       2      XPO1, REL  C=113;O=2;E=0.03;R=58.72;rawP=0.0005;adjP=0.0023
Polo-like kinase signaling events in the cell cycle  2      XPO1, REL  C=109;O=2;E=0.03;R=60.87;rawP=0.0005;adjP=0.0023
IL2-mediated signaling events                        2      XPO1, REL  C=115;O=2;E=0.03;R=57.69;rawP=0.0005;adjP=0.0023

Table 2.8 Enriched Pathways for the 13 Most Commonly Deleted Genes in the 2p15-16.1 Microdeletion Region This table lists the top 10 pathways for the WebGestalt Pathway Commons enrichment analysis. The first column lists the pathway, the second and third columns indicate the number and gene ID for the genes involved, and the last column contains the statistics from the enrichment analysis: number of reference genes in the category (C), number of genes in the gene set and also in the category (O), expected number in the category (E), ratio of enrichment (R), p value from hypergeometric test (rawP), and p value adjusted by the multiple test adjustment (adjP).
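The statistics in Table 2.8 are related by E = C x n / N, R = O / E, and rawP = P(X >= O) under a hypergeometric distribution, where n is the gene set size and N the reference set size. The sketch below recomputes the Canonical NF-kappaB row. It assumes all 13 genes were counted in the gene set and uses a reference size of 43,127 genes, which is back-calculated from the reported ratios and is not a value stated in the thesis; the function name is hypothetical.

```python
from math import comb

def enrichment(reference_size, category_size, set_size, observed):
    """Return (expected, ratio, raw p) for a hypergeometric over-representation test."""
    expected = category_size * set_size / reference_size
    ratio = observed / expected
    upper = min(category_size, set_size)
    # Upper tail: probability of seeing `observed` or more category genes in the set.
    tail = sum(
        comb(category_size, k) * comb(reference_size - category_size, set_size - k)
        for k in range(observed, upper + 1)
    )
    raw_p = tail / comb(reference_size, set_size)
    return expected, ratio, raw_p

N_REFERENCE = 43_127  # assumption: back-calculated from R, C and O in Table 2.8
N_GENE_SET = 13       # assumption: all 13 commonly deleted genes were counted

# Canonical NF-kappaB pathway row: C=35, O=2.
expected, ratio, raw_p = enrichment(N_REFERENCE, 35, N_GENE_SET, 2)
```

Under these assumptions the recomputed values agree with the reported E, R and rawP for this row to the precision shown in the table; the adjusted p-values depend on the number of pathways tested and are not reproduced here.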

After multiple test correction, no protein-protein interactions remained significant in the Protein-Protein Interaction (PPI) enrichment analysis. However, hierarchical protein interaction network modules (protein-protein interactions with at least one supporting publication) for the proteins from the 2p15-16.1 deletion region showed that there are many protein-protein interactions linking the proteins from the 2p region (Figure 2.12). Nine of the ten genes from the most commonly deleted region code for proteins that have protein-protein interactions with at least one supporting publication. Six of the proteins from the most commonly deleted region (XPO1, USP34, c-REL, BCL11A, PEX13 and KIAA1841) are linked to other proteins from the most commonly deleted region through shared protein interactions. XPO1 and c-REL have the most shared interactions; XPO1 is linked to 4 other proteins from the commonly deleted region and c-REL is linked to 3. One protein, ubiquitin C (UBC), has reported protein-protein interactions with 4 proteins (XPO1, USP34, BCL11A, PEX13) from the most commonly deleted 2p15-16.1 region.


Figure 2.12 Hierarchical Protein-Protein Interaction Network Modules This figure shows protein-protein interactions for proteins coded by genes from the most commonly deleted region for 2p15-16.1 microdeletions (green).


2.5 Discussion

2.5.1 Summary of Genomic and Phenotypic Findings

I performed phenotype and genotype analysis for 23 individuals with 2p15-16.1 microdeletions (15 reported and 8 new) and showed that individuals with this microdeletion share common clinical features, helping to define the syndrome.

I also identified a subset of frequently deleted genes. More specifically, I determined that the most commonly deleted genes are found in a genomic segment ~1.087 Mb in length (chr2:60,678,302-61,765,418; hg19) that contains 13 genes: 10 coding (USP34, XPO1, PAPOLG, KIAA1841, C2orf74, AHSA2, BCL11A, REL, PUS10, PEX13) and 3 non-coding (SNORA70B, LOC339803 and LINC01185). The most frequently deleted genes, USP34 and XPO1, are deleted in ~74% (N=17) of cases and have reduced protein expression in keeping with the predicted effect of their deletion. In addition, REL, the third candidate gene with a high haploinsufficiency score (

XPO1 is a regulator of mitosis (Wang et al., 2005), and c-REL is a transcription factor in the NF-κB pathway (Gregersen et al., 2009) which has been associated with learning and memory (Ahn et al., 2008); see Table 2.5 for additional gene functions. The interaction and involvement of REL and XPO1 in a number of common biological processes previously associated with brain function and cognition (e.g. NF-κB), which I determined using bioinformatics, further supports their role in the syndrome and ID in subjects with 2p15-16.1 deletions.

All deletions included in my study are de novo in origin and occur on both maternal and paternal chromosomes. I did not identify a pathogenic sequence variant on the intact allele in the 2p deletion region with exome sequencing, reducing the possibility that the cause of the syndrome is a single gene with an autosomal recessive (“second hit”) type of inheritance. In addition, the reason for recurrence of CNVs of variable size in the 2p15-16.1 chromosomal region remains to be resolved, as my analysis of repeat elements flanking the CNVs in 23 cases did not uncover common repeat sequences that would drive the genomic rearrangements.

2.5.2 Phenotype-Genotype Correlations

Phenotype-genotype correlations for the 2p15-16.1 region have been difficult to establish, although individuals with 2p15-16.1 deletions often share common features, e.g. DD, ID, delayed language skills, postnatal feeding problems, hypotonia, small head size (OFC <3rd or 5th-10th percentile), and dysmorphisms of the head (bitemporal narrowing, various head shape abnormalities), eyes (epicanthal folds, telecanthus, ptosis, down slanting palpebral fissure), nose (broad/high nasal root), mouth (smooth and long philtrum, high narrow palate or other palate abnormalities), and digits (camptodactyly and/or metatarsus abductus).

One reason for this difficulty is that a large number of shared phenotypes are seen in individuals with non-overlapping deletions (Fannemel et al., 2014; Hancarova et al., 2013; Prontera et al., 2011). In addition, individuals with closely overlapping deletions (Felix et al., 2010; Liang et al., 2009) have multiple phenotypes reported for one individual that are not seen in the other (e.g. postnatal feeding problems, camptodactyly, long, straight eyelashes, large ears, receding short forehead, widened internipple distance, abnormal EEG, and seizures reported positive in one case [Liang et al. 2009] but negative in the other [Felix et al. 2010]; short height (<3rd percentile), attention deficit behaviour, and optic nerve hypoplasia reported positive in one case [Felix et al. 2010] but negative in the other [Liang et al. 2009]). Furthermore, the size of a 2p15-16.1 deletion is not a good predictor of phenotypic severity. For example, Hancarova et al. (2013) report a smaller than average deletion (0.438 Mb); however, the total number of abnormalities reported for this individual is similar to the number seen in the individual with the largest deletion (New Case No. 1). Finally, deletions outside of the most commonly deleted region can result in the common phenotypic features reported for the 2p15-16.1 deletion syndrome. For example, the deletion in the patient reported by Prontera et al. (2011) and the deletion in our Case No. 2 both occurred distal to the region with the most commonly deleted genes and included 4 genes (2 coding, VRK2 and FANCL; and 2 non-coding, LINC01122 and AC007131.1). However, the phenotypes seen in these individuals are in keeping with the other reported 2p15-16.1 deletions, and Prontera et al. (2011) suggest that the 2p15-16.1 CNV exerts pathogenic effects through several genes and/or involves regulatory elements, as Florisson et al. (2013) previously suggested.

The smallest deletion (0.203 Mb at 2p16.1) reported in this chapter overlaps a single gene, BCL11A, which has allowed for specific genotype-phenotype comparisons for this gene (Peter et al., 2014). The individual with this deletion is reported to have mild intellectual and language delays but no microcephaly, growth retardation or additional phenotypic abnormalities. The authors therefore postulate that delayed language and mild cognitive impairment can be caused by heterozygous loss of the BCL11A gene while other developmental and physical abnormalities are caused by other genes in the region (Peter et al., 2014).

The second smallest deletion (0.233 Mb at 2p15) reported in this chapter partially overlaps 2 coding genes, XPO1 and USP34, and fully overlaps the non-coding gene SNORA70B (Fannemel et al., 2014). The individual with this deletion is reported to have mild intellectual delay and language delay along with many of the commonly reported phenotypes (small head size [OFC 10th percentile], broad/high nasal root, smooth and long philtrum, ptosis, bitemporal narrowing, hypertelorism, high narrow palate) in individuals with 2p15-16.1 deletions. The authors therefore suggest that XPO1 and USP34 play a role in the distinct cranio-facial phenotypes and ID observed in 2p15-16.1 deletion carriers (Fannemel et al., 2014).

The genomic and phenotypic heterogeneity seen for 2p15-16.1 microdeletion carriers along with the observations above are puzzling. Below I discuss several possibilities for how different deletions can cause similar phenotypes.

2.5.3 Reasons for Phenotypic Similarities in 2p15-16.1 Deletion Carriers

Phenotypic similarities in individuals with non-recurrent 2p15-16.1 microdeletions may be due to many factors. The most likely factors based on my investigations include:

i) the presence of more than one haploinsufficient gene in the 2p15-16.1 deletion region,

ii) dysfunction of a gene with a normal copy number from the 2p region due to a deletion of its regulatory element(s), e.g. enhancers, and/or

iii) dysfunction of developmental pathway(s) in which one or more genes from the deleted region are involved.

2.5.3.1 Multiple Haploinsufficient Genes are Located in the 2p15-16.1 Region

Haploinsufficiency, the inability of a gene to function properly when only one copy is present, is often associated with developmental disease and, if the gene is involved in a neurodevelopmental process, can lead to lower IQs and DD (Girirajan et al., 2013; Huang et al., 2010). Deletions that overlap with HI genes often result in severe developmental phenotypes with increased penetrance (Cooper et al., 2011). Of the 13 most commonly deleted genes in 2p15-16.1 microdeletions, those predicted to be HI are XPO1, REL, USP34, and with a lower probability BCL11A (Huang et al., 2010). The majority of 2p15-16.1 microdeletions reported in this chapter contain at least one gene with a high likelihood of being HI (low HI score). Specifically, 13 cases have deletions which include 3-4 HI genes while 8 cases contain 1-2 HI genes. Of the cases with 1-2 HI genes deleted, 3 cases overlap BCL11A and REL while 4 cases overlap XPO1 and USP34, and 1 case overlaps XPO1, further supporting the role of XPO1, REL and USP34 in the syndrome. I have shown that these genes have reduced protein expression, which is in keeping with my expression results which showed reduced mRNA expression for REL and USP34. For XPO1, however, I found that mRNA expression was not altered by the deletion. Discrepancy between RNA and protein expression has been previously reported (Schwanhausser et al., 2011; Vogel and Marcotte, 2012). It is possible that the 3 HI genes in the 2p15-16.1 region (XPO1, REL and USP34) contribute to the most frequently reported phenotypes for 2p15-16.1 CNV carriers, as there are only 2 cases out of 23 that do not contain any of the HI genes mentioned above (Prontera et al. 2011 and new Case No. 2), although our Case No. 2 contains small CNVs in the BCL11A region (discussed further below).

2.5.3.2 Contribution of Regulatory Elements in the 2p15-16.1 Region

Similarities in phenotypes can be caused by deletions which include regulatory elements for a gene, whether or not the gene itself is deleted. Regulatory elements, DNA sequences that are outside of gene coding regions, are implicated in the complex control of gene expression during development (Consortium, 2012). In particular, elements that enhance gene expression (enhancers) or block gene expression (insulators) help control when and where a gene is expressed (Strachan et al., 2011). Because regulatory elements can be located at distances far from the genes they help regulate, a CNV that results in genomic deletion can remove an enhancer or insulator element, causing a tissue specific loss of expression or gain of tissue specific expression respectively, or alternatively, change the position of elements, bringing them closer to gene(s) which were not previously in their control (Lettice et al., 2011; Spielmann and Klopocki, 2013; Strachan et al., 2011). In the general population a bias of CNVs away from enhancers and other conserved elements suggests that these elements may cause phenotypic consequences when disrupted by a CNV, and recent examples of phenotypic consequences caused by CNVs overlapping non-coding regulatory elements support this idea (Conrad et al., 2010; Spielmann and Klopocki, 2013).

Deletions may also eliminate a topologically associating domain (TAD), ~0.1 to 1 Mb domains of the genome, within which promoter-enhancer interactions tend to occur (Dixon et al.,

2012; Matharu and Ahituv, 2015). TAD boundaries are enriched in insulator and barrier element activity and CNVs that disturb 3D chromatin organization, promoter/enhancer interactions within the domain, or cause loss of insulator barrier elements may affect transcription (Dixon et al., 2012; Matharu and Ahituv, 2015). The genomic region that encompasses the 2p15-16.1

CNVs reported in this chapter (chr2: 55,580,038 – 65,440,018) contains a large number of enhancer elements (VISTA) and regulatory regions (ENCODE). Eighteen of 49 human enhancer elements in this region are expressed in developing mouse tissue, meaning that there is experimental support of their influence on gene expression during development. Interestingly, a number of enhancers in the 2p15-16.1 region are expressed in tissues that are affected in individuals with 2p15-16.1 deletions (e.g. eyes, limbs, ears, facial mesenchyme, and multiple brain related tissues such as forebrain, midbrain, neural tube etc. in the developing mouse)

(Florisson et al., 2013). In addition, the enhancers located in the 2p15-16.1 region seem to occur in clusters with the majority occurring distal to the genomic segment that contains the most commonly deleted genes. For example, 11 enhancers are found within a ~1 Mb segment

(chr2:58,748,340-59,746,992); these enhancers show reporter gene expression in dorsal root


ganglion, ear, limb, other, heart, eye, midbrain (mesencephalon), neural tube, hindbrain

(rhombencephalon), and facial mesenchyme. A smaller cluster of 3 enhancers found within a

~150 kb segment (chr2:60,352,514-60,502,013) closer to the region containing the most commonly deleted genes show reporter gene expression in midbrain (mesencephalon), forebrain, and hindbrain (rhombencephalon). Only 2 enhancers overlap the genomic segment that contains the most commonly deleted genes; these enhancers are located within ~95 kb (chr2:60,761,404-

60,856,888) and show reporter gene expression in the forebrain and hindbrain

(rhombencephalon). The last 2 enhancers are found within ~83 kb segment (chr2:63,193,855-

63,277,103), proximal to the genomic segment that contains the most commonly deleted genes, and show reporter gene expression in hindbrain (rhombencephalon), midbrain (mesencephalon), and forebrain.
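
The enhancer cluster coordinates above can be checked with simple interval arithmetic. The following is a minimal sketch, not part of the original analysis: it takes the four cluster coordinates and the most commonly deleted segment (chr2:60,678,302-61,765,418) from this chapter, computes each cluster's span, and tests for overlap with the deleted segment. The `Interval` class, the cluster labels, and the `summarize` helper are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Genomic interval on chr2, coordinates as quoted in the text."""
    start: int
    end: int

    @property
    def span(self) -> int:
        # Distance between the two quoted coordinates, in bp.
        return self.end - self.start

    def overlaps(self, other: "Interval") -> bool:
        # Two intervals overlap if neither ends before the other starts.
        return self.start <= other.end and other.start <= self.end


# Segment containing the most commonly deleted genes (from the text).
COMMONLY_DELETED = Interval(60_678_302, 61_765_418)

# Hypothetical labels; intervals and enhancer counts are from the text.
CLUSTERS = {
    "distal_1Mb": (Interval(58_748_340, 59_746_992), 11),
    "distal_150kb": (Interval(60_352_514, 60_502_013), 3),
    "within_deleted": (Interval(60_761_404, 60_856_888), 2),
    "proximal_83kb": (Interval(63_193_855, 63_277_103), 2),
}


def summarize(clusters=CLUSTERS, region=COMMONLY_DELETED):
    """Return (label, enhancer count, span in kb, overlaps region) per cluster."""
    return [
        (label, count, round(iv.span / 1000), iv.overlaps(region))
        for label, (iv, count) in clusters.items()
    ]


if __name__ == "__main__":
    for row in summarize():
        print(row)
    print("total enhancers:", sum(count for _, count in CLUSTERS.values()))
```

Under these coordinates only the ~95 kb cluster falls inside the most commonly deleted segment, and the four clusters together account for the 18 enhancers described above.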

In addition to the experimentally validated enhancers in the region, a large number of DNaseI hypersensitive sites (DHSs) are also present in the 2p15-16.1 region. These sites indicate regions of open chromatin that can interact with DNA binding proteins such as transcription factors, making them possible regulatory sequences (Sheffield et al., 2013). There are a total of 1,336 DHSs in the genomic region that encompasses the 2p15-16.1 CNVs reported in this chapter (chr2: 55,580,038 – 65,440,018); 257 of these are located within the genomic segment that contains the most commonly deleted genes (chr2:60,678,302-61,765,418).

Interestingly, CNVs in two cases (Prontera et al. 2011 and Case No. 2) are proximal to the genomic segment with the most commonly deleted genes but overlap with a large number of enhancers (cluster of 11). In addition, in our Case No. 2, there are two additional CNVs in the

2p15-16 region with known regulatory roles. One of them is a 17 kb CNV in the second intron of

BCL11A which overlaps with a previously reported F cell quantitative trait locus (QTL)


associated with fetal hemoglobin levels (Menzel et al., 2007). This intron also contains a SNP associated with schizophrenia and autism (Basak et al., 2015; Hinney et al., 2011; Schizophrenia

Working Group of the Psychiatric Genomics, 2014). The second CNV in our Case 2 is a small intergenic deletion distal to BCL11A that overlaps a positive VISTA enhancer (hs1142). Several recent studies point to a role for BCL11A in neurodevelopmental phenotypes based on CNVs and mutations detected in this gene (Balci et al., 2015; Deciphering Developmental Disorders, 2015;

Funnell et al., 2015). Although its exact role in brain development is not known, BCL11A is expressed in brain and new research shows that BCL11A is involved in a regulatory pathway used by migrating cortical neurons (Allen Brain Atlas, 2015; Wiegreffe et al., 2015).

2.5.3.3 Shared Pathways & Protein-Protein Interactions

Dosage effects for genes within CNVs are sometimes incremental, not causing phenotypic effects until they reach a certain threshold (Lee and Scherer, 2010). However, deletions that contain multiple gene partners may cause phenotypic effects through their combined effect on a pathway or through protein-protein interactions.

Two genes from the 2p15-16.1 microdeletion region, XPO1 & REL, produce proteins involved in multiple signaling pathways including the canonical NF-kappaB pathway. The NF-kappaB signaling pathway is a master gene regulatory pathway implicated in a number of biological processes including gene transcription in the brain (Sarnico et al., 2012). While not identified in the Pathway Commons enrichment analysis, USP34 has also been associated with the NF-kappaB pathway as a negative regulator (Poalas et al., 2013). It is therefore possible that the inclusion of one or more of these genes in a deletion may lead to the malfunction of the NF-kappaB pathway. This pathway is associated with long-term synaptic plasticity and memory formation, and mutations in its components have been associated with ID (Ahn et al., 2008;


Kaltschmidt and Kaltschmidt, 2009; Philippe et al., 2009). In addition to shared pathways, protein-protein interactions link many of the proteins from the 2p region. Six of the proteins from the most commonly deleted region (XPO1, USP34, c-REL, BCL11A, PEX13 and KIAA1841) are linked to other proteins from the most commonly deleted region through shared protein interactions. XPO1 and c-REL share the highest number of interacting proteins (7 shared proteins), followed by XPO1 and BCL11A (3 shared proteins). These interactions point to the complexity and multitude of interactions between genes in the 2p15-16.1 region, and to the possibility that disturbance of these interactions by deletion of one or more of their components leads to similar phenotypes even though the deletions differ in size or exact gene content.

2.6 Conclusion

My studies suggest that the 2p15-16.1 microdeletion syndrome is likely a contiguous gene disorder with several key genes or common biological processes involved rather than a single gene disorder. While there are many genes in the 2p15-16.1 region, my study has combined multiple investigations in order to identify three genes, USP34, XPO1, and REL, from the 2p15-16.1 critical region as possible candidates for ID. The involvement of my candidate genes in common pathways (e.g. NF-kappaB) and their interactions with other proteins may cause the phenotypic similarities in the 2p15-16.1 microdeletion syndrome.


Chapter 3: Investigations of 1q21.1 Copy Number Variations (CNVs)

3.1 Background

Pathogenic CNVs are typically distinguished from benign CNVs based on their presence in affected individuals and absence in unaffected parents, siblings or in the general population

(Sharp et al., 2008). However, a growing number of recurrent CNVs predisposing to neurodevelopmental disorders (NDD) have recently been detected in affected and unaffected individuals, sometimes within the same family (e.g. 16p11.2, 16p12.1, 16p13.3, 1q21.1, and

15q13.3) (Cooper et al., 2011; Girirajan et al., 2013; Kaminsky et al., 2011). Predisposing CNVs are difficult to interpret in a clinical setting as they overlap different disorders, exhibit incomplete penetrance, and are associated with variable expressivity (Girirajan et al., 2012;

Veltman and Brunner, 2010). Variable expressivity occurs when the same genotype contributes to different phenotypic outcomes within a disorder or leads to different disease outcomes entirely

(Cooper et al., 2013; Cooper et al., 2011). Incomplete (or reduced) penetrance, commonly seen for inherited CNVs that are associated with ID, occurs when a mutation causes a phenotype in some, but not all individuals with the mutation (Cooper et al., 2013). The clinical relevance of predisposing CNVs is inferred from their observed enrichment in NDD compared to the general population (Rosenfeld et al., 2013).

3.2 The 1q21.1 CNV

The recurrent 1q21.1 CNV belongs to this group of clinically ambiguous CNVs. 1q21.1

CNVs (microdeletions and microduplications) can be inherited (~83% of cases) or de novo.

1q21.1 CNVs are sometimes seen in the general population although they occur at higher frequency in individuals with abnormal NDD phenotypes (including ID and congenital anomalies) than in controls (0.49% vs 0.06% respectively) (Rosenfeld et al., 2013). A growing


number of NDD phenotypes (ID, ASD, SCZ, and epilepsy) and congenital anomalies (congenital cataracts, congenital heart disease, and dysmorphic features) have been reported for carriers of

1q21.1 CNVs (Brunetti-Pierri et al., 2008; Christiansen et al., 2004; Greenway et al., 2009;

Mefford et al., 2008). While both deletions and duplications at 1q21.1 have been reported in individuals with ID, deletions are enriched in individuals with developmental delay while reciprocal duplications are enriched in individuals with ASD (Girirajan et al., 2013). Penetrance estimates for 1q21.1 deletions and duplications are 36.9% and 29.1% respectively (Rosenfeld et al., 2013). In addition to NDD phenotypes, milder phenotypes including a wide range of learning and behavioural difficulties (ADHD, anxiety/depression, and antisocial behaviours) have also been reported in individuals with 1q21.1 CNVs (Brunetti-Pierri et al., 2008; Mefford et al.,

2008). In my study, retrospective phenotypic analysis for three families showed that all 1q21.1

CNV carriers shared some form of learning difficulty (Harvard et al., 2011). It is therefore possible that 1q21.1 CNV carriers reported in the general population have milder phenotypic features and are under-recognized because variable expressivity results in subclinical phenotypes, similar to carrier parents of the 16p12.1 microdeletion/duplication (Girirajan and

Eichler, 2010; Rosenfeld et al., 2013).

The majority of reported 1q21.1 CNVs share breakpoints and overlap a 1.35 Mb critical region (144.5-146.3 Mb, hg18) that contains approximately 12 genes (Brunetti-Pierri et al., 2008;

Harvard et al., 2011; Mefford et al., 2008). 1q21.1 CNV breakpoints can vary slightly between individuals, although there is often no detectable difference in CNV size between more and less affected individuals from the same family (Harvard et al., 2011). The genomic instability in the

1q21.1 CNV region and recurrence of deletions and duplications are caused by large segmental duplications (281 kb) that are highly homologous (i.e. sequence identity is greater than 99.9%); therefore, 1q21.1 CNVs are likely generated through non-allelic homologous recombination (NAHR) (Brunetti-Pierri et al., 2008; Mefford et al., 2008).

3.2.1 Contributors to the Phenotypic Variability in 1q21.1 CNV carriers

Reasons for the phenotypic variability in 1q21.1 CNV carriers, especially in families with unaffected and affected subjects, are unknown although recent studies point to several possibilities which I briefly discuss below.

3.2.1.1 Number of DUF1220 Copies in the 1q21.1 CNV

DUF1220 sequences are found in large numbers in the 1q21.1 CNV, largely in NBPF gene family members, and encode a protein domain that is highly expressed in the brain

(O'Bleness et al., 2014; Popesco et al., 2006). Hyper-amplification of DUF1220 repeats is specific to humans and DUF1220 repeats have been implicated in brain size and brain evolution in primates and are drivers of neural stem cell proliferation (Keeney et al., 2014a; Keeney et al.,

2014b; Popesco et al., 2006). Dumas et al. (Dumas et al., 2012) reported that the numbers of

DUF1220 sequences correlate with brain size in subjects with 1q21.1 CNVs, implicating the loss of DUF1220 sequences in the observed microcephaly in 1q21.1 deletion carriers. In addition, the number of repeats, especially of the CON1 subtype, has been linked to increased severity in ASD and to cognitive aptitude (Davis et al., 2014; Davis et al., 2015; O'Bleness et al., 2012). This finding is intriguing and may help to explain some of the variability in phenotypes for 1q21.1 carriers (i.e. microcephaly and different learning abilities).

3.2.1.2 Presence of Dosage Sensitive Genes within the 1q21.1 CNV Region

Genes whose dosage sensitivity “drives” or causes a phenotype have been found in several pathogenic CNVs (discussed in detail in section 1.2.2.3.3). The variability and severity of


phenotypes seen in 1q21.1 CNV carriers, similar to other CNVs that cause variable phenotypes, may be due to one or more gene(s) within the CNV region (Girirajan and Eichler, 2010).

3.2.1.3 Single Nucleotide Changes within 1q21.1 CNVs and in the Rest of the Genome

Whole exome and whole genome sequencing have been used to identify sequence variants that contribute to phenotypic variability in cases that have overlapping/similar CNVs. Sequence changes in gene(s) from the CNV region as well as in the rest of the genome are reported in individuals with CNVs who have more severe phenotypes than other individuals with the same

CNV.

For example, a 1q21.1 deletion, mapping distal to the 1q21.1 CNV studied in this chapter, is found in individuals with Thrombocytopenia with absent radii (TAR) syndrome (an autosomal recessive disorder) and can be either inherited or de novo (Cooper et al., 2011;

Klopocki et al., 2007). While the majority of individuals with TAR syndrome have a 200 kb minimal deletion on chromosome 1q21.1 (chr1:145,386,506-145,748,067 bp), the CNV is not sufficient to cause a disease phenotype as carrier parents are often unaffected (Klopocki et al.,

2007).

The underlying cause of TAR syndrome was not known until recently when whole exome sequencing of individuals with TAR syndrome and the distal 1q21.1 CNV identified two low frequency SNPs in 5 individuals in noncoding regions of RBM8A, a gene located within the

CNV region on the non-deleted chromosome: one SNP was located in the 5’ UTR and the other in intron 1 (Albers et al., 2012). Both SNPs significantly decreased the promoter activity of the remaining copy of RBM8A in human megakaryocytes (progenitors of platelets), and reduced protein expression of Y14, the protein encoded by RBM8A, was noted in platelet lysates in individuals with TAR syndrome compared to healthy controls (Albers et al., 2012). Further proof


that RBM8A caused TAR syndrome was based on finding frameshift or nonsense mutations on one allele of RBM8A in addition to the 5’ UTR variant on the other allele in individuals with

TAR syndrome who were not carriers of the 1q21.1 distal CNV (Albers et al., 2012; Yassaee et al., 2014). The above studies showed that carriers with distal 1q21.1 CNVs are only affected when nucleotide level changes affect the other copy of RBM8A, while presence of the 1q21.1

CNV alone is associated with a normal phenotype.

Similarly, 22q11.2 deletions associated with typical DiGeorge syndrome can also occur in individuals with atypical phenotypes and unaffected parents (Cuneo, 2001; McDonald-McGinn et al., 2001). The reason for the phenotypic variability is not known, but novel pathogenic mutations in SNAP29, a gene within the 22q11.2 deletion region, were found in individuals with atypical phenotypic features of DiGeorge syndrome (McDonald-McGinn et al., 2013). The 22q11.2 deletion in these individuals unmasked an autosomal recessive condition, CEDNIK syndrome (Fuchs-Telem et al., 2011), caused by one null allele due to the 22q11.2 CNV (deletion) and a pathogenic mutation in the second allele of the SNAP29 gene.

In addition to mutations within a CNV region, pathogenic sequence changes in the rest of the genome can also contribute to phenotypic variability. Classen et al. (Classen et al., 2013) identified pathogenic mutations outside the CNV region in 3 individuals with familial CNVs (a

14q32 microdeletion overlapping an imprinted region and two inherited 22q11.2 microduplications) who had atypical phenotypes that could not be explained by the presence of the CNV alone. Pathogenic mutations outside the CNV regions explained the atypical phenotypic presentation in these individuals (Classen et al., 2013).


3.3 Chapter Goals

The overall goal of this Chapter is to help further characterize phenotypic consequences of 1q21.1 CNVs (microdeletions and microduplications) and to search for ID candidate genes in individuals with more severe phenotypes. Specifically, I set out to:

1) Explore the functional consequence of a deletion/duplication on genes within 1q21.1

CNVs in 8 individuals and to identify candidate genes for ID. I used a multifaceted approach

(clinical assessment, chromosomal microarray and whole genome expression) to select candidate genes and collaborated to determine the functional impact of 1q21.1 CNVs on two of the candidate genes. My results are described in detail in my publication (Harvard et al., 2011).

2) Find additional sources of phenotypic variability in individuals with 1q21.1 CNVs using exome sequencing (i.e. look for mutation(s) in genes from the 1q21.1 CNV occurring on the intact allele or for additional mutations in the rest of the genome in more affected individuals).

3) Validate and explore the functional consequences of identified mutations.

3.4 Materials and Methods

Three families (A, B, and C) were included in my study. A summary of the tests performed on 1q21.1 CNV carriers is shown in Table 3.1.


| Families and subjects | Whole genome expression | Assessment of function of two genes (CHD1L and PRKAB2) from 1q21.1 CNV in LBCs | Whole exome sequencing | Assessment of ATF6 function in LBCs | Analysis of ER stress response genes (genome wide expression array) |
|---|---|---|---|---|---|
| Family A: 1 - proband (1q21.1 del); 2 - mother (1q21.1 del); 3 - grandmother (1q21.1 del); 4 - father; 5 - sibling | A1, A2, A3 | N/A | Trio (A1, A2, A4), A3 and A5 | - | A1, A2, A3 |
| Family B: 1 - proband (1q21.1 del); 2 - B1’s daughter (1q21.1 del); 3 - maternal uncle (complex 1q21.1 CNV, dup and del) | - | B1 | - | - | - |
| Family C: 1 - proband (1q21.1 dup); 2 - father (1q21.1 dup); 3 - mother | C1, C2 | C1 | Trio (C1, C2, and C3) | C1 and C2 (expression of ATF6 and ER stress response genes) | C1 and C2 |

Table 3.1 Summary of Tests Run for 1q21.1 CNV Carriers. A multifaceted approach was used to search for underlying genetic causes of phenotypic variability in 1q21.1 carriers (del=deletion, dup=duplication). Methods used include whole genome expression array, whole exome sequencing and functional assessment of candidate genes in LBCs established from 1q21.1 CNV carriers. N/A stands for not available.

3.4.1 Subjects

Probands A1 and C1 presented with idiopathic ID and were enrolled for array CGH screening for pathogenic CNVs. The criteria for enrollment included: i) normal by routine cytogenetic testing at the 500–550 band level resolution; ii) negative fragile X testing by

DNA analysis; iii) a phenotype score ≥ 3 on a testing prioritization checklist adapted from de

Vries et al. (Qiao et al., 2008); and iv) both parents available for testing.

Proband B1 and her daughter B2 were recruited via a clinical genetics service in Toronto

(Dr. Eva Chow). B1 was referred because of the onset of psychosis and a history of ID. Her daughter, B2, was referred because of significant developmental delays. B1 and B2 had normal karyotype and Fragile X testing. B1’s brother, B3, was also recruited through the clinical genetics service in Toronto because of the family history of 1q21.1 CNVs. Parents for B1 were not available, nor was the father for B2.

3.4.2 Chromosomal Microarrays (CMA)

DNA was extracted from whole blood using an ArchivePure DNA Purification Kit (5

PRIME). 1q21.1 CNVs were identified initially by oligo-based array-CGH using three types of whole genome arrays: Nimblegen 385k, Agilent 105k, and the clinical Signature GenomicsChip WG™. Seven of eight subjects were also analyzed using the new Affymetrix Cytogenetics

Whole-Genome 2.7M Array (DNA was not available from B2 for high resolution array analysis).

This higher resolution array contains approximately 400,000 SNP markers and 2.3 million non-polymorphic markers, with high density coverage across cytogenetically significant regions.

Data was collected and analyzed as described in section 2.3.3. The annotation file used in our analysis can be found on the Affymetrix website, listed as ArrayNA30.2 (hg18). CNVs detected with the high resolution array were compared with the DGV for overlap with copy number variants in controls using criteria for defining common variants (Qiao et al., 2010).

3.4.3 RNA Extraction

Lymphoblastoid cell lines (LBCs) were transformed and maintained as previously described in Chapter 2 (section 2.3.2); RNA was extracted from LBCs using a Qiagen RNeasy

Plus Mini Kit (Qiagen) with optional DNase treatment using a RNase-Free DNase Set (Qiagen) and was then stored at -80°C. Immediately prior to use, samples were run on a 2100 Bioanalyzer

(Agilent) to check for quality and degradation. All RNA samples used in the expression study had RIN >9.


3.4.4 Whole Genome Expression

RNA from LBCs was used to study gene expression in individuals with 1q21.1 microdeletions (A1-3), microduplications (C1 & C2), and in 3 control individuals. Transcript levels were assayed using a commercial whole genome expression array, HumanRef-8 v3.0

Expression BeadChip, using standard protocols (Illumina). Briefly, 2 µL of total RNA was quantified using Quant-iT™ RiboGreen® RNA reagent (Invitrogen) prior to RNA amplification.

Five microliters of total RNA (50-500 ng) was then used in the first- and second-strand reverse transcription step followed by a single in vitro transcription (IVT) amplification. Array hybridization, washing, blocking, and streptavidin-Cy3 staining were also done according to standard protocols (Illumina). The BeadChip was then scanned using an Illumina BeadArray

Reader to quantitatively detect fluorescence emission by Cy3. Eight arrays were run in parallel on a single BeadChip. Each array contained ~ 24,500 well-annotated transcripts (NCBI RefSeq database Build 36.2, Release 22), present multiple times on a single array.

Background-corrected intensity values were generated for each probe using

GenomeStudio software (Illumina). Subsequent analyses were carried out in R [http://www.R-project.org/]. The data were quantile normalized and differential expression with respect to

1q21.1 copy number (1, 2, or 3 copies) analyzed using limma (Smyth, 2004), with Benjamini-

Hochberg multiple test correction to control the false discovery rate (FDR). This analysis yielded a list of all the genes present on the expression array with p-values based on correlation between expression and copy number of 1q21.1 which was used in subsequent analyses.
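
Two of the steps named above, quantile normalization and Benjamini-Hochberg correction, can be illustrated in a few lines. This is a minimal pure-Python sketch under stated assumptions, not the R/limma code used for the analysis: the function names are hypothetical, ties in quantile normalization are broken by original order (a simplification), and the moderated statistics that limma computes are not reproduced here.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of probe intensities per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by original order (simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest intensity across all arrays.
    rank_means = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Invented toy intensities for two arrays of three probes each.
    print(quantile_normalize([[1.0, 3.0, 2.0], [10.0, 30.0, 20.0]]))
    # Invented toy p-values.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

After normalization every array has the same empirical distribution of intensities, which is what makes per-gene comparisons across the eight arrays on a BeadChip meaningful.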

Genes from the 1q21.1 CNV (1.35 Mb containing 12 genes, 11 of which have probes on the whole genome expression array), and genes from the regions flanking the CNV (2.5 Mb and

5 Mb containing 57 and 150 genes respectively) were tested for enrichment using a


hypergeometric distribution in the top 100 genes from the expression/copy number correlation analysis. Subsequently, a Wilcoxon rank-sum test was performed to confirm these results and to determine if there was enrichment of genes from flanking regions within the entire data set.
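
The hypergeometric enrichment test described above asks how likely it is to see at least k genes from a region of interest among the top n genes if the ranking were unrelated to the region. A minimal sketch follows, using only the Python standard library. The function name is hypothetical, and the population size used in the example call is an assumption (the array is described as carrying ~24,500 transcripts; the exact number of genes tested is not stated here).

```python
from math import comb


def hypergeom_upper_tail(population, marked, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: total number of genes on the array
    marked:     genes from the region of interest (e.g. the CNV)
    draws:      size of the top list (e.g. 100)
    observed:   region genes found in the top list
    """
    if not (0 <= marked <= population and 0 <= draws <= population):
        raise ValueError("marked and draws must lie between 0 and population")
    total = comb(population, draws)
    upper = min(marked, draws)
    favourable = sum(
        comb(marked, i) * comb(population - marked, draws - i)
        for i in range(max(observed, 0), upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # 11 CNV genes with probes, top 100 genes; 24_500 is an assumed array size,
    # and 6 is an illustrative count of CNV genes in the top list.
    print(hypergeom_upper_tail(24_500, 11, 100, 6))
```

The same function can be applied to the flanking regions by substituting the corresponding gene counts (57 and 150 genes for the 2.5 Mb and 5 Mb flanks).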

3.4.5 In silico Functional Analysis of the Top 100 Genes

Genes which ranked highest (top 100 genes) in the expression/copy number correlation analysis were selected for further in silico functional analysis. An over-representation analysis

(ORA) for Gene Ontology (GO) terms was performed using ermineJ (http://www.chibi.ubc.ca/ermineJ/) (Gillis et al., 2010). GO terms considered included biological processes, molecular functions, and cellular components. The ORA was run using the following settings: gene set sizes were restricted to 3-200 genes and best scoring replicates were used for any replicate genes in the datasets.

IPA software was used to determine gene associations among the top 100 genes

(Reference set: HumanRef-8 V3.0) by performing a core analysis using the default settings

(direct and indirect relationships and endogenous chemical) and filter (all molecules and relationships).

3.4.6 Functional Studies for CHD1L and PRKAB2

Functional studies for CHD1L and PRKAB2 were performed in the laboratory of our collaborator, Dr. Mark O’Driscoll, University of Sussex. A full description of the methods used and the obtained results are published (Harvard et al., 2011).

3.4.7 Whole Exome Sequencing

Whole Exome Sequencing (WES) was performed for the multigenerational families

(subjects A1-5 and subjects C1-C3). Samples were sent to Beijing Genomics Institute (BGI) for sequencing. Agilent’s SureSelect exome capture kit (50 Mb) was used to generate libraries sequenced on a HiSeq 2000. Reads were mapped using BWA v.0.5.9 and variant calls made using

SOAPsnp and SAMtools. SNPs and indels were filtered to look for pathogenic mutations.

3.4.7.1 Variant Filtering Strategy for 1q21.1 Families

Variants rare or absent in the general population (i.e. minor allele frequency <1% in two common variant databases, NHLBI 6500 and 1000 exomes) were selected based on their inheritance patterns. Another source for allele frequencies in the general population, released after my dataset was analyzed (October 2014), is the Exome Aggregation Consortium (ExAC) database, which currently contains allele frequencies for 60,706 unrelated individuals

(http://exac.broadinstitute.org). In both families I identified unique de novo (not present in either parent) sequence variants in the proband and unique shared sequence variants present in the carrier parent and proband (Family C), and in the carrier grandparent, parent and proband

(Family A).

After identification, variants were prioritized based on the probability that they are conserved (PhastCons >0.95, phyloP >1.5) and/or damaging (SIFT <0.05, PolyPhen2 >0.95). Variants with high conservation scores and those predicted to be damaging were compared to expression data from the 1q21.1 CNV whole genome expression study and were prioritized for follow-up if they had gene expression fold changes greater than 1.2 or less than 0.8 in 1q21.1

CNV carriers compared to normal controls.
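
The filtering strategy above can be expressed as a short sequence of predicates. The sketch below is one possible reading, not the pipeline actually used: the `Variant` record, its field names, and the helper functions are hypothetical; inheritance is simplified to a proband plus two parents; and a variant is treated as conserved or damaging if either score in the respective pair passes its threshold, which is an assumption about how the thresholds were combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical record for one called variant in a trio."""
    gene: str
    max_population_maf: float          # highest MAF across the databases consulted
    in_proband: bool
    in_carrier_parent: bool
    in_noncarrier_parent: bool
    phastcons: Optional[float] = None
    phylop: Optional[float] = None
    sift: Optional[float] = None
    polyphen2: Optional[float] = None
    expression_fold_change: Optional[float] = None  # CNV carriers vs controls


def is_rare(v: Variant, maf_cutoff: float = 0.01) -> bool:
    return v.max_population_maf < maf_cutoff


def inheritance(v: Variant) -> str:
    """Classify as de novo, shared with the CNV carrier parent, or other."""
    if not v.in_proband:
        return "absent"
    if not v.in_carrier_parent and not v.in_noncarrier_parent:
        return "de_novo"
    if v.in_carrier_parent and not v.in_noncarrier_parent:
        return "shared_with_carrier"
    return "other"


def is_conserved(v: Variant) -> bool:
    # Assumption: passing either conservation threshold is sufficient.
    return (v.phastcons is not None and v.phastcons > 0.95) or (
        v.phylop is not None and v.phylop > 1.5
    )


def is_damaging(v: Variant) -> bool:
    # Assumption: passing either prediction threshold is sufficient.
    return (v.sift is not None and v.sift < 0.05) or (
        v.polyphen2 is not None and v.polyphen2 > 0.95
    )


def has_expression_change(v: Variant) -> bool:
    fc = v.expression_fold_change
    return fc is not None and (fc > 1.2 or fc < 0.8)


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Apply the filters in the order described in the text."""
    return [
        v for v in variants
        if is_rare(v)
        and inheritance(v) in ("de_novo", "shared_with_carrier")
        and (is_conserved(v) or is_damaging(v))
        and has_expression_change(v)
    ]
```

For Family A, where the CNV was transmitted through three generations, the shared-variant check would need to be extended to require presence in the carrier grandparent as well.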

3.4.7.2 Confirmation of Selected Variants by Sanger Sequencing

Confirmation of variants in two genes, DARS2 and ATF6, was performed using Sanger sequencing. Primers were designed to confirm selected variants using Primer Express 3.0. PCR was performed according to Invitrogen’s protocol for Taq DNA polymerase and cleaned


products run on an ABI 3130xl. Variants were tested in the proband and all available family members.

3.4.7.3 Gene Expression Studies for Selected Variants

The expression of the two candidate genes, DARS and ATF6, was studied by qPCR in family A and C respectively. Primer sets for ATF6 spanning exons 1-3 and exons 11-12 and for

DARS spanning exons 2-4 and 13-15 were used. Detailed methods for primer design and qPCR are previously described in Chapter 2 (Section 2.3.8.1). For family A, I used RNA extracted from the first passage of LBCs as the cell lines failed to grow after revival. For family C, transformed

LBCs were available and RNA was isolated from several different passages.

3.4.7.4 ATF6 Functional Follow-up

ATF6α is a transmembrane receptor protein that activates a branch of the Unfolded

Protein Response (UPR), an important cellular system for responding to stress in the

Endoplasmic Reticulum (ER) (Yamamoto et al., 2007). The three branches of the mammalian

ER stress response pathway and the role of ATF6α are shown in Figure 3.1.

I studied the consequences of the ATF6 mutation using: i) Western blotting for protein expression, ii) analysis of gene expression of ATF6 downstream genes in the endoplasmic reticulum (ER) stress response, and iii) response of patient LBCs to chemically induced ER stress.

i) Western blotting for ATF6α protein expression was performed in Dr. Allan Volchuk’s lab using an anti-ATF6α antibody.

ii) To determine if the ATF6 mutation in the proband (C1) and his father (C2) would affect the constitutive expression of genes regulated directly by ATF6α or in interacting arms of the UPR pathway, I made a list of genes reported to be activated by ATF6α and its heterodimerization partner XBP1. This list included genes encoding known ER chaperones and


genes with ER associated degradation (ERAD) components taken from Table 2 in Lee et al. (Lee et al., 2003), Table 1 in Adachi et al. (Adachi et al., 2008) and from a list of ATF6 regulated genes in a rat insulinoma cell line (A. Volchuk, personal communication). My final gene list contained downstream (ds) genes with available expression ratios from my whole genome expression analysis: dsATF6α (38 genes), dsXBP1 (17 genes), and dsATF6α & XBP1 (6 genes).

In addition, because of the overlap between the ATF6α and IRE1 arms of the UPR (through

XBP1 and possibly through other unknown mechanisms), I included 45 genes shown to be up- or down-regulated by IRE1 (Zhang et al., 2014) and a known ER resident gene, WFS1, whose mutations have been reported to cause a progressive neurodegenerative disorder called Wolfram syndrome (Fonseca et al., 2010), in the list of genes to be assessed for expression in A1-3, C1, and C2 in comparison to controls. The complete list of genes used for this analysis contains 107 genes (Supplementary Table 3.1, Appendix B).

iii) Two chemicals, thapsigargin and tunicamycin, which cause altered protein glycosylation and protein accumulation in the ER (i.e. ER stress), were used to induce ER chaperone gene expression in individuals with the ATF6 mutation and in two controls.

Experiments were modeled after So et al. (So et al., 2007). Expression of genes downstream of

ATF6, specifically GRP94 (HSP90B1), BiP (HSPA5, GRP78) and XBP1, were measured using

SYBR green qPCR with the expectation that gene expression would be altered in cell lines with the ATF6 mutation during ER stress.


Figure 3.1 ER Stress Response Pathway. Detection of ER stress in mammalian cells occurs through 3 different branches initiated by protein sensors, PERK, ATF6α, and IRE1. The branches of the UPR respond to ER stress by increasing capacity, degrading improperly folded proteins, and by reducing translation. In response to accumulation of unfolded proteins (stress) in the ER caused by external factors (e.g. glucose deprivation, hypoxia, and disturbed Ca2+ levels (Kaufman et al., 2002)), ATF6 is cleaved to produce a potent transcription factor (Yoshida et al., 1998; Yoshida et al., 2000). The cleaved portion of ATF6, ATF6α, regulates the transcription of a large number of UPR genes including ER chaperone and ER-associated degradation (ERAD) genes: ATF6α causes the induction of ER chaperone genes while ATF6α heterodimerization with X-box binding protein 1 (XBP1) causes the induction of genes encoding ERAD components (Yamamoto et al., 2007). Original Source: Figure 2 from Fulda, S., Gorman, A.M., Hori, O. and Samali, A., 2010. Cellular stress responses: cell survival and cell death. Int J Cell Biol. 2010, 214074. doi:10.1155/2010/214074; p. 6 (Fulda et al., 2010).


3.5 Results

3.5.1 Genomic and Clinical Features of 1q21.1 CNV Carriers

Genomic findings for all 1q21.1 CNV carriers included in my study are presented in

Figure 3.2 and Supplementary Table 3.2 (Appendix B). Detailed clinical evaluation of 1q21.1

CNV carriers, both affected as well as family members with no previously reported phenotypic presentation, was performed and is summarized in Supplementary Table 3.2 (Appendix B). This resulted in recognition of learning problems of various degrees in all studied individuals, although A3 and C2 demonstrated only subtle learning difficulties (A3 did not complete secondary school training and C2 admitted having to work very hard to pass grades). Short stature was the most consistent finding in all subjects. Microcephaly was associated with deletions and macrocephaly with duplications as previously noted (Brunetti-Pierri et al., 2008;

Mefford et al., 2008). Other phenotypic features varied within and between families.


Figure 3.2 Comparison of Genomic Overlap for 1q21.1 CNVs. CNV breakpoints (hg18) were determined using the Affymetrix 2.7M whole genome array for all subjects except B2, whose breakpoints were determined using a SignatureChip WG v1.1. Red bars indicate a deletion of the 1q21.1 region while blue bars indicate a duplication. The previously reported minimal deletion region is shown in green. Genes seen in the majority of our cases (core genes) are shown within the boundaries of the 1.23 Mb minimal del/dup region (144.5-146.3 Mb); there are a total of 10 genes, 2 pseudogenes and 1 predicted gene.

1q21.1 CNV breakpoints tended to be consistent within the three family groups (Figure 3.2; Supplementary Table 3.2, Appendix B). The CNVs studied overlap a previously reported 1.35 Mb minimal deletion region that is spanned by two large segmental duplication blocks (Mefford et al., 2008). In family A, the three 1q21.1 deletion carriers shared similar breakpoints. In family B, the carrier of the largest imbalance (B3) showed the least severe phenotype even though he had a proximal duplication of the TAR syndrome region in addition to the 1q21.1 deletion. In family C, the affected proband (C1) had a slightly smaller 1q21.1 duplication than his less affected father (C2). The core genes seen in all subjects with a 1q21.1 CNV were PRKAB2, PDIA3P, FMO5, CHD1L/ALC1, BCL9, ACP6, GJA5, GJA8, GPR89B, GPR89C, PDZK1P1, and NBPF11. There were no secondary CNVs detected in our cases that could be considered pathogenic and contributing to the phenotype.

3.5.2 Whole Genome Expression Analysis in 1q21.1 CNV Carriers

A copy number/expression correlation analysis was performed for 3 subjects with microdeletions (A1-A3, from family A), 2 subjects with microduplications (C1 and C2, from family C), and 3 controls (2 females and 1 male). Genes were ranked by the uncorrected p-values generated during the correlation analysis. The top 100 genes from this analysis are shown in Supplementary Table 3.3 (Appendix B). Significant enrichment of gene transcripts from the 1q21.1 CNV (6 of 11 genes with probes on the Illumina 24K array) was detected within the top 100 genes in the 1q21.1 copy number/expression correlation analysis. The six genes (PRKAB2, CHD1L/ALC1, BCL9, ACP6, GPR89A, and PDIA3P) were positioned significantly higher in the correlation analysis than would be expected by chance (Wilcoxon rank-sum test, p = 2.5 × 10^-14) and were positively correlated with 1q21.1 copy number (increased expression when duplicated and decreased expression when deleted), with the exception of PDIA3P, which was negatively correlated with copy number (Table 3.2). CHD1L/ALC1, a gene within the 1q21.1 CNV, showed the most significant correlation with copy number (p = 2.42 × 10^-5), though this was not significant after multiple test correction.
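As a rough cross-check on the enrichment described above, the chance of finding at least 6 of the 11 CNV genes among the top 100 can be sketched with a hypergeometric tail probability. This is not the Wilcoxon rank-sum test reported in the text but a simpler, swapped-in calculation, and the total number of genes tested (24,000) is an assumption read off the "24K" array name rather than a value given in this chapter.

```python
from math import comb


def hypergeom_tail(total, marked, drawn, observed):
    """P(X >= observed) when `drawn` items are taken without replacement
    from `total` items, of which `marked` belong to the set of interest."""
    upper = min(marked, drawn)
    numerator = sum(
        comb(marked, k) * comb(total - marked, drawn - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(total, drawn)


# From the text: 11 CNV genes had probes, and 6 of them fell in the top 100.
# ASSUMPTION (not stated in the chapter): about 24,000 genes were ranked.
ASSUMED_GENES_RANKED = 24_000

p_enrichment = hypergeom_tail(ASSUMED_GENES_RANKED, 11, 100, 6)
print(f"P(at least 6 of 11 CNV genes in the top 100) = {p_enrichment:.2e}")
```

Under that assumed gene count the probability is very small, which points in the same direction as the reported rank-sum result; the exact value depends on the assumed total.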

The negative log10 p values for the correlation of expression and 1q21.1 copy number for all probes across all chromosomes are shown in Supplementary Figure 3.1 (Appendix B), and for chromosome 1 in Figure 3.3. I did not find evidence that the 1q21.1 CNV influenced expression of genes flanking the CNV (2.5 or 5 Mb windows; Wilcoxon rank-sum test and hypergeometric tests p > 0.2, see Methods). Gene Ontology (GO) enrichment analysis did not reveal any GO terms with more genes from the top 100 than would be expected by chance.


Figure 3.3 Correlation of Expression and Copy Number for Probes from Chromosome 1 Results from our copy number/expression correlation analysis are shown by plotting the negative log10 of the p values along the length (Mb) of the chromosome (see Methods). The probes of genes from the 1q21.1 CNV region (at position ~146 Mb) whose expression correlates with copy number cluster above the rest of the probes (red arrow).
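The negative log10 transform used for the y-axis of this plot can be illustrated with the uncorrected p-values reported for the six 1q21.1 genes in Table 3.2. This is a minimal sketch of the transform only; the plotting itself and the per-probe genomic coordinates are not reproduced here.

```python
from math import log10

# Uncorrected p-values for the six 1q21.1 CNV genes, as listed in Table 3.2.
p_values = {
    "CHD1L": 2.42e-05,
    "PRKAB2": 5.97e-04,
    "GPR89A": 1.55e-03,
    "BCL9": 2.01e-03,
    "PDIA3P": 3.48e-03,
    "ACP6": 5.14e-03,
}


def neg_log10(p):
    """Smaller p-values map to larger plotted values."""
    return -log10(p)


scores = {gene: neg_log10(p) for gene, p in p_values.items()}

# Print genes from the strongest to the weakest correlation signal.
for gene, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{gene:7s} -log10(p) = {score:.2f}")
```

Because the transform is monotonic, the gene with the smallest p-value (CHD1L) sits highest on the plot.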


Position  Gene Symbol  Gene Name                                                                   Chr  Cytoband  Strand  Illumina Probe  Accession    Protein Product  p value   Correlation
1         CHD1L        chromodomain helicase DNA binding protein 1-like, mRNA.                     1    1q21.1c   +       ILMN_1786016    NM_004284.3  NP_004275.3      2.42E-05  Positive
10        PRKAB2       protein kinase, AMP-activated, beta 2 non-catalytic subunit, mRNA.          1    1q21.1c   -       ILMN_1786021    NM_005399.3  NP_005390.1      5.97E-04  Positive
30        GPR89A       G protein-coupled receptor 89A, mRNA.                                       1    1q21.1d   +       ILMN_2116594    NM_016334.2  NP_057418.1      1.55E-03  Positive
42        BCL9         B-cell CLL/lymphoma 9, mRNA.                                                1    1q21.1c   +       ILMN_1704452    NM_004326.2  NP_004317.2      2.01E-03  Positive
60        PDIA3P       protein disulfide isomerase family A, member 3 pseudogene, non-coding RNA.  1    1q21.1c   +       ILMN_2075436    NR_002305.1                   3.48E-03  Negative
82        ACP6         acid phosphatase 6, lysophosphatidic, mRNA.                                 1    1q21.1c   -       ILMN_2234343    NM_016361.2  NP_057445.2      5.14E-03  Positive

Table 3.2 Genes from the 1q21.1 CNV Affected by Copy Number Gene symbols, names, and positions in the ranking are listed for genes from the 1q21.1 CNV that fell within the top 100 genes of the expression/copy number correlation analysis. For each gene, the uncorrected p-value and correlation direction are provided.


3.5.3 Functional Analysis of Candidate Genes from Whole Genome Expression

Gene function analysis was performed using LBCs from B1 and C1, which represented the 1q21.1 deletion (Del) and 1q21.1 duplication (Dup) respectively. Two genes, CHD1L/ALC1 and PRKAB2, were studied because they ranked highest in the expression/1q21.1 copy number correlation analysis (CHD1L/ALC1 position 1 and PRKAB2 position 10) and have functions in relevant cellular processes (see below). Compared to controls, levels of both CHD1L/ALC1 and AMPKβ2 protein were reduced in the LBCs with the 1q21.1 Del and increased in the LBCs with the 1q21.1 Dup. Full details of the functional studies for CHD1L and AMPKβ2 performed by my collaborator can be found in our publication (Harvard et al., 2011); the main findings relevant to my dissertation are summarized below.

3.5.3.1 Functional Assays for CHD1L/ALC1

CHD1L functions as a chromatin remodeler (Ahel et al., 2009). Since my collaborator noted that 1q21.1 Dup and Del cell lines had increased numbers of cells with catenated (entangled) chromosomes, we speculated that this chromatin abnormality may be caused by the inability of cells with a CHD1L copy number change to efficiently manipulate chromatin structure.

To test this possibility, my collaborators reduced CHD1L expression in a control cell line (A549) using siRNA to mimic 1q21.1 deletions and found that modestly reduced CHD1L was associated with impaired activation of the decatenation (untangling) checkpoint (DCC) following topoisomerase II (Topo II) inhibition. In normal cells, the DCC delays cell entry into mitosis (arresting cells at G2) until chromosomes have been untangled by Topo II. The efficiency of the DCC can be measured using the number of pseudomitotic cells after Topo II inhibition. Pseudomitotic cells contain condensed but entangled chromosomes and are visibly different from normal metaphase cells. A high number of pseudomitotic cells indicates an inefficient DCC, as cells have entered mitosis in the presence of entangled chromosomes. In the control cell line (A549), reduction of CHD1L resulted in an increased number of pseudomitotic cells, indicating an inefficient DCC. These results allowed us to describe a novel consequence of limiting CHD1L levels.

Failure of the DCC can also ultimately result in chromosome breakage and elevated levels of genomic instability, as evidenced by increased numbers of micronuclei (Bower et al., 2010; Fenech et al., 2011). Consistent with the DCC failure observed in 1q21.1 Del and Dup containing LBCs, my collaborators found elevated levels of micronuclei in both LBCs following prolonged treatment (16 hrs) with a Topo II inhibitor (ICRF193), although to a greater extent in the 1q21.1 Del containing LBCs than in the 1q21.1 Dup containing LBCs. Nevertheless, these data are consistent with a failure to efficiently activate the DCC and with elevated levels of DNA double-strand breaks (DSBs), which manifest as micronuclei in these cultures.

3.5.3.2 Functional Assays for AMPKβ2

PRKAB2 codes for the protein AMPKβ2, which is one of the subunits of AMP-activated protein kinase (AMPK). AMPK senses and regulates systemic and cellular energy balance by regulating food intake, body weight, and glucose and lipid homeostasis (Kahn et al., 2005). It also plays an important role in negatively regulating the mTOR pathway that functions to control and protein biosynthesis (Garelick and Kennedy, 2011). AMPK is a heterotrimeric complex composed of a catalytic α-subunit, a regulatory β-subunit and an ADP/ATP-binding γ-subunit (Dasgupta and Milbrandt, 2009). Several isoforms of each subunit exist (α1, α2, β1, β2, γ1, γ2, γ3), thereby enabling the generation of multiple distinct heterotrimeric complexes.

Expression of AMPKβ2 in patient cells was decreased in the cell line with 1q21.1 Del and increased in the cell line with 1q21.1 Dup compared to a wild-type (WT) control, whilst that of the 1 subunit was unaffected. Functional investigations of the AMPK pathway done by our collaborator showed sub-optimal AMPK activity in the deletion line, and to a lesser extent in the duplication line (Harvard et al., 2011).

3.5.4 Detection and Selection of Variants Using Whole Exome Sequencing

To account for the phenotypic variability in 1q21.1 CNV carriers, I postulated that additional mutations could also contribute to a more severe phenotype (i.e. ID in probands). I used whole exome sequencing (WES) to look for: a) “second hit” mutations in gene(s) from the 1q21.1 CNV region, and b) additional pathogenic mutations outside the CNV (i.e. de novo, compound heterozygous, autosomal recessive, and X-linked mutations in family A, as the affected proband is male and milder phenotypes are seen in his mother and grandmother).

WES was performed for two families with inherited 1q21.1 CNVs (Figure 3.4). For family A, 5 individuals were sequenced: 3 with 1q21.1 deletions (A1-A3), and 2 additional individuals (father, A4, and sibling, A5, of A1) without 1q21.1 CNVs. For family C, 3 individuals were sequenced: 2 with 1q21.1 duplications (C1, C2), and 1 additional individual (C3) without a 1q21.1 CNV.

After applying my filtering steps (Figure 3.5), no pathogenic mutations were detected in the 1q21.1 CNV region for either proband. However, I detected one putatively pathogenic variant outside the 1q21.1 CNV in each proband (ATF6 and DARS).


Family A: Subject A3; Subject A2 and Subject A4; Subject A1 (Significant ID and other more severe abnormalities) and Subject A5
Family C: Subject C2 and Subject C3; Subject C1 (ID and other more severe abnormalities)

Figure 3.4 Familial Transmission of the 1q21.1 CNV Individuals with a 1q21.1 deletion are shown in red while individuals with a 1q21.1 duplication are shown in blue. Probands with ID are indicated with solid fill while individuals with learning difficulties (various degrees) are indicated with striped fill.


Family A    Family C      Filtering step
94,106      83,394        Step 1) Number of starting variants
31,446      35,591        Step 2) Quality controlled variants (read depth >10, Genotype Quality >40, alternate allele percentage >25%)
2,468       2,644         Step 3) Novel variants (<1% in 1kG and NHLBI6.5k exome)
685         658           Step 4) Non-synonymous coding, UTR variants
194         223           Step 5) de novo variants, recessive variants and variants shared with affected parent
70          71            Step 6) Conserved (2 of 3 conservation scores)
18          28            Step 7) Predicted to be damaging (2 of 3 prediction scores)
1           1             Step 8) Expression fold change
DARS (SNP)  ATF6 (InDel)  Remaining variant

Figure 3.5 Number of Variants Remaining After Each Filtering Step for 1q21.1 CNV Carriers Quality controlled variants (step 2) and those occurring in the general population at a frequency <1% (step 3) were filtered to keep all non-synonymous, coding and UTR variants (step 4). Unique, de novo variants in probands, or unique variants shared between probands and less affected carrier family members, reduced the number of candidate genes further (step 5). The remaining variants were then prioritized based on the probability that they are conserved (PhastCons >0.95, phyloP >2.5, and GERP++ >3.85) and/or damaging (SIFT score "damaging"; PolyPhen2 score "possibly damaging" or "probably damaging"; and Mutation Taster score "disease causing" or "disease causing automatic"). Variants in genes with high conservation scores and those predicted to be damaging were compared to expression data from our previous whole genome expression study. Variants in genes with gene expression fold changes greater than 1.2 or less than 0.8 in the 1q21.1 carriers compared to normal controls were prioritized for follow-up.
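The threshold-based steps of this filtering cascade can be sketched as a chain of simple predicates. This is an illustrative sketch only: the `Variant` record, its field names, and the example records are hypothetical, the annotation and inheritance steps (steps 4 and 5) are left out, and it does not reproduce the software actually used for the analysis.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record layout; field names are illustrative only.
    gene: str
    read_depth: int
    genotype_quality: int
    alt_allele_pct: float          # percent of reads carrying the alternate allele
    population_freq: float         # frequency in reference cohorts, as a fraction
    phastcons: float
    phylop: float
    gerp: float
    damaging_calls: int            # how many of the 3 prediction tools call it damaging
    expression_fold_change: float  # carrier / control expression ratio


def passes_quality(v):      # step 2
    return v.read_depth > 10 and v.genotype_quality > 40 and v.alt_allele_pct > 25


def is_rare(v):             # step 3: below 1% in reference populations
    return v.population_freq < 0.01


def is_conserved(v):        # step 6: at least 2 of 3 conservation scores pass
    hits = (v.phastcons > 0.95) + (v.phylop > 2.5) + (v.gerp > 3.85)
    return hits >= 2


def is_damaging(v):         # step 7: at least 2 of 3 prediction tools agree
    return v.damaging_calls >= 2


def expression_changed(v):  # step 8: fold change above 1.2 or below 0.8
    return v.expression_fold_change > 1.2 or v.expression_fold_change < 0.8


def prioritize(variants):
    checks = (passes_quality, is_rare, is_conserved, is_damaging, expression_changed)
    return [v for v in variants if all(check(v) for check in checks)]


# Hypothetical example records (not data from this study).
examples = [
    Variant("GENE_A", 35, 99, 48.0, 0.0, 0.99, 3.1, 4.2, 3, 0.5),
    Variant("GENE_B", 8, 99, 50.0, 0.0, 0.99, 3.1, 4.2, 3, 0.5),    # low read depth
    Variant("GENE_C", 40, 99, 45.0, 0.05, 0.99, 3.1, 4.2, 3, 0.5),  # too common
    Variant("GENE_D", 40, 99, 45.0, 0.0, 0.99, 3.1, 4.2, 3, 1.0),   # expression unchanged
]
print([v.gene for v in prioritize(examples)])
```

Keeping each step as its own predicate makes it straightforward to count how many variants survive each stage, which is the quantity reported in the figure.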


In family A, a de novo heterozygous mutation on chromosome 2q21.3 in the gene Aspartyl-tRNA synthetase (DARS) at genomic position 136,680,477 bp was detected in the proband (A1). The DARS gene showed an increase in mRNA expression in the proband (A1) but not in the mother (A2) or grandmother (A3) when compared to normal controls in my whole genome expression study. The DARS transcript is 2,361 bps in length, has 16 coding exons and 12 predicted splice variants. It codes for the protein aspartate--tRNA ligase, cytoplasmic, which is a member of a multi-protein complex responsible for adding aspartate to a tRNA during protein biosynthesis. The mutation (g.136680477C>T; c.688G>A; p.Gly230Arg) (RefSeq NM_001349), a SNV in exon 9, is predicted to cause a damaging substitution at position 230 of the 501 amino acid protein product. Compound-heterozygous and homozygous mutations of DARS have been reported in individuals with leukoencephalopathy who have hypomyelination with brain stem and spinal cord involvement and leg spasticity (Taft et al., 2013).

I was unable to confirm the presence of the DARS SNV in the proband (A1) or any family members using Sanger sequencing due to difficulty in designing primers that would amplify a single gene product. DARS is a member of a large family of genes and has high sequence similarity to other family members. In addition, no change in gene expression for DARS was detected in LBCs for the proband (A1) compared to 5 control individuals, including the proband’s mother (A2) and grandmother (A3), using qPCR. I therefore eliminated DARS as a candidate for follow-up.

In family C, a single nucleotide heterozygous deletion in activating transcription factor 6 (ATF6) was found in both the father (C2) and proband (C1). Both C1 and C2 showed ~50% decreased expression of ATF6 compared to normal controls on the whole genome expression array.


The ATF6 gene has two transcripts; the longer protein-coding transcript (7,496 bps in length) codes for the 670 amino acid protein, ATF6. ATF6 is involved in regulating the cellular response to ER stress. The mutation (g.161753885delC; c.353delC; p.Leu118fs) (RefSeq NM_007348) is located within exon 4 of ATF6 and is predicted to both cause a frameshift and disrupt a splice site (Figure 3.6, A). Homozygous and compound heterozygous mutations in ATF6 have been reported in individuals with achromatopsia (Ansar et al., 2015; Kohl et al., 2015).
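The relationship between the coding-sequence positions and the protein positions quoted for the two prioritized variants can be checked with simple codon arithmetic. The sketch below assumes the usual convention that coding position 1 is the A of the ATG start codon; the helper name `codon_of` is illustrative.

```python
def codon_of(cds_position):
    """Map a 1-based coding-sequence position to
    (codon number, position within that codon from 1 to 3)."""
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


# ATF6 c.353delC: the deleted base lies in codon 118, matching p.Leu118fs.
print("ATF6 c.353 ->", codon_of(353))

# DARS c.688G>A: the substituted base lies in codon 230, matching p.Gly230Arg.
print("DARS c.688 ->", codon_of(688))
```

A single-base deletion shifts every downstream codon by one base, which is why the ATF6 variant is annotated as a frameshift starting at codon 118, whereas the DARS substitution changes only codon 230.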

The ATF6 mutation was confirmed using Sanger sequencing for both the proband (C1) and father (C2), while the non-carrier mother (C3) did not have the variant (Figure 3.6, B). I then confirmed the observed gene expression change (decrease) for both the proband and father using qPCR on RNA isolated from LBCs established from the proband, his parents, and 2 additional control individuals. Both the proband and his father had reduced expression of ATF6 compared to normal controls, while the mother showed no change in expression compared to normal controls (Figure 3.6, C). Based on the reduced expression (~50%) of ATF6 in both the proband and his father, mRNA produced from the mutated ATF6 may be degraded by nonsense-mediated decay. Western blotting using two commercial antibodies to determine whether protein expression of ATF6 was reduced in the proband and father was uninformative because multiple bands, all close to the reported molecular weight of ATF6, were detected. My collaborator, Dr. Volchuk, was able to detect reduced protein levels of ATF6 in both C1 and C2 using a different ATF6 antibody.


Figure 3.6 Variant Detection, Confirmation and Gene Expression Study for a Prioritized Variant in Family C (A) A heterozygous deletion in ATF6 (activating transcription factor 6) at 1q23.3 detected by exome sequencing is shown in the Golden Helix Genome browser. The variant is predicted to disrupt a splice site located within exon 4 of ATF6 and cause a frameshift in the ATF6 protein beginning at the Proline (P) at position 118 of 670 amino acids. (B) The variant was confirmed by Sanger sequencing to be present in both the proband and his father. (C) Gene expression of ATF6 in LBCs showed reduced expression (~50%) for both the proband and his father when compared to unrelated normal control individuals for two sets of probes (blue and red bars) from the ATF6 gene. The mother who does not have the variant had similar expression levels as the controls.

3.5.5 ATF6 Functional Analysis

Because ATF6 mRNA and protein levels were reduced in C1 and C2, I looked for evidence of an abnormal ER stress response using two approaches: i) determining the number of dysregulated UPR genes based on previously obtained whole genome expression data and ii) inducing the ER stress response with two chemicals, thapsigargin and tunicamycin, to see if it is altered.


3.5.5.1 Expression of ATF6 Downstream Genes

Of the 107 genes from the UPR pathway with available expression ratios (WGE array) (Supplementary Table 3.1, Appendix B), approximately 24% showed changes in gene expression (both increased and decreased) in individuals with the ATF6 mutation (C1 and C2) in comparison to 1.9-5.6% in controls (Table 3.3). The majority of dysregulated genes had reduced expression in C1 (20/26) and C2 (23/26) (Table 3.3, Figure 3.7, Supplementary Table 3.1, Appendix B). For ~62% of dysregulated genes (16/26 genes) the expression change was in the same direction in both the proband (C1) and his father (C2). Within this subset of genes, the majority (15/16) were expressed at lower levels (~50%) than in normal controls (Figure 3.7).

Surprisingly, the proband from the 1q21.1 deletion family (A1), who did not have a mutation in ATF6, had a comparable proportion of dysregulated ATF6α downstream genes (~28%) to C1 and C2 (Table 3.3; Supplementary Table 3.1, Appendix B). In addition, the majority of dysregulated genes in A1 that overlapped those shared by C1 and C2 (12/15) had lower expression than normal controls (Figure 3.7). Dysregulated gene expression in genes from the UPR pathway was also noted for A1’s mother A2 (~16%) and grandmother A3 (~26%), the latter of whom had predominantly overexpressed ATF6 downstream genes (Table 3.3).


Subjects                            A1     A2     A3     C1     C2     Shared C1 & C2  Shared A1, C1, C2  Control Female 1  Control Female 2  Control Male
Genes with expression ratios <0.72  20     9      6      20     23     15              12                 0                 0                 2
Genes with expression ratios >1.28  10     8      22     6      3      1               1                  2                 6                 4
% underexpressed                    18.7%  8.4%   5.6%   18.7%  21.5%  14.0%           11.2%              0.0%              0.0%              1.9%
% overexpressed                     9.3%   7.5%   20.6%  5.6%   2.8%   0.9%            0.9%               1.9%              5.6%              3.7%
Total % dysregulated                28.0%  15.9%  26.2%  24.3%  24.3%  15.0%           12.1%              1.9%              5.6%              5.6%

Table 3.3 Summary of UPR Genes Over and Under-Expressed in Subjects with 1q21.1 CNVs Number and percentage of genes with dysregulated expression (<0.72, >1.28) from the list of 107 UPR genes are reported for 1q21.1 CNV carriers (A1-A3, C1, C2) and controls.
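The percentages in Table 3.3 follow directly from the gene counts and the 107-gene denominator. A minimal sketch of that arithmetic for the five CNV carriers is given here; the function and variable names are illustrative, and the thresholds are the ones stated in the caption.

```python
UPR_GENES = 107  # UPR genes with available expression ratios

# (underexpressed, overexpressed) gene counts per subject, from Table 3.3.
counts = {
    "A1": (20, 10),
    "A2": (9, 8),
    "A3": (6, 22),
    "C1": (20, 6),
    "C2": (23, 3),
}


def classify(ratio, low=0.72, high=1.28):
    """Label one expression ratio using the table's cut-offs."""
    if ratio < low:
        return "under"
    if ratio > high:
        return "over"
    return "unchanged"


def percent(n, total=UPR_GENES):
    return round(100 * n / total, 1)


# subject -> (% underexpressed, % overexpressed, total % dysregulated)
summary = {
    subject: (percent(under), percent(over), percent(under + over))
    for subject, (under, over) in counts.items()
}

for subject, (under, over, total) in summary.items():
    print(f"{subject}: {under}% under, {over}% over, {total}% dysregulated")
```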

Figure 3.7 Genes from the UPR Pathway with Expression Changes on the Whole Genome Expression Array Expression ratios for 16 dysregulated genes from the UPR shared by ATF6 mutation carriers C1 and C2 are shown in comparison to Family A (no ATF6 mutation) and 3 normal controls. Genes are expressed at lower levels (~50%) in the proband and father with the 1q21.1 duplication (circled in blue) and ATF6 mutation (*) in comparison to normal controls. The same genes are also expressed at lower levels (~50%) in the proband with a 1q21.1 deletion (circled in red) who does not have an ATF6 mutation.


3.5.5.2 ER Stress Response in LBCs of ATF6 Mutation Carriers

ATF6 -/- mouse embryonic fibroblasts were reported to have a defect in their ability to induce ER chaperones and ERAD components when challenged by ER stress causing agents, thapsigargin and tunicamycin (Yamamoto et al., 2007). I used the same ER stress causing agents to see if the induction of chaperone gene expression would be altered in cell lines from our cases with a 1q21.1 CNV (duplication) and ATF6 mutation (proband, C1, and his father, C2) compared to non-carriers (mother, C3, and additional control male) during ER stress. Three genes, GRP94

(HSP90B1), BiP (HSPA5, GRP78) and XBP1 were tested. No consistent difference in chaperone gene induction were noted between the ATF6 mutation carriers and controls in this experiment

(Table 3.4, Figure 3.8).

Subject       GRP78  HSP90B1  XBP1  Treatment
C1            2.05   1.54     2.03  Tm
C2            2.62   1.50     1.23  Tm
C3            1.57   1.26     1.15  Tm
Control Male  3.03   1.56     1.59  Tm
C1            3.12   2.40     3.89  Th
C2            4.10   1.81     1.07  Th
C3            1.92   1.30     1.67  Th
Control Male  3.16   1.92     2.02  Th

Table 3.4 ER Stress Induced Response Target Genes in ATF6 +/- LBCs Gene expression in cells treated with tunicamycin (Tm) or thapsigargin (Th) was compared to matched non-treated cells. The ER stress induced fold changes of target genes (GRP78, HSP90B1, and XBP1) are shown in the table for cell lines from individuals with (C1 & C2) and without (C3 and Control Male) a heterozygous mutation in ATF6.
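One simple way to look for a consistent carrier effect in these fold changes is to ask, for each gene and treatment, whether both carriers fall on the same side of both non-carriers. The rule below is an assumed, deliberately crude separation criterion chosen for illustration, not the criterion used in the study; the values are those in Table 3.4.

```python
GENES = ("GRP78", "HSP90B1", "XBP1")
CARRIERS = ("C1", "C2")
NON_CARRIERS = ("C3", "Control Male")

# Fold changes (treated vs matched untreated cells), in the order of GENES.
fold_changes = {
    "Tm": {
        "C1": (2.05, 1.54, 2.03),
        "C2": (2.62, 1.50, 1.23),
        "C3": (1.57, 1.26, 1.15),
        "Control Male": (3.03, 1.56, 1.59),
    },
    "Th": {
        "C1": (3.12, 2.40, 3.89),
        "C2": (4.10, 1.81, 1.07),
        "C3": (1.92, 1.30, 1.67),
        "Control Male": (3.16, 1.92, 2.02),
    },
}


def consistently_separated(carrier_values, control_values):
    """True only if every carrier is above every non-carrier,
    or every carrier is below every non-carrier."""
    return (min(carrier_values) > max(control_values)
            or max(carrier_values) < min(control_values))


separation = {}
for treatment, by_subject in fold_changes.items():
    for index, gene in enumerate(GENES):
        carriers = [by_subject[s][index] for s in CARRIERS]
        controls = [by_subject[s][index] for s in NON_CARRIERS]
        separation[(treatment, gene)] = consistently_separated(carriers, controls)

for (treatment, gene), separated in separation.items():
    print(f"{treatment} {gene}: carriers separated from non-carriers = {separated}")
```

With only two carriers and two non-carriers this is a descriptive check rather than a statistical test.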


Figure 3.8 Fold Change of Target Genes During ER Stress Response Fold changes of target genes (GRP78, HSP90B1, and XBP1) are shown after treatment with thapsigargin and tunicamycin in individuals with (C1 & C2) and without (C3 and Control Male) a heterozygous mutation in ATF6.

3.5.5.3 Search for Variants in ER Stress Response Genes with Relaxed Filters in Family A

Since genes downstream of ATF6 showed decreased expression in the proband (A1) without a mutation in ATF6, I looked specifically for mutations in genes involved in the ER stress response pathway upstream of ATF6 and in other arms of the UPR pathway (i.e. IRE1 & PERK). No mutations in any UPR genes were found, even when the stringency of my filtering criteria was reduced.

3.6 Discussion

This chapter focuses on individuals with familial 1q21.1 CNVs (6 deletions and 2 duplications from 3 families: A, B, and C). The size and position of the 1q21.1 CNVs detected in my study are consistent with previous reports (Brunetti-Pierri et al., 2008; Mefford et al., 2008), and little variation was detected between family members with the same CNV, with the exception of family B. In family B, one family member, B3, had a simultaneous duplication overlapping the TAR region (proximal) in addition to his 1q21.1 deletion. While this was a more complex genomic rearrangement than in family members B1 and B2, it did not result in a more severe phenotype in this individual. It is therefore unlikely that differences in gene content are the cause of phenotypic variability in these 3 families. In addition, I ruled out the presence of secondary pathogenic CNVs as a cause of phenotypic variability in the families I studied.

I found that ~50% of genes integral to the 1q21.1 CNV showed changes in gene expression. Five genes (CHD1L, PRKAB2, ACP6, BCL9, and GPR89A) had expression changes consistent with copy number and no genes outside the boundaries of the CNV showed expression changes that could be explained by the 1q21.1 copy number. Two genes, CHD1L and PRKAB2, found within the top 10 genes of my copy number/expression correlation analysis, were considered good candidate genes for further study.

Both CHD1L and AMPKβ2 (the protein product of PRKAB2) showed altered protein expression in cell lines established from 1q21.1 carriers. Further functional studies showed abnormal chromatid behavior (e.g. delayed untangling of chromatids during the cell cycle and increased numbers of micronuclei caused by genomic damage), indicating a dysfunction of CHD1L as a chromatin remodeler in cell lines from 1q21.1 CNV carriers (a deletion and a duplication). In addition, sub-optimal activation of the AMPK complex was noted in 1q21.1 CNV cell lines with abnormal PRKAB2 copy number and AMPKβ2 protein expression. My publication was the first report on the functional consequences of copy number changes of two dosage-sensitive genes from the 1q21.1 CNV. Both genes are expressed in multiple tissues, including brain (Chen et al., 2009; Dasgupta and Milbrandt, 2009), which may explain the multi-systemic nature of the physical abnormalities and the frequent involvement of learning difficulty, albeit at very variable levels, in subjects with the CNV.

Considering that CHD1L and AMPKβ2 both play a role in sensing stress in the environment (genomic damage and energy balance, respectively), it is possible that the phenotypic variability seen in individuals with 1q21.1 CNVs may be the consequence of differing environmental stress. In this model, the resulting phenotype(s) could be more severe in individuals with 1q21.1 CNVs who develop in less favorable environments due, for example, to prenatal and postnatal exposures to genotoxins or suboptimal diets. If these, or other environmental insults, do not occur in 1q21.1 CNV carriers, dysfunction of CHD1L and AMPKβ2 will be less obvious and the 1q21.1 phenotype will be milder or not expressed (Harvard et al., 2011).

The importance of CHD1L and PRKAB2 in causing phenotypic consequences for 1q21.1 CNV carriers is supported by other studies. For example, human LBCs and brain tissue have recently been used to confirm that CHD1L and PRKAB2 mRNA levels correlate with 1q21.1 CNV copy number (Luo et al., 2012; Mehta et al., 2014; Ye et al., 2012). In addition, the role of CHD1L in an abnormal phenotype is supported by a small deletion in an individual with NDD that contains only CHD1L (Girirajan et al., 2013).

In addition to genes affected by copy number, I also looked for sequence variants that could contribute to more severe phenotypic consequences (e.g. level of NDD and ID) in individuals with 1q21.1 CNVs. No pathogenic sequence changes were detected in the 1q21.1 CNV region in either proband, excluding the possibility that a more severe phenotype is due to two hits in the coding segments of genes from the 1q21.1 CNV region. However, a mutation found outside the 1q21.1 CNV region in family C could provide an alternative or complementary cause of the phenotypic differences seen in this family. The pathogenic variant, a frameshift mutation in ATF6, was found in two individuals, C1 and C2. The mutation resulted in reduction of mRNA (ATF6) and protein (ATF6α) expression. Interestingly, ATF6 is also associated with cellular response to stress, specifically ER stress.

The possibility of ER stress response involvement in the phenotypic abnormalities in our subjects was intriguing considering that many pathological and physiological stimuli, such as ischemia, hypoxia, metabolic factors (glucose starvation, fatty acid levels) and poisons, including neurotoxins (e.g. ethanol), interfere with normal protein glycosylation and protein folding (Alimov et al., 2013; Minamino et al., 2010). If activation of the UPR is severe or prolonged enough, the final cellular outcome is cell death. I was therefore interested to see if the ATF6 mutation would result in an altered UPR. I used two chemicals, tunicamycin (inhibits glycosylation) and thapsigargin (lowers ER calcium levels), to initiate the UPR (e.g. increased expression of ER stress response genes) in LBCs from C1 and C2 but found no difference in the ER stress response in comparison to two controls.

Although my results suggest a functioning ER stress response in C1 and C2, there are several reasons why they cannot rule out a functional impact of an ATF6 mutation on the ER stress response during development. For instance, I cannot exclude the possibility that EBV transformation affects the ER stress response in LBCs from C1 and C2, as others have shown that EBV can activate ATF6 (Lee and Sugden, 2008). Similarly, my results may not represent the full cellular impact of an ATF6 mutation because they are from a single time point and concentration of chemical treatment. It is possible that the ATF6 mutation causes a cellular defect similar to that seen in ATF6 knock-out cells, which were able to respond quickly to short term stress but had a prolonged UPR compared to wild-type cells, making them less able to recover from long term or repeated stress (Wu et al., 2007). Lastly, the UPR has been proposed to be developmentally regulated and the less mature brain to be more susceptible to ER stress (Wang et al., 2015), neither of which can be tested using LBCs.

ATF6 mutations have rarely been reported in humans. A mutation in the promoter of ATF6 was associated with psychiatric disorders (Kazeminasab et al., 2012), and homozygous and compound heterozygous mutations in ATF6 were reported in individuals with achromatopsia (Ansar et al., 2015; Kohl et al., 2015). Studies of Atf6 -/- mice show that they initially have normal retinal function but develop rod and cone dysfunction as they age (Kohl et al., 2015). Although developmental or brain abnormalities in mouse knock-out models of Atf6 were not reported (Wu et al., 2007; Yamamoto et al., 2007), increased neuronal cell death upon insult (ischemia) suggests that an ATF6 defect represents a predisposition for an impaired cellular response to stress (Yoshikawa et al., 2015). To further pursue the role of the ATF6 mutation in C1 and C2, non-transformed patient cells (e.g. skin fibroblasts) would be needed, as their response to treatment could support or refute my findings in EBV-transformed LBCs from individuals with the ATF6 mutation.

The presence of a mutation in ATF6 prompted me to look at the expression of other genes from the stress response pathway in my WGE data. LBCs from C1 and C2 had increased numbers of downregulated ER stress response genes. Surprisingly, I found a similar number of the same downregulated genes in A1, an individual with a 1q21.1 deletion who did not have the ATF6 mutation. In addition, A2 and A3, both carriers of a 1q21.1 deletion with no ATF6 mutation, also had an increased number of dysregulated ER stress response genes, although fewer genes were shared with C1 and C2, and A3 had a higher percentage of upregulated genes. Unfortunately, no cell lines were available for individuals from family A for follow up. Together these data suggest that some aspect of the ER stress response pathway may be defective, although my search for additional variants in ER stress response genes in my WES data using relaxed filtering criteria did not uncover any additional genetic defects.

Genetic differences, such as changes in non-coding regions or the presence of functional SNPs, could also play a role in the dysregulation of ER stress response gene expression seen in 1q21.1 CNV carriers. For example, polymorphisms in ER stress response genes have previously been reported to both impair and protect cells from ER stress: a polymorphism in XBP1 was associated with its reduced expression and an impaired ER stress response in both schizophrenia and bipolar disorder (Kakiuchi et al., 2004; Kakiuchi et al., 2003), while a SNP in GRP94, a stress response gene, was found to be protective against stress and was associated with increased expression of this gene in cells treated with thapsigargin (Hayashi et al., 2009).

Overall, my studies suggest that pathways related to stress response (genomic, metabolic and ER) could be impaired in carriers of the 1q21.1 CNV, either due to the CNV and dosage changes of integral genes (CHD1L and PRKAB2) or due to an as yet unknown cause of dysregulated expression of ER stress response genes. Cellular stress, specifically oxidative stress, was recently proposed to contribute to neuropsychiatric disorders, as genes from CNVs detected in brains of affected subjects were enriched in oxidative stress pathways in comparison to controls (Mehta et al., 2014). It is also possible that interactions between genes contribute to variable phenotypes. For example, PRKAB2, a copy number sensitive gene integral to the 1q21.1 CNV, encodes a component of the AMPK enzyme (AMPKβ2), which regulates the cell’s energy balance and metabolism and is also a physiological regulator of ER stress (Dasgupta et al., 2012; Dong et al., 2010). The timing of the stress insult could also contribute to the differences in phenotypes, as early stages of development tend to be more impacted than later stages (Wang et al., 2015). Therefore, multiple affected pathways, combined with different environmental factors acting at different stages of development, could result in a variable NDD phenotype.

3.7 Conclusion

My work on the 1q21.1 CNV implicated multiple forms of cellular stress in the phenotypic variability of 1q21.1 CNV carriers. The role of cellular stress, in the form of metabolic, genomic and ER stress, is strongly supported by the abnormal downstream function of CHD1L and AMPKβ2 and the abnormal expression of a number of ER stress response genes in individuals with 1q21.1 CNVs.

Chapter 4: Imprinting Potential of CNVs and their Integral Genes

This chapter further explores the causes of phenotypic variability associated with familial CNVs. It focuses on epigenetic changes as a possible explanation for how inherited CNVs cause phenotypic consequences. This can occur in two ways: 1) the CNV contains an imprinted gene, or 2) the CNV affects epigenetic marks of nearby regions, for example if a boundary element is deleted. Imprinted genes are differentially expressed from the maternal or paternal chromosome, and altered expression of the imprinted gene can occur if a CNV transmitted to a child affects the allele on the expressing parental chromosome.

In this chapter I used two approaches to study epigenetic (imprinting) characteristics of CNVs:

i) I used a FISH-based assessment of replication timing (FISH-RT) as a marker of chromatin condensation, because late replicating regions tend to be more condensed (generally associated with gene inactivity and reduced expression) and early replicating regions are associated with euchromatin (generally associated with gene activity and increased expression). This method has been used to show that imprinted chromosomal regions have asynchronous replication timing (Greally et al., 1998; Kitsberg et al., 1993; Knoll et al., 1994). I used FISH-RT to determine the replication timing for a selection of familial and control CNVs.

ii) I compared a larger number of CNVs identified in our ID cohort with putative imprinted DMRs detected in Dr. Robinson’s laboratory (Hanna et al., 2015). DMRs are found in imprint control regions and may indicate the presence of an imprinted gene. Chromosomal regions of overlap between CNVs and DMRs were used to identify CNVs containing genes that are possibly imprinted.

4.1 Replication Timing as a Marker of Imprinting

4.1.1 Background

DNA replication, the process in which a DNA molecule is copied, results in two identical daughter strands (Figure 4.1). It is initiated at origin sites that coordinate replication in a predictable temporal order, creating replication domains that range in size from a few hundred kilobases to several megabases (Rhind and Gilbert, 2013; Ryba et al., 2010). DNA replication timing can vary between replication domains, with some domains being replicated earlier or later in S phase than others (Selig et al., 1992). Regardless of when they are replicated, homologous chromosomal segments are usually replicated at the same time (synchronously) (Kitsberg et al., 1993). However, for a subset of genomic areas, e.g. those that are imprinted (Kitsberg et al., 1993; Knoll et al., 1994; Simon et al., 1999) or are located on chromosome X (Avner and Heard, 2001; Koren et al., 2014), homologous chromosomal segments are replicated at different times (asynchronously) during S phase.

[Figure 4.1 image: cell cycle diagram with phases G1, S, G2 and M. Labels: Early S; Late S; Chromosome = 2n, DNA = 2C (single double helix); Chromosome = 2n, DNA = 4C (paired double helices); sister chromatids; humans (n=23).]

Figure 4.1 Replication of DNA During the S Phase of the Cell Cycle Approximately 75% of genes in euchromatic regions are replicated in the first half of S phase while heterochromatic regions of the genome are late replicating (Rhind & Gilbert 2015).

Asynchronous replication timing occurs for monoallelically expressed genes and in most instances active (expressing) alleles replicate earlier than inactive alleles (Ensminger and Chess, 2004; Gilbert, 2002). Monoallelically expressed genes may occur due to 1) X-chromosome inactivation, 2) generation of cellular diversity (as for random silencing in olfactory gene clusters), 3) widespread random monoallelic silencing of unknown function and 4) genomic imprinting (Avner and Heard, 2001; Ensminger and Chess, 2004; Gimelbrant et al., 2007; Gimelbrant and Chess, 2006; Kagotani et al., 2002; Koren et al., 2014; Weaver et al., 2009).

Replication timing patterns (synchronous or asynchronous) were proposed to be related to epigenetic marks maintained in a cell’s progeny (Ensminger and Chess, 2004). This idea is supported by experiments that showed significant conservation and stability of replication timing domain boundaries in both mouse and human embryonic stem cell lines (Hiratani et al., 2010; Hiratani et al., 2008; Koren et al., 2014; Ryba et al., 2010). Measuring replication timing synchrony between two alleles thus became a tool to gain insight into the epigenetic characteristics of the region (e.g. chromatin conformation or differential gene expression).

4.1.2 Assessment of Replication Timing Using FISH

FISH can be used to detect the replication timing properties of small chromosomal regions using fluorescently labeled probes and analysis of probe hybridization patterns (Ensminger and Chess, 2004; Kitsberg et al., 1993; Selig et al., 1992). In a normal diploid cell, unreplicated genomic segments are visible as single hybridization signals while replicated segments are visible as double hybridization signals. Alleles that replicate synchronously will be visible as either two replicated signals (doublet-doublet, DD) or two unreplicated signals (singlet-singlet, SS), depending on whether they replicate early or late, respectively. Alleles that replicate asynchronously will have one unreplicated signal and one replicated signal (singlet-doublet, SD). See Section 4.4.5 for additional details and illustration.

In FISH experiments, ~10% of SD signals are due to probe hybridization inefficiencies (i.e. some of the SD signals will be artifacts due to poor probe hybridization) (Selig et al., 1992). Loci that replicate synchronously are reported to show SD patterns in 10-20% of evaluated nuclei (Rajcan-Separovic et al., 1998; Selig et al., 1992) and those that replicate asynchronously show SD patterns in >20% of evaluated nuclei (Amiel et al., 1998; Kagotani et al., 2002; Rajcan-Separovic et al., 1998; Yeshaya et al., 2009). While labeling with BrDU to identify cells in S phase has been used to assess replication timing in actively replicating cells, replication timing patterns in non-labeled cultures have similar percentages of hybridization signals for synchronously and asynchronously replicating probes (Nagler et al., 2010; Yeshaya et al., 2009).

Asynchronously replicating genomic regions are reported to have significantly higher percentages of SD hybridization signals than synchronously replicating genomic regions. In non-synchronized cell cultures, where the frequency of cells in S phase is ~25%, the number of SD hybridization signals reported for asynchronously replicating loci is typically in the range of 25-35% (Ensminger and Chess, 2004; Kagotani et al., 2002; Kitsberg et al., 1993; Rajcan-Separovic et al., 1998). For example, the imprinted gene SNRPN, which is located within a known asynchronously replicating region, is reported to have 38% SD hybridization signals (Kitsberg et al., 1993).
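The published ranges above can be turned into a simple decision rule. The following is a minimal sketch, not the analysis used in this thesis: the function names are hypothetical, and the single 20% cut-off is an assumption taken from the reported ranges (10-20% SD for synchronous loci, >20% for asynchronous loci).

```python
def sd_percentage(ss: int, dd: int, sd: int) -> float:
    """Return the percentage of scored nuclei showing the SD pattern."""
    total = ss + dd + sd
    if total == 0:
        raise ValueError("no nuclei were scored")
    return 100.0 * sd / total


def classify_replication(sd_percent: float, threshold: float = 20.0) -> str:
    """Label a locus from its SD percentage.

    Assumption: loci with more than `threshold` percent SD nuclei are
    called asynchronous, following the ranges quoted in the text.
    """
    return "asynchronous" if sd_percent > threshold else "synchronous"


# SNRPN is reported to show 38% SD signals (Kitsberg et al., 1993).
snrpn_call = classify_replication(38.0)
```

With this rule, the 38% SD value reported for SNRPN is labeled asynchronous, while a hypothetical locus with 15% SD nuclei would be labeled synchronous.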

FISH-based replication timing assays have been used to identify loci that undergo X-inactivation. Inactivated loci have a high degree of replication asynchrony while loci that are not inactivated show synchronous replication timing patterns (Boggs and Chinault, 1994). Additionally, an imprinted region on chromosome 15q11.2 is reported to have an asynchronous replication pattern with parent-specific replication observed for the early (expressing) and late replicating alleles (Kitsberg et al., 1993; Knoll et al., 1994). Another example is the use of FISH-RT to study replication timing properties of regions with triplet repeat expansions, e.g. myotonic dystrophy (Rajcan-Separovic et al., 1998) and fragile X (Yeshaya et al., 1998). Late replication timing for the chromosome with the triplet expansion was reported and the authors speculated that changes in replication timing are caused by a more condensed chromatin structure (Rajcan-Separovic et al., 1998; Yeshaya et al., 1998).

4.1.3 Replication Timing and Genomic Changes

More recently, FISH-RT was used to detect changes in replication timing for cases with microdeletions/microduplications. Yeshaya et al. (2009) showed that a known imprinted region on 15q11.2 lost its asynchronous replication timing in individuals with known microdeletions of 22q11.2 or 7q11.23. The reason for this is unknown but was speculated to be altered chromatin folding in the genome and/or in the imprinted region due to the loss or gain of chromosomal material in the CNV distant to the imprinted region (Yeshaya et al., 2009).

In addition to the indirect effect proposed above, CNVs could also have a more direct effect by 1) deleting the expressed allele of an imprinted gene (Figure 4.2), or 2) occurring adjacent to and/or deleting a boundary element, causing the spread of heterochromatization. These possibilities would provide an explanation for differences in phenotypes in carriers of familial CNVs. Despite frequent speculation that CNVs can impact novel imprinted regions, this has not been studied in the past.

128

Figure 4.2 Consequences of a CNV Affecting an Imprinted Gene, Expressed from the Maternal Chromosome

4.2 Differentially Methylated Regions as Markers of Imprinting

Imprinted genes are typically associated with parent of origin dependent DMRs (Lawson et al., 2013). Screening for parent of origin dependent DMRs in digynic and diandric triploid placenta using the Illumina whole genome methylation arrays resulted in the detection of putatively imprinted DMRs (Hanna et al., 2015; Yuen et al., 2011).

4.3 Chapter Goals

The overall goal of this chapter is to assess the epigenetic characteristics of CNVs using two approaches:

i) FISH-RT as a marker of imprinting to determine A) the replication timing for CNV regions detected in individuals with ID, using control cells to determine if the region replicates asynchronously, and B) the replication timing status for genomic regions with known replication timing in cells from affected individuals (ID) who carry distant CNVs (Figure 4.3), to determine if distant CNVs affect the RT of known imprinted regions.

ii) An overlap analysis of CNVs detected in a cohort of ID patients with DMRs identified in Dr. Robinson’s laboratory.

I hypothesized that some novel familial CNVs or de novo CNVs would show asynchronous RT, suggesting that they possibly contain imprinted genes that cause ID (direct impact), or that they would affect the imprinting status of known imprinted regions (indirect impact). I also expected that some of the CNVs identified in our ID cohort would overlap with DMRs, pointing to possible imprinted genes in CNVs.

Figure 4.3 Proposed Uses of Replication Timing to Assess Epigenetic Characteristics of CNVs In A) asynchronous replication timing of a probe from selected CNVs (e.g. 1q21.1 and 17q25) in control cells would suggest that the CNV may contain imprinted genes, while in B) changes in replication timing status for genomic regions with known replication timing (synchronous or asynchronous) in patient cells would suggest that global replication timing changes are caused by the presence of a CNV (e.g. 1q21.1, Xq21 and 4p14).

4.4 Materials and Methods

4.4.1 Lymphoblast Cultures

Lymphoblasts were transformed and frozen during an early passage. Before I began this experiment, LBCs were thawed and maintained as previously described in Chapter 2 (section 2.3.2). Where possible, cell lines were grown and harvested together for consistency. Cells were counted using a hemocytometer, seeded at approximately equal density, and allowed to grow for 1-2 days prior to treatment with BrDU.

4.4.2 BrDU Labeling and Detection

When included in the protocol, cells in log phase were pulse labeled with 5-bromo-2-deoxyuridine (BrDU) for 2 hours at 37°C prior to harvest. BrDU is a structural analog of thymidine and, when added to cellular substrate, can be incorporated into DNA during the synthesis phase (S phase) of the cell cycle. A BrDU mouse primary antibody (B35128) and an Alexa Fluor 350 goat anti-mouse secondary antibody (A21049) from Invitrogen were used for subsequent identification of proliferating cells, i.e. cells in S phase (Figure 4.4).

[Figure 4.4 image: two panels labeled BrDU and Non BrDU.]

Figure 4.4 Identification of Proliferating Cells Using BrDU The Alexa fluor 350 (blue), conjugated to a secondary antibody, is used to detect proliferating cells in culture. FISH probes (red and green) in proliferating interphase nuclei (blue) are easily distinguishable from non-labeled interphase nuclei with only red and green signals (labeled, arrows).

4.4.3 Cell Harvest and Fixation

Growing cells were moved to a 15 mL falcon tube and spun (1.8 RPM) for 5 minutes to collect cells. If a BrDU pulse was included in the protocol, BrDU labeling media was removed and the pellet was re-suspended in 10 mL of pre-warmed PBS to wash cells. A second cell collection centrifugation step was done for these cells. Cell pellets were then re-suspended in fresh media containing colcemid (9 mL media and 0.5 mL colcemid) and were incubated at 37°C for 16 minutes. Cells were re-collected by centrifugation (1.8 RPM, 5 minutes), media was removed to 1 mL, and cells were re-suspended in the remaining volume. A pre-warmed hypotonic solution (0.6M KCl, 37°C) was then added and the cells were incubated at 37°C for 14 minutes. A final cell collection was done by centrifugation (1.8 RPM, 5 minutes) and the solution removed to 0.5-1.0 mL. The cell pellet was then fully re-suspended in the remaining volume and fresh fix (3:1 methanol:acetic acid) was carefully added while vortexing to a final volume of 9 mL.

4.4.4 FISH

Slides for FISH experiments were prepared according to standard protocols using cultured lymphoblast cells fixed in 3:1 methanol/acetic acid. Prior to hybridization, slides were permeabilized using a solution of 1 x PBS and 0.1% Triton, immersed in a solution of 2 x SSC for 20 minutes, and then taken through a series of ethanol washes (70%, 85%, and 100%). Slides were denatured separately from probes in standard solution (70% formamide / 2 x SSC) at 72°C for 6-7 minutes and were then dehydrated using a series of ethanol washes. Slides were kept in 100% ethanol and dried immediately prior to probe application. Probes were prepared according to the manufacturer’s recommendations, denatured at 75°C for 5 minutes, and immediately applied to slides. A coverslip was then placed on the slide and sealed with rubber cement prior to overnight hybridization at 37°C in a humid chamber.

After hybridization, post-hybridization washes and antibody application steps were performed according to standard protocols and/or the manufacturer’s recommendations. After the final washes, slides were allowed to dry fully before application of ProLong Gold antifade reagent (slides with BrDU) or counterstaining of DNA using DAPI (slides without BrDU) in an antifade solution. Coverslips were sealed with nail polish and slides were allowed to rest for 15 minutes prior to viewing. Slides were stored in the dark at 4°C until viewed.

4.4.5 Detection of Replication Timing

In a normal cell population the majority of cells are in G1 and have not gone through replication. Cells that have gone through the S phase in its entirety will have replicated DNA. However, during the synthesis phase, DNA replicates in different replication timing domains that can be detected by FISH; unreplicated genomic segments are visible as single hybridization signals while replicated segments are visible as double hybridization signals (Figure 4.5). Alleles that replicate synchronously will be visible as either two replicated signals (doublet-doublet, DD) or two unreplicated signals (singlet-singlet, SS), depending on whether they replicate early or late, respectively. Alleles that replicate asynchronously will have one unreplicated signal and one replicated signal (singlet-doublet, SD).

[Figure 4.5 image: three panels labeled SS, DD and SD.]

Figure 4.5 FISH Probe Hybridization Signals The image above contains examples of interphase cells with two single hybridization signals (SS; left panel), two double hybridization signals (DD; middle panel), and cells with one single and one double hybridization signal (SD; right panel). Probes labeled with the fluorophores used in my experiments, spectrum green (SG; top) and spectrum orange (SO; bottom), are shown.

4.4.6 Scoring FISH Signals

Slides were scored blindly (in most cases) by two independent counters (Chansonette Badduke and Dr. Evica Rajcan-Separovic). Nuclei were scored only if cells were intact, did not overlap, and at least two hybridization signals were visible. Signals were counted as doublets only if the width between signals was equal to the size of the signal. Cells with a 1:2 pattern (SD) were scored as asynchronous, while those with 1:1 (SS) and 2:2 (DD) patterns were scored as synchronous. In addition, cells in which multiple signals were seen were included in counts and were scored based on the number of signals seen: an even number (SS) suggesting synchronous replication versus an uneven number (SD) suggesting asynchronous replication. This allowed for the scoring of all patterns and accounts for some of the variability seen between FISH experiments (i.e. background, differences in depth of field, hybridization efficiency, as well as differences in condensation of DNA and distance between chromatids).
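The scoring rules above can be sketched as a small tally routine. This is an illustrative reading of the rules rather than the procedure actually used at the microscope: the function names and the example counts are hypothetical, and the sketch assumes each nucleus is recorded by its total number of visible signals for one probe (2 for a 1:1 pattern, 3 for 1:2, 4 for 2:2), with other even totals grouped with SS and other uneven totals grouped with SD.

```python
from collections import Counter


def score_nucleus(n_signals: int) -> str:
    """Assign a pattern label from the total number of signals in a nucleus."""
    if n_signals < 2:
        # Nuclei with fewer than two visible signals were not scored.
        raise ValueError("at least two hybridization signals are required")
    if n_signals == 4:
        return "DD"  # 2:2, both alleles replicated
    # Assumption: remaining even totals count as SS, uneven totals as SD.
    return "SS" if n_signals % 2 == 0 else "SD"


def pattern_percentages(signal_counts):
    """Return the percentage of nuclei with each pattern (SS, DD, SD)."""
    tally = Counter(score_nucleus(n) for n in signal_counts)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no nuclei were scored")
    return {p: 100.0 * tally[p] / total for p in ("SS", "DD", "SD")}


# Hypothetical signal counts for ten nuclei (not data from this study).
counts = [2, 2, 3, 4, 3, 2, 4, 5, 2, 3]
percentages = pattern_percentages(counts)
```

For the ten hypothetical nuclei above, the tally is 40% SS, 20% DD and 40% SD.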

4.4.7 CNVs and Control Regions Selected for Replication Timing Experiments

As shown in Figure 4.3, I wanted to determine if

A) CNVs detected in individuals with ID have imprinting properties (i.e. asynchronous replication) in control cells and,

B) Whether control genomic regions with known asynchronous replication timing patterns (e.g. SNRPN) change their replication timing in patient cells with CNVs.

For aim A) I selected CNVs based on the following criteria:

i) The CNV overlaps with regions containing predicted or known imprinted genes (17q25.3) and,

ii) The CNV is familial in origin (1q21.1).

i) Predicted and known imprinted genes were downloaded from two websites, GeneImprint (http://www.geneimprint.org/) and the Catalogue of Parent of Origin Effects (http://igc.otago.ac.nz/home.html). Familial and de novo CNVs reported in our cohort of 255 individuals with ID (Qiao et al., 2010) were combined with the gene lists to find overlapping regions and/or regions in close proximity to each other (Supplementary Table 4.1).

The 17q25.3 CNV was selected because of its proximity to a predicted imprinted region at 17q25.3 (GeneImprint, chr17:76,690,599-76,712,249) and because multiple individuals (08-38, 09-60, and 10-03) from our study had reported CNVs in the region: 08-38 was reported to have a 494.3 kb de novo duplication at 17q25.3 (chr17:77,660,313-78,154,619, hg18), 09-60 was reported to have a 281 kb deletion of unknown origin at 17q25.3 (chr17:77,282,655-77,564,320), and 10-03 was reported to have a 597 kb paternal duplication at 17q25.3 (chr17:77,503,473-78,100,423).

ii) A familial 1q21.1 CNV was selected because both duplications and deletions are found in a number of families in our lab and are associated with variable phenotypic consequences (from mild/normal to moderate/severe, see Chapter 3). This CNV also contains a gene (CHD1L) involved in chromatin remodeling, which could affect replication timing in carriers.

For aim B) I used probes in control regions reported to have synchronous (RB1, PAX6) and asynchronous (SNRPN) replication timing and assessed their replication timing patterns in cells from affected individuals with CNVs distant to the regions above (i.e. mapping to 1q21.1, Xq23 and 4p14 respectively). The 1q21.1, Xq23 and 4p14 CNVs all contain genes with possible impact on replication timing due to their functions in chromatin remodeling (CHD1L in 1q21.1 and ATRX in Xq23) or replication (RFC1 in 4p14). I therefore expected that the presence of the CNV could affect the replication timing properties genome wide, specifically for the control regions tested (RB1, PAX6 and SNRPN).

Multiple BACs at different chromosomal positions were selected for the 1q21.1 CNV (3 locations) and 17q25.3 CNV (2 locations). BACs selected for the 15q11.2 region overlapped a region 130 kb distal to SNRPN and included the IPW gene. Commercial probes for the 15q11.2 region overlap SNRPN. Labeled BAC probes were ordered from the Centre for Applied Genomics (TCAG), Toronto, ON or from a commercial company. A list of probes (along with probe size and chromosomal position) ordered for both aims is available in Supplementary Table 4.2, Appendix C. The positions of commercial and non-commercial probes overlapping the 15q11.2 imprinted region are shown in Supplementary Figure 4.2, Appendix C.

In summary, I performed FISH-RT using probes for two patient CNVs (1q21.1 and 17q25.3) in control cells and used control probes known to be asynchronously (SNRPN) or synchronously (RB1 and PAX6) replicating (Amiel et al., 1998; Kitsberg et al., 1993) in control and in patient cells with CNVs from 1q21.1, Xq23 and 4p14. The use of control probes with known replication timing in control cells was necessary to make sure I was able to differentiate between asynchronous and synchronous replication using FISH-RT.

4.4.8 Overlap of de novo and Familial CNVs with Differentially Methylated Regions

An analysis of CNVs found in individuals with ID and their overlap with differentially methylated regions (DMRs) was performed. A total of 24 de novo and 46 familial CNVs from 213 cases with idiopathic ID reported by Qiao et al. (2013) were used for the overlap analysis. The reported CNV breakpoints (hg18) were converted to hg19 using the UCSC liftover tool. A list of DMRs identified by comparing diandric and digynic triploid samples was provided by Dr. Courtney Hanna (Dr. Wendy Robinson’s Lab) (Hanna et al., 2015). A total of 82 high confidence DMRs (hg19) from this analysis were used in the overlap analysis. Of these, 21 were known maternal DMRs, 6 were known paternal DMRs, and 55 were putative novel imprinted DMRs.
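The overlap step described above amounts to testing whether two genomic intervals on the same chromosome intersect. The following is a minimal sketch of that test, not the tool actually used for this analysis: the `Interval` type, the function names and the interval labels are hypothetical, and the coordinates (hg19) are a subset of the CNVs and DMRs reported in Section 4.5.3, used here only as example input.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    label: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_overlaps(cnvs, dmrs):
    """Return (CNV label, DMR label) pairs for every overlapping pair."""
    return [(c.label, d.label) for c in cnvs for d in dmrs if overlaps(c, d)]


cnvs = [
    Interval("chr5", 177_383_573, 180_712_404, "09-03 5q35.3 deletion"),
    Interval("chr10", 26_674_215, 29_194_917, "07-27 10p12.1-p11.23 deletion"),
]
dmrs = [
    Interval("chr5", 178_593_785, 178_594_990, "DMR in ADAMTS2"),
    Interval("chr10", 27_702_309, 27_703_547, "DMR at PTCHD3"),
    Interval("chr15", 45_314_789, 45_315_642, "DMR at SORD"),
]
hits = find_overlaps(cnvs, dmrs)
```

The pairwise comparison is adequate for a list of this size (70 CNVs against 82 DMRs); both lists must be in the same genome build before they are compared.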

4.5 Results

4.5.1 FISH Results for Selected BAC Probes in Control Individuals

The results of SS, DD and SD counts in control individuals for BACs from control regions and from selected CNVs are shown in Table 4.1 and Figure 4.6. Percentages of SD signals in control individuals for the BAC probes (RP11-171C8 and RP11-441B20) overlapping SNRPN (asynchronous) are within the expected range of 25-35%, with an average SD signal pattern of ~30%. The genomic positions of control probes for the known asynchronously replicating SNRPN region, both commercial and non-commercial, are shown in Supplementary Figure 4.2, Appendix C.

However, the percentages of SD signals in control individuals for BACs (RP11-885H1 and RP11-996I3) overlapping genes reported to be synchronously replicating (PAX6 and RB1) are higher than the expected 10-20%, with an average SD of 29% and 27% respectively. It is therefore not possible to distinguish asynchronous replication timing from synchronous replication timing in this experiment, nor is it possible to ascertain if replication timing is asynchronous for the BACs that overlap selected CNVs (17q25.3 and 1q21.1). In addition, variability of FISH signal patterns between controls was noted, which further precluded me from drawing conclusions about the replication timing pattern of selected patient CNVs in controls.

Sample/Probe | Total Cells Counted | SS | DD | SD | Color
RP11-171C8 (15q11.2) SNRPN(d)
Control 1 | 643 | 35.77% | 36.86% | 27.91% ± 4.19% | SG
Control 2 | 966 | 39.86% | 24.43% | 35.69% ± 7.09% | SG
Control 3 | 200 | 32.00% | 36.50% | 31.50% ± 0.71% | SG
Control 4 | 409 | 43.52% | 28.61% | 27.92% ± 4.06% | SG
AVE NORMALS | 2218 | 38.64% | 29.89% | 31.98% ± 6.50% | SG
RP11-441B20 (15q11.2) SNRPN(d)
Control 4 | 505 | 47.52% | 22.57% | 29.88% ± 7.62% | SG
RP11-885H1 (11p13) PAX6
Control 1 | 370 | 50.54% | 26.22% | 22.94% ± 8.07% | SG
Control 2 | 297 | 50.51% | 13.47% | 35.94% ± 12.84% | SG
Control 3 | 203 | 48.77% | 22.66% | 28.58% ± 2.99% | SG
AVE NORMALS | 870 | 50.11% | 21.03% | 29.15% ± 6.52% | SG
RP11-996I3 (13) RB1
Control 1 | 730 | 45.34% | 29.73% | 26.75% ± 6.67% | SG
Control 2 | 758 | 41.69% | 33.38% | 24.87% ± 6.77% | SG
Control 3 | 202 | 45.05% | 35.64% | 19.30% ± 0.43% | SG
Control 4 | 530 | 17.55% | 42.83% | 40.33% ± 8.64% | SG
AVE NORMALS | 2220 | 37.43% | 34.64% | 27.81% ± 8.92% | SG
RP11-433J22 (1q21.1)
Control 1 | 87 | 22.99% | 39.08% | 37.93% ± NA | SO
Control 2 | 81 | 30.86% | 25.93% | 43.21% ± NA | SO
Control 3 | 202 | 36.63% | 34.65% | 28.73% ± 1.80% | SO
AVE NORMALS | 370 | 32.16% | 33.78% | 36.62% ± 7.33% | SO
RP11-1059K1 (1q21.1) alternate for RP11-433J22
Control 1 | 220 | 33.64% | 33.64% | 32.89% ± 5.06% | SG
Control 1 | 141 | 41.84% | 20.57% | 36.70% ± 3.76% | SO
RP11-598A16 (17q25.3)
Control 1 | 90 | 31.11% | 27.78% | 41.11% ± NA | SO
Control 2 | 112 | 18.75% | 41.07% | 40.18% ± NA | SO
AVE NORMALS | 202 | 24.26% | 35.15% | 40.64% ± 0.66% | SO
RP11-1022H22 (17q25.3) alternate for RP11-598A16
Control 1 | 243 | 19.75% | 44.44% | 35.26% ± 10.77% | SG
Control 2 | 278 | 25.54% | 40.29% | 34.72% ± 5.27% | SO

Table 4.1 Summary of FISH-RT Assay Counts for Selected BAC Probes in Control Individuals Results of FISH assay (SS, DD and SD percentages) for BACs overlapping known replication timing regions and selected CNVs: RP11-171C8 and RP11-441B20 overlap the SNRPN gene (asynchronous), RP11-885H1 and RP11-996I3 overlap the PAX6 and RB1 genes (synchronous), RP11-433J22 and RP11-1059K1 overlap the 1q21.1 CNV, and RP11-598A16 and RP11-1022H22 overlap the 17q25.3 CNV. SG = spectrum green, SO = spectrum orange, SNRPN(d) = SNRPN distal.

Figure 4.6 Replication Timing Patterns for Selected BAC Probes in Control Individuals The percentages of cells with SS (blue), DD (red), and SD (green) are shown in a stacked bar plot for control individuals for selected BACs from control regions; A) SNRPN (asynchronous), B) PAX6 (synchronous), C) RB1 (synchronous) and from one queried CNV region D) 1q21.1.

4.5.2 Commercial FISH Probe Results for SNRPN (Asynchronous) and 15qter (Synchronous) in Patient and Control Cells

I used a commercially available control probe (Cytocell) that contains a SNRPN imprinting center (SNRPN IC) probe (labeled with spectrum red) and a 15qter probe (labeled with spectrum green) to determine if the presence of a distant CNV causes RT changes.

Replication timing for SNRPN and 15qter was assessed by FISH-RT in cells from individuals with CNVs likely to cause replication timing defects (1q21.1, Xq21, and 4p14) and in control cells. My results (SS, DD, and SD counts) are shown in Table 4.2 and Figure 4.7. Similar to the FISH-RT experiments using selected BAC probes described in section 4.5.1, I could not differentiate between synchronously and asynchronously replicating regions using FISH-RT in control cells, as both the SNRPN probe (asynchronous RT) and the 15qter probe (synchronous RT) had a similar FISH pattern distribution. My FISH counts also showed a large variability between individuals tested (e.g. SS ranged from 25% to 50% for the SNRPN probe in 3 controls). Overall, percentages of SD signals in controls for the SNRPN IC region (asynchronous) are within the expected range of 25-35%, with an average SD signal pattern of 28% in three control individuals. However, the percentages of SD signals in normal controls for the 15qter region (synchronous) are higher than the expected 10-20%, with an average SD of 33% in three control individuals. Due to these findings it is not possible to determine if there are global changes in replication timing (asynchronous to synchronous) for a known imprinted gene (SNRPN) in individuals who have CNVs (chromosomal positions 1q21.1, Xq21.1 and 4p14) that overlap genes (CHD1L, ATRX, and RFC1 respectively) which may have an effect on replication timing.

Column groups: SNRPN (15qter probe) | SNRPN (Imprinting Center probe)

Sample | 15qter: Total Cells Counted | 15qter: SS | 15qter: DD | 15qter: SD | IC: Total Cells Counted | IC: SS | IC: DD | IC: SD
Nr-101M | 276 | 25.00% | 43.12% | 31.69% ± 5.53% | 284 | 50.35% | 22.54% | 27.70% ± 3.44%
Nr-102F | 480 | 55.63% | 13.33% | 30.23% ± 3.99% | 374 | 60.43% | 17.65% | 24.26% ± 8.79%
Nr-105M | 550 | 24.36% | 39.09% | 36.23% ± 3.91% | 506 | 50.79% | 17.19% | 32.40% ± 3.89%
AVE NORMALS | 1306 | 35.99% | 30.47% | 32.71% ± 3.13% | 1164 | 53.78% | 18.64% | 28.12% ± 2.97%
08-102 1q21.1 (CHD1L) Dup | 31 | 16.13% | 54.84% | 29.03% ± NA | 28 | 60.71% | 17.86% | 21.43% ± NA
09-66 Xp-q (ATRX) | 164 | 19.51% | 44.51% | 36.53% ± 3.58% | 156 | 52.56% | 20.51% | 27.29% ± 1.82%
09-103 1q21.1 (CHD1L) Del | 200 | 26.00% | 41.50% | 32.50% ± 0.71% | 200 | 62.50% | 16.00% | 21.50% ± 3.54%
10-108A 4p14 Dup (RFC1) | 382 | 16.75% | 47.64% | 35.54% ± 5.78% | 329 | 40.12% | 28.27% | 31.67% ± 2.16%

Table 4.2 Summary of FISH Assay for Commercial Control Probes on Chromosome 15 in Controls and Subjects with CNVs at 1q21.1, Xq21.1 and 4p14 Results of FISH assay (SS, DD and SD percentages) for known replication timing probes, 15qter (synchronous) and SNRPN (asynchronous), are shown for 3 normal controls and 4 individuals with CNVs overlapping genes that might cause replication timing defects.

Figure 4.7 Replication Timing Patterns for Control Probes on Chromosome 15 in Controls and Subjects with CNVs at 1q21.1, Xq21.1 and 4p14 The percentages of cells with SS (blue), DD (red), and SD (green) are shown in a stacked bar plot for 3 normal control individuals and 4 individuals with CNVs overlapping genes with possible replication timing defects for two control regions: A) 15qter and B) SNRPN imprinting center. An average of the normal controls is shown for comparison with the queried individuals (08-102 with a 1q21.1 duplication, 09-66 with an Xq21.1 deletion, 09-103 with a 1q21.1 deletion, and 10-108A with a 4p14 deletion).

4.5.3 Overlap of de novo and Familial CNVs with Putatively Imprinted DMRs

My overlap analysis of 82 putatively imprinted DMRs with 24 de novo and 46 familial CNVs from 213 individuals with idiopathic ID showed that four individuals have CNVs that overlap putatively imprinted DMRs. The CNVs, three pathogenic de novo CNVs and one putatively pathogenic familial CNV, overlap DMRs that have the maternal allele methylated.

The first individual (Lab ID 09-03) has 2 reported pathogenic de novo CNVs: a 5.462 Mb gain of 1p36 and a 3.328 Mb deletion of 5q35.3 (chr5:177,383,573-180,712,404; hg19), both confirmed by FISH (Qiao et al., 2010). The father of this individual has a balanced translocation of 1p36.3 and 5q35.3, so the imbalance is due to an inherited derivative (abnormal) chromosome from the paternal translocation. The overlapping putatively imprinted DMR (chr5:178,593,785-178,594,990; 1,206 bp) is found within the 3.328 Mb 5q35.3 deletion and is located within intron 5 of the gene ADAM metallopeptidase with thrombospondin type 1 motif, 2 (ADAMTS2), although the closest transcription start site (106,179 bp) is for the gene zinc finger protein 354C (ZNF354C) (Figure 4.8, A). ADAMTS2 and ZNF354C are both expressed in the brain (Colige et al., 1999; Gao et al., 2004). ZNF354C is proposed to repress transcription during development, although little else is known about its function (Gao et al., 2004). Mutations in ADAMTS2 cause a recessively inherited connective tissue disorder, and sex-biased expression levels of ADAMTS2 (higher in males versus females) in brain suggest that the gene may be imprinted (Faisal et al., 2014).

The second individual (Lab ID 07-27) has a single reported pathogenic de novo 2.52 Mb deletion at 10p12.1-p11.23 (chr10:26674215-29194917; hg19) (Qiao et al., 2010; Qiao et al., 2014). The overlapping putatively imprinted DMR (chr10:27,702,309-27,703,547; 1,239 bp) located within this deletion is intragenic and is found within the promoter of the gene PTCHD3 (Figure 4.8, B). A novel allele-specific methylation region, also associated with imprint control regions, is found in exon 1 of the gene Patched-domain containing 3 (PTCHD3) (Elliott et al., 2015). However, a homozygous deletion of PTCHD3 in an individual without any overt abnormal phenotypes suggests that this gene may be a non-essential human gene (Ghahramani Seno et al., 2011).

The third individual (Lab ID 06-139) has a single reported pathogenic de novo 3.664 Mb gain of 20q13.33 (chr20:59266069-62893330) (Qiao et al., 2010; Qiao et al., 2014). The overlapping DMR is located within this gain (chr20:60,540,388-60,541,082; 695 bp) in an intergenic region that is closest to the transcription start site (11,670 bp) for the microRNA MIR1257 (Figure 4.6, C).

The last individual (Lab ID 06-54) is reported to have 2 maternal CNVs, a 58.467 kb gain of 12q24.11 and a 0.677 Mb gain of 15q21.1 (chr15:45056155-45733502; hg19), both of unknown significance (Qiao et al., 2010). The overlapping DMR found in the 15q21.1 gain (chr15:45,314,789-45,315,642; 854 bp) is intragenic and is found within the promoter of the gene sorbitol dehydrogenase (SORD) (Figure 4.6, D). SORD has previously been identified as a putative imprinted gene in two other studies, but no evidence of parent-of-origin gene expression has been published to date (Nakabayashi et al., 2011; Yuen et al., 2011).
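The coordinate comparisons reported above can be illustrated with a simple interval check. The following is a minimal sketch only, not the procedure used in this study: it assumes closed (inclusive) coordinates on a single common assembly, and the `Interval` class and `overlaps` function are hypothetical names introduced for illustration. The coordinates are those quoted in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed genomic interval (both ends inclusive); illustrative helper only."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive coordinates, so add 1 to the difference.
        return self.end - self.start + 1


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if a and b share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# (CNV, DMR) pairs keyed by Lab ID, using the coordinates quoted in the text.
PAIRS = {
    "09-03": (Interval("chr5", 177_383_573, 180_712_404),
              Interval("chr5", 178_593_785, 178_594_990)),
    "07-27": (Interval("chr10", 26_674_215, 29_194_917),
              Interval("chr10", 27_702_309, 27_703_547)),
    "06-139": (Interval("chr20", 59_266_069, 62_893_330),
               Interval("chr20", 60_540_388, 60_541_082)),
    "06-54": (Interval("chr15", 45_056_155, 45_733_502),
              Interval("chr15", 45_314_789, 45_315_642)),
}

for lab_id, (cnv, dmr) in PAIRS.items():
    print(lab_id, "overlap:", overlaps(cnv, dmr), "DMR length (bp):", dmr.length)
```

Under the inclusive-coordinate assumption, the DMR lengths computed this way agree with the sizes given in the text (1,206, 1,239, 695 and 854 bp), and each DMR lies within its CNV.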


Figure 4.8 Putative Imprinted DMRs Found in de novo and Familial CNVs. Putative imprinted DMRs found in de novo CNVs (A-C) and in a familial CNV (D) are shown in the UCSC genome browser with their closest known gene(s) (RefSeq).

4.6 Discussion

4.6.1 FISH Replication Timing Assay

The FISH-RT assay in my experiments could not differentiate between previously published synchronously and asynchronously replicating loci. I was therefore not able to reliably determine whether the selected CNVs replicate synchronously or asynchronously in control cells, or whether they affect the replication of known imprinted regions in cells from individuals with CNVs.


Previous studies showed that the SNRPN gene, whose first exon is located within the imprinting center for the 15q11.2 locus, replicates asynchronously when a SNRPN probe (VYSIS) was used in controls in comparison to other synchronously replicating probes (Green Finberg et al., 2003; Nagler et al., 2010; Yeshaya et al., 2009). However, I was unable to detect a difference between the SD frequencies for the asynchronously replicating SNRPN locus and the synchronously replicating 15qter locus in my controls.

Several factors can affect analysis of replication timing by FISH: 1) DNA must be both replicated and denatured, with adequate separation of the DNA strands at a particular locus, to accurately assess hybridization patterns; 2) background (non-specific probe binding) can confound signals; 3) low probe-DNA hybridization efficiency can reduce signals; 4) the number of dividing cells can influence counts; and 5) counting of signal patterns is subjective (Ensminger and Chess, 2004; Selig et al., 1992). In my experiments, I tried to minimize the impact of these issues by running experiments with as much consistency as possible. For example, I attempted to minimize inadequate DNA separation by denaturing the probe and slide separately. In addition, I tried to select probes of similar size and use similar concentrations during hybridization. To do this, I assessed probes prior to use and re-ordered probes with high background in a different color. In my experience, BAC probes were quite variable in their performance and could have contributed to the variability I saw in my experiments. Overall, probes labeled with spectrum orange performed better than spectrum green with regard to background, but were sometimes fainter. In order to assess whether the non-dividing cell population affected counts, I initially compared counts for both BrDU and non-BrDU labeled cells. In my experiment, counts were stable and did not seem to fluctuate in the number of SS, DD or SD cells detected for any probes counted, which is similar to other reports (Nagler et al., 2010; Yeshaya et al., 2009). The majority of data presented in this chapter are for non-BrDU labeled cells. Finally, the subjectivity of counting signals was reduced by establishing a standard set of scoring criteria prior to counting, and both individuals counting attempted to be as consistent as possible even when signal patterns were not straightforward.

Similar to my experiments, Hirsch et al. (2011) did not observe the expected percentage of cells with asynchronous RT for SNRPN in control peripheral blood lymphocytes. In addition, they found a significantly lower frequency of asynchronous cells compared to two loci with known synchronous replication timing (15qter and TP53). The authors noted that each probe varied significantly from each other and pointed out that differences in SD frequencies can be caused by differing probe characteristics, especially those known to cause performance differences (i.e. probe size and repetitive content within a probe). Keeping this in mind, I reviewed the information regarding the commercially available Cytocell probe (LPU-005-A) containing the SNRPN and 15qter loci used as a control in my experiment. The sizes of both probes are similar; the SNRPN probe (red) in the Cytocell mixture is approximately 120 kb in size while the 15qter probe is approximately 100 kb in size. However, the Cytocell probe does not contain equal amounts of labeled probe for both loci; it contains 11.8 times more probe per test for the 15qter probe, which may be artificially inflating the counts for the 15qter probe in my normal controls. In addition, most reports of asynchronous replication timing for the 15q11.2 locus used VYSIS probes, which were not used in my study. Although the Cytocell probe is described as overlapping the imprinting center, it is possible that it covers the SNRPN region in a way that does not allow asynchrony to be detectable by FISH (i.e. is at the boundary of a replication timing zone) (Supplementary Figure 4.1, Appendix C) (Greally et al., 1998). This could also apply to the Toronto BAC probes ordered in the 15q11.2 region.


A potential drawback in my replication timing experiments is the use of EBV-transformed lymphoblast cells when short term cultured lymphocytes are often used (Bras et al., 2008; Kagotani et al., 2002; Nagler et al., 2010). Reports of asynchronous replication timing detected in transformed mouse cell lines (Kitsberg et al., 1993; Rajcan-Separovic et al., 1998) suggest that the use of LBCs might not be an issue; however, other researchers have suggested that allele specific replication patterns may change during the transformation process (Gribnau et al., 2003) and that cultured cells lose features of imprinted domains as they are passaged (Nogami et al., 2000).

While replication timing assays have been used to determine both synchronously and asynchronously replicating genomic regions, I have not been able to do so in my experiments. I am therefore unable to interpret the results of my replication timing assays and cannot determine whether the selected patient CNVs replicate synchronously or asynchronously and whether they affect the replication timing of other chromosomal regions.

4.6.2 Overlap of de novo and Familial CNVs in Individuals with ID with Putative Novel DMRs

DNA methylation of CpG residues is an epigenetic mechanism that can repress or enhance gene expression. Four putatively imprinted DMRs identified by Hanna et al. (2015) overlap with 3 de novo CNVs and 1 familial CNV on chromosomes 5, 10, 20 and 15, respectively, in individuals with ID. Each is methylated on the maternal allele, meaning that transcription would likely occur from the paternal allele.

The phenotypic consequences of the CNVs in individuals with ID that include putative imprinted DMRs are difficult to assess. For example, the putative imprinted DMR on chromosome 5q35.3 is found in an individual with a large deletion of 5q35.3 and a larger gain of 1p36 (3.328 Mb and 5.462 Mb respectively). The gain of 1p36 and the loss of 5q35 are the result of an unbalanced translocation transmitted from the father of the child, who is a carrier of a balanced translocation. It is therefore likely that the large number of genes, both deleted and gained, play a significant role in the phenotype of the child. The impact of putative imprinted DMRs detected in two additional de novo CNVs (10p12.1-11.23 and 20q13.33) is also difficult to assess due to their de novo origin, large size (>2 Mb), and number of other CNV integral genes. Finally, the maternal duplication of the putatively imprinted SORD gene is not expected to cause a phenotypic effect as this gene is methylated on the maternal copy, so no change in gene expression should result.

4.7 Conclusion

The FISH-RT assay proved to be challenging as FISH signal patterns and counts were prone to variability. FISH-RT is advocated by some groups to be straightforward and even clinically relevant for detecting changes in replication timing caused by chromosomal aberrations (Amiel et al., 1998; Amiel et al., 2002; Yeshaya et al., 2009) and to monitor increasing genomic instability in cancer progression (Cytron et al., 2011; Nagler et al., 2010). However, this method is not clinically proven and no established guidelines exist regarding its use. Nevertheless, through my work on epigenetic aspects of CNVs, I identified a CNV from 17q25.3 that is close to an imprinted locus (section 4.4.7) and several CNVs that overlap putative imprinted DMRs (section 4.5.3).


Chapter 5: Discussion

5.1 Overview

The widespread use of CMAs has significantly impacted clinical genetic practice through the detection of pathogenic CNVs in ~15% of individuals with ID. Although guidelines for determining the pathogenicity of a CNV are available, analysis of the effect of a CNV on gene function is rarely performed. The goal of my PhD research project was to identify novel genes that cause ID, primarily by performing a multifaceted functional analysis of genes within the CNV region, but I also looked for additional causes of ID in the rest of the genome. My study, which included subjects with two types of CNVs (a unique de novo pathogenic CNV from the 2p15-16 region and a familial CNV from the 1q21.1 region), resulted in the identification of several candidate ID genes integral to each CNV and in the identification of a mutation of a possible genomic modifier. In addition, I identified de novo and familial CNVs from our ID cohort that contain putative imprinted DMRs, and one CNV from 17q25 that is in close proximity to a predicted imprinted region. In this discussion I will summarize my main findings and discuss their significance. I will also discuss the strengths and limitations of my research and outline future directions.

5.2 Summary and Significance

In chapter 2, I present evidence that multiple genes contribute to the 2p15-16.1 microdeletion syndrome. In my study I extracted phenotype and genotype data from 23 individuals with 2p15-16.1 deletions in order to identify the most frequently reported phenotypes and deleted genes. Consistent phenotypes were seen in individuals regardless of the extent of the CNV overlap, suggesting that more than one gene in the region contributes to the syndrome. Using multiple lines of evidence (i.e. frequency of gene deletion, changes in gene expression when deleted, and haploinsufficiency scores), I identified 3 candidate genes, USP34, XPO1 and REL, that had reduced protein levels when deleted. In addition, I used bioinformatics to show that two candidate genes, REL and XPO1, are members of multiple cellular pathways, including the canonical NF-κB pathway. USP34, while not a member of the canonical NF-κB pathway, is a negative regulator of NF-κB in T-lymphocytes (Poalas et al., 2013). Interestingly, dysfunction of the NF-κB pathway is associated with abnormalities of cognition and memory, making it a putative ID related pathway in individuals whose deletions include one or more of my candidate NF-κB related genes (Ahn et al., 2008; Philippe et al., 2009). Overall, my work on the 2p15-16.1 microdeletion points to the possibility that the deletion of one or more candidate genes results in similar neurodevelopmental phenotypes, possibly through dysfunction of a shared pathway.

In chapter 3, I present evidence that altered function of 3 genes (CHD1L and PRKAB2 from the 1q21.1 CNV, and ATF6 from 1q23, outside the CNV) in individuals with ID and recurrent 1q21.1 CNVs may predispose them to environmental stress. My work was the first to show that copy number changes of CHD1L and PRKAB2 resulted in a change in their mRNA and protein expression levels as well as in cellular dysfunction in patient cells (Harvard et al., 2011). Since I published my findings, several studies have reported altered expression levels for the two genes in 1q21.1 CNV carriers in both human LBCs and brain tissue (Luo et al., 2012; Mehta et al., 2014; Ye et al., 2012). CHD1L and PRKAB2 regulate cellular response to genomic and metabolic stress, respectively, and their dysfunction due to copy number changes could make carriers of 1q21.1 CNVs more susceptible to environmental stress. I speculated that resulting phenotypes in 1q21.1 carriers could be more severe in individuals who experience increased levels of genomic or metabolic stress during development (Harvard et al., 2011).


In addition, I detected a mutation in a possible genetic modifier gene outside the 1q21.1 CNV in a family with an inherited 1q21.1 CNV. The pathogenic variant was in an ER stress sensor gene, ATF6, and resulted in reduction of mRNA (ATF6) and protein (ATF6α) in individuals carrying the mutation. I was unable to detect a defect in cellular response to short term chemically induced ER stress in cells with the ATF6 mutation. However, it is possible that the cells with constitutively reduced ATF6α are less able to deal with long term ER stress. I also detected an increase in the number of dysregulated ER stress response genes downstream of ATF6α, not only in individuals with the ATF6 mutation but also in members of an unrelated family with a 1q21.1 deletion, implicating abnormalities in this stress response pathway as a cause of abnormal development. Abnormal ER stress response has previously been implicated in ASD by causing inhibition of neural transmission in individuals with incorrectly folded protein (Momoi et al., 2010). Overall, my work on the 1q21.1 CNV points to multiple stress response pathways whose deficit during variable environmental conditions in pre- or postnatal development may result in differing phenotypic consequences.

My finding that the ER stress response may be impaired in some subjects with 1q21.1 CNVs opens possibilities for treatment with drugs that improve ER stress response. One drug, valproate, typically used to treat mood disorders, enhances stress response by increasing the expression of several ER stress response genes including ATF6 (Kazeminasab et al., 2012). Although many questions related to the role of the specific mutation in the ATF6 gene and ER stress dysfunction overall in subjects with 1q21.1 CNVs remain unresolved, my work points to the relevance of the ER stress response pathway to the phenotypic abnormalities in subjects with ID.

Finally, in chapter 4, I investigated the possibility that phenotypic variability seen for familial CNVs may be due to imprinted genes (parent of origin effect). Although I was unable to use FISH to identify candidate imprinted regions, I identified several CNVs that overlap putative imprinted DMRs and/or predicted imprinted loci, pointing to the possibility that they are involved in ID.

5.3 Strengths and Limitations

One strength of my study is that I used a multifaceted approach (high resolution CMA, gene expression, protein expression, bioinformatics, and functional tests) to explore multiple possibilities for how each CNV caused functional consequences of its integral genes. This broad approach allowed me to select candidate genes based on multiple factors, increasing my confidence in their selection.

In addition, I considered other possibilities for how CNVs can cause a functional impact. For example, I explored the presence of regulatory elements (e.g. enhancers) in the 2p15-16.1 CNV, and of imprinted genes in a larger number of ID associated CNVs from our cohort. Furthermore, I tried to identify common pathways for a number of genes from the CNV that could be defective if any of these genes is deleted. Finally, I addressed a complex question of the effect of the environment on developmental abnormalities by assessing the function of genes associated with multiple forms of cellular stress.

One major limitation of my study is the use of LBCs for analysis of genes with a suspected role in brain function. However, a large number of CNV genes in LBCs and human brain tissue have similar expression (e.g. 16p11.2 and 1q21.1), which supports the use of LBCs when studying NDDs (Luo et al., 2012; Mehta et al., 2014; Ye et al., 2012). On the other hand, studying cellular pathways that may be influenced by viral transformation, like the ER stress response (Isler et al., 2005), remains a possible issue when using LBCs and perhaps explains the unaltered ER stress response in subjects with the ATF6 mutation, despite the reduction of ATF6α protein expression. In addition, inducing ER stress with a single dose of chemical to assess ATF6 and downstream gene response to ER stress does not determine the possible effects of a constitutive reduction of ATF6α on development. Finally, the role of ATF6α in many processes, including lipid biosynthesis, ER biogenesis and GABAergic inhibitory transmission, suggests that the effect of an ATF6 mutation may be more complex and cannot be measured reliably in LBCs (Bommiasamy et al., 2009; Maiuolo et al., 2011; Momoi et al., 2010).

5.4 Future Directions

5.4.1 Future Studies for Candidate Genes from 2p15-16.1 and 1q21.1 CNVs

In order to assess the contribution of each candidate gene to phenotypes observed for a specific CNV, it would be beneficial to do single gene knockdown or overexpression studies in an animal model. Confirming that a candidate gene causes developmental abnormalities similar to those seen in affected subjects would represent strong proof of its causality (Koolen et al., 2012). For example, overexpression and knockdown of KCTD13 from the 16p11.2 CNV was sufficient to recapitulate the micro- and macrocephaly associated with individuals who have duplications or deletions of this CNV (Golzio et al., 2012). Knocking down genes that my studies have short-listed as candidates could be a start for further elucidation of their roles in the 1q21.1 CNV and 2p15-16.1 deletion syndrome. Further studies of pair-wise knockdown of candidate genes from each CNV region could provide valuable insight into genes and gene interactions that have additive effects (Carvalho et al., 2014). Zebrafish are an attractive animal model for both types of studies because knockdown can be done for single genes and for gene combinations in order to study epistatic interactions (Carvalho et al., 2014; Golzio et al., 2012).


5.4.2 Future Studies for 2p15-16.1 Deletions

Future experiments testing the function of the NF-κB pathway in individuals with XPO1, REL and USP34 deletions would help determine if this pathway is affected in individuals with 2p15-16.1 deletions, and help determine if deletions containing more than one of these genes cause a more pronounced cellular effect. Additionally, higher resolution array analysis for cases previously studied by low resolution arrays would help refine breakpoints and gene content in pathogenic CNVs and identify other CNVs that might contain relevant genes. Another approach could be the use of a custom array designed to cover enhancer elements in the 2p region to determine if enhancers are deleted in other cases, as we discovered in Case No. 2.

5.4.3 Future Studies for 1q21.1 CNVs

In future studies, the genetic background of individuals with familial 1q21.1 CNVs should be carefully assessed to determine if genomic modifiers are present in more severely affected individuals. This could be done by using high resolution arrays to detect SNPs and small intronic and exonic CNVs, and/or by using whole exome or whole genome sequencing to find pathogenic variants. Additionally, it would be beneficial to assess ER stress response in non-transformed cells (e.g. skin) from 1q21.1 CNV/ATF6 mutation carriers to determine if the UPR functions normally. Study of cellular responses to subtle ER stress in LBCs from individuals with the ATF6 mutation over longer time periods may also help determine if/how the ATF6 mutation modifies the cellular phenotype in 1q21.1 CNV carriers. Finally, functional studies that knock down ATF6 expression in neural cell lines, to assess if a constitutive decrease in ATF6α expression results in increased apoptosis and/or causes defects in function (e.g. synaptic transmission), could link ATF6α to NDD phenotypes.


5.4.4 Future Studies for CNVs Overlapping Novel Putative Imprinted Regions

Parent of origin studies (i.e. establishing that the CNV occurred on the parental chromosome that carries the expressed gene) for the two de novo CNVs overlapping DMRs identified in my study would help to determine if further molecular follow-up is warranted. Studying the methylation properties of the imprinted locus in the family with the CNV, in comparison to controls, as well as its allelic expression would also be of interest. The caveat of these studies, as well as all other studies of imprinted regions, is that the imprinting characteristics could differ during development and between tissues.

5.5 Conclusion

Recent widespread application of genomic approaches (CMA, NGS) to find causes of human is an exciting development, which has increased the diagnostic yield for many individuals. However, these approaches also result in a large number of uncertain findings that can only be resolved by downstream analysis of genetic changes in animal models or patient cells. Downstream analysis can be challenging as each gene or pathway requires expertise and often prompts world-wide collaborations, as I have experienced in my PhD project. Functional analysis of genes affected by CNVs or mutation is therefore a bottleneck of the diagnostic work-up for many genetic diseases.

Functional analysis is particularly challenging when studying downstream effects of genetic changes in NDDs that cannot be assessed in affected tissues such as brain. Nevertheless, as I have shown in my dissertation, the number of candidate genes from larger regions of the genome can be narrowed down by using a combination of approaches. The development and widespread use of induced pluripotent patient stem cells (iPS) (Corti et al., 2015) that can be induced to become neurons holds promise for a more specific gene-cellular phenotype association in neurodevelopmental disorders, although it brings its own challenges.

Ultimately, the goal of research is not only to acquire new knowledge but also to help discover new possible treatments. Identification of perturbed pathways that can be “corrected” by diet or medications may not cure patients with ID, but in some cases can reduce the impact of a genetic abnormality, if identified early. Several exciting treatment possibilities have been identified based on defective pathways detected in individuals with ID (Vissers et al., 2015). My work has also tackled this aspect of functional assessment of genetic changes by identifying a possible “treatable” pathway (i.e. ER stress response and valproate). I hope that future comprehensive functional studies of genetic changes (CNVs, mutation) will help in identifying new candidate genes and putative pathways associated with ID.


Works Cited

Adachi, Y., Yamamoto, K., Okada, T., Yoshida, H., Harada, A. and Mori, K., 2008. ATF6 is a transcription factor specializing in the regulation of quality control proteins in the endoplasmic reticulum. Cell Struct Funct. 33, 75-89.

Ahel, D., Horejsi, Z., Wiechens, N., Polo, S.E., Garcia-Wilson, E., Ahel, I., Flynn, H., Skehel, M., West, S.C., Jackson, S.P., Owen-Hughes, T. and Boulton, S.J., 2009. Poly(ADP-ribose)-dependent regulation of DNA repair by the chromatin remodeling enzyme ALC1. Science. 325, 1240-3.

Ahn, H.J., Hernandez, C.M., Levenson, J.M., Lubin, F.D., Liou, H.C. and Sweatt, J.D., 2008. c-Rel, an NF-kappaB family transcription factor, is required for hippocampal long-term synaptic plasticity and memory formation. Learn Mem. 15, 539-49.

Albers, C.A., Paul, D.S., Schulze, H., Freson, K., Stephens, J.C., Smethurst, P.A., Jolley, J.D., Cvejic, A., Kostadima, M., Bertone, P., Breuning, M.H., Debili, N., Deloukas, P., Favier, R., Fiedler, J., Hobbs, C.M., Huang, N., Hurles, M.E., Kiddle, G., Krapels, I., Nurden, P., Ruivenkamp, C.A., Sambrook, J.G., Smith, K., Stemple, D.L., Strauss, G., Thys, C., van Geet, C., Newbury-Ecob, R., Ouwehand, W.H. and Ghevaert, C., 2012. Compound inheritance of a low-frequency regulatory SNP and a rare null mutation in exon-junction complex subunit RBM8A causes TAR syndrome. Nat Genet. 44, 435-9, S1-2.

Albrecht, U., Sutcliffe, J.S., Cattanach, B.M., Beechey, C.V., Armstrong, D., Eichele, G. and Beaudet, A.L., 1997. Imprinted expression of the murine Angelman syndrome gene, Ube3a, in hippocampal and Purkinje neurons. Nat Genet. 17, 75-8.

Alimov, A., Wang, H., Liu, M., Frank, J.A., Xu, M., Ou, X. and Luo, J., 2013. Expression of autophagy and UPR genes in the developing brain during ethanol-sensitive and resistant periods. Metab Brain Dis. 28, 667-76.

Allen Brain Atlas., 2015. The Allen Institute for Brain Science.

American Association on Intellectual and Developmental Disabilities., 2010. Intellectual disability: definition, classification, and systems of supports, 11th ed. American Association on Intellectual and Developmental Disabilities, Washington, DC.

American Psychiatric Association. and American Psychiatric Association. DSM-5 Task Force., 2013. Diagnostic and statistical manual of mental disorders: DSM-5, 5th ed. American Psychiatric Association, Washington, D.C.

Amiel, A., Avivi, L., Gaber, E. and Fejgin, M.D., 1998. Asynchronous replication of allelic loci in . Eur J Hum Genet. 6, 359-64.

Amiel, A., Reish, O., Gaber, E., Masterman, R., Tohami, T. and Fejgin, M.D., 2002. Asynchronous replication of alleles in genomes carrying a microdeletion. Isr Med Assoc J. 4, 702-5.

Ansar, M., Santos-Cortez, R.L., Saqib, M.A., Zulfiqar, F., Lee, K., Ashraf, N.M., Ullah, E., Wang, X., Sajid, S., Khan, F.S., Amin-Ud-Din, M., University of Washington Center for Mendelian, G., Smith, J.D., Shendure, J., Bamshad, M.J., Nickerson, D.A., Hameed, A., Riazuddin, S., Ahmed, Z.M., Ahmad, W. and Leal, S.M., 2015. Mutation of ATF6 causes autosomal recessive achromatopsia. Hum Genet.

Anthony, K., More, A. and Zhang, X., 2014. Activation of silenced cytokine gene promoters by the synergistic effect of TBP-TALE and VP64-TALE activators. PLoS One. 9, e95790.


Avner, P. and Heard, E., 2001. X-chromosome inactivation: counting, choice and initiation. Nat Rev Genet. 2, 59-67.

Bai, B., Moore, H.M. and Laiho, M., 2013. CRM1 and its ribosome export adaptor NMD3 localize to the nucleolus and affect rRNA synthesis. Nucleus. 4, 315-25.

Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J. and Eichler, E.E., 2001. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 11, 1005-17.

Bajpai, R., Chen, D.A., Rada-Iglesias, A., Zhang, J., Xiong, Y., Helms, J., Chang, C.P., Zhao, Y., Swigut, T. and Wysocka, J., 2010. CHD7 cooperates with PBAF to control multipotent neural crest formation. Nature. 463, 958-62.

Balci, T.B., Sawyer, S.L., Davila, J., Humphreys, P. and Dyment, D.A., 2015. Brain malformations in a patient with deletion 2p16.1: A refinement of the phenotype to BCL11A. Eur J Med Genet. 58, 351-4.

Bamshad, M.J., Ng, S.B., Bigham, A.W., Tabor, H.K., Emond, M.J., Nickerson, D.A. and Shendure, J., 2011. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 12, 745-55.

Basak, A., Hancarova, M., Ulirsch, J.C., Balci, T.B., Trkova, M., Pelisek, M., Vlckova, M., Muzikova, K., Cermak, J., Trka, J., Dyment, D.A., Orkin, S.H., Daly, M.J., Sedlacek, Z. and Sankaran, V.G., 2015. BCL11A deletions result in fetal hemoglobin persistence and neurodevelopmental alterations. J Clin Invest. 125, 2363-8.

Bauer, D.E., Kamran, S.C., Lessard, S., Xu, J., Fujiwara, Y., Lin, C., Shao, Z., Canver, M.C., Smith, E.C., Pinello, L., Sabo, P.J., Vierstra, J., Voit, R.A., Yuan, G.C., Porteus, M.H., Stamatoyannopoulos, J.A., Lettre, G. and Orkin, S.H., 2013. An erythroid enhancer of BCL11A subject to genetic variation determines fetal hemoglobin level. Science. 342, 253-7.

Benjamini, Y. and Hochberg, Y., 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological). 57, 289-300.

Blaby, I.K., Majumder, M., Chatterjee, K., Jana, S., Grosjean, H., de Crecy-Lagard, V. and Gupta, R., 2011. Pseudouridine formation in archaeal RNAs: The case of Haloferax volcanii. RNA. 17, 1367-80.

Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A. and Taylor, J., 2010. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol. Chapter 19, Unit 19 10 1-21.

Boggs, B.A. and Chinault, A.C., 1994. Analysis of replication timing properties of human X-chromosomal loci by fluorescence in situ hybridization. Proc Natl Acad Sci U S A. 91, 6083-7.

Bollmann, F., Fechir, K., Nowag, S., Koch, K., Art, J., Kleinert, H. and Pautz, A., 2013. Human inducible nitric oxide synthase (iNOS) expression depends on chromosome region maintenance 1 (CRM1)- and eukaryotic translation initiation factor 4E (elF4E)-mediated nucleocytoplasmic mRNA transport. Nitric Oxide. 30, 49-59.

Bommiasamy, H., Back, S.H., Fagone, P., Lee, K., Meshinchi, S., Vink, E., Sriburi, R., Frank, M., Jackowski, S., Kaufman, R.J. and Brewer, J.W., 2009. ATF6alpha induces XBP1-independent expansion of the endoplasmic reticulum. J Cell Sci. 122, 1626-36.


Botstein, D. and Risch, N., 2003. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 33 Suppl, 228-37.

Bower, J.J., Karaca, G.F., Zhou, Y., Simpson, D.A., Cordeiro-Stone, M. and Kaufmann, W.K., 2010. Topoisomerase IIalpha maintains genomic stability through decatenation G(2) checkpoint signaling. Oncogene. 29, 4787-99.

Bradford, M.M., 1976. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem. 72, 248-54.

Bras, A., Cotrim, C.Z., Vasconcelos, I., Mexia, J., Leonard, A., Sanzhar, I., Akhmatullina, N. and Rueff, J., 2008. Asynchronous DNA replication detected by fluorescence in situ hybridisation as a possible indicator of genetic damage in human lymphocytes. Oncol Rep. 19, 369-75.

Bresson, S.M. and Conrad, N.K., 2013. The human nuclear poly(a)-binding protein promotes RNA hyperadenylation and decay. PLoS Genet. 9, e1003893.

Brown, P.O. and Botstein, D., 1999. Exploring the new world of the genome with DNA microarrays. Nat Genet. 21, 33-7.

Brunetti-Pierri, N., Berg, J.S., Scaglia, F., Belmont, J., Bacino, C.A., Sahoo, T., Lalani, S.R., Graham, B., Lee, B., Shinawi, M., Shen, J., Kang, S.H., Pursley, A., Lotze, T., Kennedy, G., Lansky-Shafer, S., Weaver, C., Roeder, E.R., Grebe, T.A., Arnold, G.L., Hutchison, T., Reimschisel, T., Amato, S., Geragthy, M.T., Innis, J.W., Obersztyn, E., Nowakowska, B., Rosengren, S.S., Bader, P.I., Grange, D.K., Naqvi, S., Garnica, A.D., Bernes, S.M., Fong, C.T., Summers, A., Walters, W.D., Lupski, J.R., Stankiewicz, P., Cheung, S.W. and Patel, A., 2008. Recurrent reciprocal 1q21.1 deletions and duplications associated with microcephaly or macrocephaly and developmental and behavioral abnormalities. Nat Genet. 40, 1466-71.

Bull, P., Morley, K.L., Hoekstra, M.F., Hunter, T. and Verma, I.M., 1990. The mouse c-rel protein has an N-terminal regulatory domain and a C-terminal transcriptional transactivation domain. Mol Cell Biol. 10, 5473-85.

Bunting, K., Rao, S., Hardy, K., Woltring, D., Denyer, G.S., Wang, J., Gerondakis, S. and Shannon, M.F., 2007. Genome-wide analysis of gene expression in T cells to identify targets of the NF-kappa B transcription factor c-Rel. J Immunol. 178, 7097-109.

Calaway, J.D., Dominguez, J.I., Hanson, M.E., Cambranis, E.C., Pardo-Manuel de Villena, F. and de la Casa-Esperon, E., 2012. Intronic parent-of-origin dependent differential methylation at the Actn1 gene is conserved in rodents but is not associated with imprinted expression. PLoS One. 7, e48936.

Carter, N.P., 2007. Methods and strategies for analyzing copy number variation using DNA microarrays. Nat Genet. 39, S16-21.

Carvalho, C.M., Vasanth, S., Shinawi, M., Russell, C., Ramocki, M.B., Brown, C.W., Graakjaer, J., Skytte, A.B., Vianna-Morgante, A.M., Krepischi, A.C., Patel, G.S., Immken, L., Aleck, K., Lim, C., Cheung, S.W., Rosenberg, C., Katsanis, N. and Lupski, J.R., 2014. Dosage changes of a segment at 17p13.1 lead to intellectual disability and microcephaly as a result of complex genetic interaction of multiple genes. Am J Hum Genet. 95, 565-78.

Carvill, G.L. and Mefford, H.C., 2013. Microdeletion syndromes. Curr Opin Genet Dev. 23, 232-9.

162

Casilli, F., Di Rocco, Z.C., Gad, S., Tournier, I., Stoppa-Lyonnet, D., Frebourg, T. and Tosi, M., 2002. Rapid detection of novel BRCA1 rearrangements in high-risk breast-ovarian cancer families using multiplex PCR of short fluorescent fragments. Hum Mutat. 20, 218-26. Catalogue of Parent of Origin Effects., 2011. in: Morison I.M. (Ed.). Dunedin School of Medicine, Dunedin, New Zealand. Chabchoub, E., Vermeesch, J.R., de Ravel, T., de Cock, P. and Fryns, J.P., 2008. The facial dysmorphy in the newly recognised microdeletion 2p15-p16.1 refined to a 570 kb region in 2p15. J Med Genet. 45, 189-92. Charbonnier, F., Raux, G., Wang, Q., Drouot, N., Cordier, F., Limacher, J.M., Saurin, J.C., Puisieux, A., Olschwang, S. and Frebourg, T., 2000. Detection of exon deletions and duplications of the mismatch repair genes in hereditary nonpolyposis families using multiplex polymerase chain reaction of short fluorescent fragments. Cancer Res. 60, 2760-3. Chen, M., Huang, J.D., Hu, L., Zheng, B.J., Chen, L., Tsang, S.L. and Guan, X.Y., 2009. Transgenic CHD1L expression in mouse induces spontaneous tumors. PLoS One. 4, e6727. Choi, M., Scholl, U.I., Ji, W., Liu, T., Tikhonova, I.R., Zumbo, P., Nayir, A., Bakkaloglu, A., Ozen, S., Sanjad, S., Nelson-Williams, C., Farhi, A., Mane, S. and Lifton, R.P., 2009. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A. 106, 19096-101. Christiansen, J., Dyck, J.D., Elyas, B.G., Lilley, M., Bamforth, J.S., Hicks, M., Sprysak, K.A., Tomaszewski, R., Haase, S.M., Vicen-Wyhony, L.M. and Somerville, M.J., 2004. Chromosome 1q21.1 contiguous gene deletion is associated with congenital heart disease. Circ Res. 94, 1429-35. Ciofi, C., Funk, S.M., Coote, T., Cheesman, D.J., Hammond, R.L., Saccheri, I.J. and Bruford, M.W., 1998. Genotyping with microsatellite markers, Molecular Tools for Screening Biodiversity. Springer, pp. 195-201. 
Classen, C.F., Riehmer, V., Landwehr, C., Kosfeld, A., Heilmann, S., Scholz, C., Kabisch, S., Engels, H., Tierling, S., Zivicnjak, M., Schacherer, F., Haffner, D. and Weber, R.G., 2013. Dissecting the genotype in syndromic intellectual disability using whole exome sequencing in addition to genome-wide copy number analysis. Hum Genet. 132, 825-41. Coe, B.P., Girirajan, S. and Eichler, E.E., 2012. The genetic variability and commonality of neurodevelopmental disease. Am J Med Genet C Semin Med Genet. 160C, 118-29. Colige, A., Sieron, A.L., Li, S.W., Schwarze, U., Petty, E., Wertelecki, W., Wilcox, W., Krakow, D., Cohn, D.H., Reardon, W., Byers, P.H., Lapiere, C.M., Prockop, D.J. and Nusgens, B.V., 1999. Human Ehlers-Danlos syndrome type VII C and bovine dermatosparaxis are caused by mutations in the procollagen I N-proteinase gene. Am J Hum Genet. 65, 308- 17. Conrad, D.F., Pinto, D., Redon, R., Feuk, L., Gokcumen, O., Zhang, Y., Aerts, J., Andrews, T.D., Barnes, C., Campbell, P., Fitzgerald, T., Hu, M., Ihm, C.H., Kristiansson, K., Macarthur, D.G., Macdonald, J.R., Onyiah, I., Pang, A.W., Robson, S., Stirrups, K., Valsesia, A., Walter, K., Wei, J., Tyler-Smith, C., Carter, N.P., Lee, C., Scherer, S.W. and Hurles, M.E., 2010. Origins and functional impact of copy number variation in the human genome. Nature. 464, 704-12. Consortium, E.P., 2012. An integrated encyclopedia of DNA elements in the human genome. Nature. 489, 57-74. 163

Cooper, D.N., Krawczak, M., Polychronakos, C., Tyler-Smith, C. and Kehrer-Sawatzki, H., 2013. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Hum Genet. 132, 1077-130. Cooper, G.M., Coe, B.P., Girirajan, S., Rosenfeld, J.A., Vu, T.H., Baker, C., Williams, C., Stalker, H., Hamid, R., Hannig, V., Abdel-Hamid, H., Bader, P., McCracken, E., Niyazov, D., Leppig, K., Thiese, H., Hummel, M., Alexander, N., Gorski, J., Kussmann, J., Shashi, V., Johnson, K., Rehder, C., Ballif, B.C., Shaffer, L.G. and Eichler, E.E., 2011. A copy number variation morbidity map of developmental delay. Nat Genet. 43, 838-46. Corti, S., Faravelli, I., Cardano, M. and Conti, L., 2015. Human pluripotent stem cells as tools for neurodegenerative and neurodevelopmental disease modeling and drug discovery. Expert Opin Drug Discov. 10, 615-29. Cuneo, B.F., 2001. 22q11.2 deletion syndrome: DiGeorge, velocardiofacial, and conotruncal anomaly face syndromes. Curr Opin Pediatr. 13, 465-72. Curry, C.J., Stevenson, R.E., Aughton, D., Byrne, J., Carey, J.C., Cassidy, S., Cunniff, C., Graham, J.M., Jr., Jones, M.C., Kaback, M.M., Moeschler, J., Schaefer, G.B., Schwartz, S., Tarleton, J. and Opitz, J., 1997. Evaluation of mental retardation: recommendations of a Consensus Conference: American College of . Am J Med Genet. 72, 468-77. Cytron, S., Stepnov, E., Bounkin, I., Mashevich, M., Dotan, A. and Avivi, L., 2011. Epigenetic analyses in blood cells of men suspected of prostate cancer predict the outcome of biopsy better than serum PSA levels. Clin . 2, 383-388. Dasgupta, B., Ju, J.S., Sasaki, Y., Liu, X., Jung, S.R., Higashida, K., Lindquist, D. and Milbrandt, J., 2012. The AMPK beta2 subunit is required for energy homeostasis during metabolic stress. Mol Cell Biol. 32, 2837-48. Dasgupta, B. and Milbrandt, J., 2009. AMP-activated protein kinase phosphorylates to control mammalian brain development. Dev Cell. 
16, 256-70. Davis, J.M., Searles, V.B., Anderson, N., Keeney, J., Dumas, L. and Sikela, J.M., 2014. DUF1220 dosage is linearly associated with increasing severity of the three primary symptoms of autism. PLoS Genet. 10, e1004241. Davis, J.M., Searles, V.B., Anderson, N., Keeney, J., Raznahan, A., Horwood, L.J., Fergusson, D.M., Kennedy, M.A., Giedd, J. and Sikela, J.M., 2015. DUF1220 copy number is linearly associated with increased cognitive function as measured by total IQ and mathematical aptitude scores. Hum Genet. 134, 67-75. de la Chapelle, A., Herva, R., Koivisto, M. and Aula, P., 1981. A deletion in can cause DiGeorge syndrome. Hum Genet. 57, 253-6. de Leeuw, N., Pfundt, R., Koolen, D.A., Neefs, I., Scheltinga, I., Mieloo, H., Sistermans, E.A., Nillesen, W., Smeets, D.F., de Vries, B.B. and Knoers, N.V., 2008. A newly recognised microdeletion syndrome involving 2p15p16.1: narrowing down the critical region by adding another patient detected by genome wide tiling path array comparative genomic hybridisation analysis. J Med Genet. 45, 122-4. de Ligt, J., Willemsen, M.H., van Bon, B.W., Kleefstra, T., Yntema, H.G., Kroes, T., Vulto-van Silfhout, A.T., Koolen, D.A., de Vries, P., Gilissen, C., del Rosario, M., Hoischen, A., Scheffer, H., de Vries, B.B., Brunner, H.G., Veltman, J.A. and Vissers, L.E., 2012.

164

Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med. 367, 1921-9. Deciphering Developmental Disorders, S., 2015. Large-scale discovery of novel genetic causes of developmental disorders. Nature. 519, 223-8. Dieci, G., Preti, M. and Montanini, B., 2009. Eukaryotic snoRNAs: a paradigm for gene expression flexibility. Genomics. 94, 83-8. Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S. and Ren, B., 2012. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 485, 376-80. Dong, Y., Zhang, M., Wang, S., Liang, B., Zhao, Z., Liu, C., Wu, M., Choi, H.C., Lyons, T.J. and Zou, M.H., 2010. Activation of AMP-activated protein kinase inhibits oxidized LDL- triggered endoplasmic reticulum stress in vivo. Diabetes. 59, 1386-96. Dumas, L.J., O'Bleness, M.S., Davis, J.M., Dickens, C.M., Anderson, N., Keeney, J.G., Jackson, J., Sikela, M., Raznahan, A., Giedd, J., Rapoport, J., Nagamani, S.S., Erez, A., Brunetti- Pierri, N., Sugalski, R., Lupski, J.R., Fingerlin, T., Cheung, S.W. and Sikela, J.M., 2012. DUF1220-domain copy number implicated in human brain-size pathology and evolution. Am J Hum Genet. 91, 444-54. Elliott, G., Hong, C., Xing, X., Zhou, X., Li, D., Coarfa, C., Bell, R.J., Maire, C.L., Ligon, K.L., Sigaroudinia, M., Gascard, P., Tlsty, T.D., Harris, R.A., Schalkwyk, L.C., Bilenky, M., Mill, J., Farnham, P.J., Kellis, M., Marra, M.A., Milosavljevic, A., Hirst, M., Stormo, G.D., Wang, T. and Costello, J.F., 2015. Intermediate DNA methylation is a conserved signature of genome regulation. Nat Commun. 6, 6363. Ensminger, A.W. and Chess, A., 2004. Coordinated replication timing of monoallelically expressed genes along human . Hum Mol Genet. 13, 651-8. Faisal, M., Kim, H. and Kim, J., 2014. Sexual differences of imprinted genes' expression levels. Gene. 533, 434-8. Fannemel, M., Baroy, T., Holmgren, A., Rodningen, O.K., Haugsand, T.M., Hansen, B., Frengen, E. 
and Misceo, D., 2014. Haploinsufficiency of XPO1 and USP34 by a de novo 230 kb deletion in 2p15, in a patient with mild intellectual disability and cranio-facial dysmorphisms. Eur J Med Genet. 57, 513-9. Feenstra, I., Fang, J., Koolen, D.A., Siezen, A., Evans, C., Winter, R.M., Lees, M.M., Riegel, M., de Vries, B.B., Van Ravenswaaij, C.M. and Schinzel, A., 2006. European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations (ECARUCA); an online database for rare chromosome abnormalities. Eur J Med Genet. 49, 279-91. Felix, T.M., Petrin, A.L., Sanseverino, M.T. and Murray, J.C., 2010. Further characterization of microdeletion syndrome involving 2p15-p16.1. Am J Med Genet A. 152A, 2604-8. Fenech, M., Kirsch-Volders, M., Natarajan, A.T., Surralles, J., Crott, J.W., Parry, J., Norppa, H., Eastmond, D.A., Tucker, J.D. and Thomas, P., 2011. Molecular mechanisms of micronucleus, nucleoplasmic bridge and nuclear bud formation in mammalian and human cells. Mutagenesis. 26, 125-32. Feuk, L., Carson, A.R. and Scherer, S.W., 2006. in the human genome. Nat Rev Genet. 7, 85-97. Firth, H.V., Richards, S.M., Bevan, A.P., Clayton, S., Corpas, M., Rajan, D., Van Vooren, S., Moreau, Y., Pettett, R.M. and Carter, N.P., 2009. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am J Hum Genet. 84, 524-33. 165

Florisson, J.M., Mathijssen, I.M., Dumee, B., Hoogeboom, J.A., Poddighe, P.J., Oostra, B.A., Frijns, J.P., Koster, L., de Klein, A., Eussen, B., de Vries, B.B., Swagemakers, S., van der Spek, P.J. and Verkerk, A.J., 2013. Complex craniosynostosis is associated with the 2p15p16.1 microdeletion syndrome. Am J Med Genet A. 161A, 244-53. Fonseca, S.G., Ishigaki, S., Oslowski, C.M., Lu, S., Lipson, K.L., Ghosh, R., Hayashi, E., Ishihara, H., Oka, Y., Permutt, M.A. and Urano, F., 2010. Wolfram syndrome 1 gene negatively regulates ER stress signaling in rodent and human cells. J Clin Invest. 120, 744-55. Foo, J.N., Liu, J.J. and Tan, E.K., 2012. Whole-genome and whole-exome sequencing in neurological diseases. Nat Rev Neurol. 8, 508-17. Fornerod, M., Ohno, M., Yoshida, M. and Mattaj, I.W., 1997. CRM1 is an export receptor for leucine-rich nuclear export signals. Cell. 90, 1051-60. Freeman, J.L., Perry, G.H., Feuk, L., Redon, R., McCarroll, S.A., Altshuler, D.M., Aburatani, H., Jones, K.W., Tyler-Smith, C., Hurles, M.E., Carter, N.P., Scherer, S.W. and Lee, C., 2006. Copy number variation: new insights in genome diversity. Genome Res. 16, 949- 61. Fuchs-Telem, D., Stewart, H., Rapaport, D., Nousbeck, J., Gat, A., Gini, M., Lugassy, Y., Emmert, S., Eckl, K., Hennies, H.C., Sarig, O., Goldsher, D., Meilik, B., Ishida- Yamamoto, A., Horowitz, M. and Sprecher, E., 2011. CEDNIK syndrome results from loss-of-function mutations in SNAP29. Br J Dermatol. 164, 610-6. Fukuda, M., Asano, S., Nakamura, T., Adachi, M., Yoshida, M., Yanagida, M. and Nishida, E., 1997. CRM1 is responsible for intracellular transport mediated by the nuclear export signal. Nature. 390, 308-11. Fulda, S., Gorman, A.M., Hori, O. and Samali, A., 2010. Cellular stress responses: cell survival and cell death. Int J Cell Biol. 2010, 214074. 
Funnell, A.P., Prontera, P., Ottaviani, V., Piccione, M., Giambona, A., Maggio, A., Ciaffoni, F., Stehling-Sun, S., Marra, M., Masiello, F., Varricchio, L., Stamatoyannopoulos, J.A., Migliaccio, A.R. and Papayannopoulou, T., 2015. 2p15-p16.1 microdeletions encompassing and proximal to BCL11A are associated with elevated HbF in addition to neurologic impairment. Blood. 126, 89-93. Gao, L., Sun, C., Qiu, H.L., Liu, H., Shao, H.J., Wang, J. and Li, W.X., 2004. Cloning and characterization of a novel human zinc finger gene, hKid3, from a C2H2-ZNF enriched human embryonic cDNA library. Biochem Biophys Res Commun. 325, 1145-52. Garelick, M.G. and Kennedy, B.K., 2011. TOR on the brain. Exp Gerontol. 46, 155-63. Ghahramani Seno, M.M., Kwan, B.Y., Lee-Ng, K.K., Moessner, R., Lionel, A.C., Marshall, C.R. and Scherer, S.W., 2011. Human PTCHD3 nulls: rare copy number and sequence variants suggest a non-essential gene. BMC Med Genet. 12, 45. Gheldof, N., Witwicki, R.M., Migliavacca, E., Leleu, M., Didelot, G., Harewood, L., Rougemont, J. and Reymond, A., 2013. Structural variation-associated expression changes are paralleled by chromatin architecture modifications. PLoS One. 8, e79973. Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W.J. and Nekrutenko, A., 2005. Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15, 1451-5. Gilbert, D.M., 2002. Replication timing and transcriptional control: beyond cause and effect. Curr Opin Cell Biol. 14, 377-83.

166

Gilissen, C., Hehir-Kwa, J.Y., Thung, D.T., van de Vorst, M., van Bon, B.W., Willemsen, M.H., Kwint, M., Janssen, I.M., Hoischen, A., Schenck, A., Leach, R., Klein, R., Tearle, R., Bo, T., Pfundt, R., Yntema, H.G., de Vries, B.B., Kleefstra, T., Brunner, H.G., Vissers, L.E. and Veltman, J.A., 2014. Genome sequencing identifies major causes of severe intellectual disability. Nature. 511, 344-7. Gilissen, C., Hoischen, A., Brunner, H.G. and Veltman, J.A., 2012. Disease gene identification strategies for exome sequencing. Eur J Hum Genet. 20, 490-7. Gillis, J., Mistry, M. and Pavlidis, P., 2010. Gene function analysis in complex data sets using ErmineJ. Nat Protoc. 5, 1148-59. Gimelbrant, A., Hutchinson, J.N., Thompson, B.R. and Chess, A., 2007. Widespread monoallelic expression on human autosomes. Science. 318, 1136-40. Gimelbrant, A.A. and Chess, A., 2006. An epigenetic state associated with areas of gene duplication. Genome Res. 16, 723-9. Girirajan, S., Dennis, M.Y., Baker, C., Malig, M., Coe, B.P., Campbell, C.D., Mark, K., Vu, T.H., Alkan, C., Cheng, Z., Biesecker, L.G., Bernier, R. and Eichler, E.E., 2013. Refinement and discovery of new hotspots of copy-number variation associated with autism spectrum disorder. Am J Hum Genet. 92, 221-37. Girirajan, S. and Eichler, E.E., 2010. Phenotypic variability and genetic susceptibility to genomic disorders. Hum Mol Genet. 19, R176-87. Girirajan, S., Rosenfeld, J.A., Coe, B.P., Parikh, S., Friedman, N., Goldstein, A., Filipink, R.A., McConnell, J.S., Angle, B., Meschino, W.S., Nezarati, M.M., Asamoah, A., Jackson, K.E., Gowans, G.C., Martin, J.A., Carmany, E.P., Stockton, D.W., Schnur, R.E., Penney, L.S., Martin, D.M., Raskin, S., Leppig, K., Thiese, H., Smith, R., Aberg, E., Niyazov, D.M., Escobar, L.F., El-Khechen, D., Johnson, K.D., Lebel, R.R., Siefkas, K., Ball, S., Shur, N., McGuire, M., Brasington, C.K., Spence, J.E., Martin, L.S., Clericuzio, C., Ballif, B.C., Shaffer, L.G. and Eichler, E.E., 2012. 
Phenotypic heterogeneity of genomic disorders and rare copy-number variants. N Engl J Med. 367, 1321-31. Global Burden of Disease Study, C., 2015. Global, regional, and national incidence, prevalence, and years lived with disability for 301 acute and chronic diseases and injuries in 188 countries, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet. Goecks, J., Nekrutenko, A., Taylor, J. and Galaxy, T., 2010. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86. Golzio, C., Willer, J., Talkowski, M.E., Oh, E.C., Taniguchi, Y., Jacquemont, S., Reymond, A., Sun, M., Sawa, A., Gusella, J.F., Kamiya, A., Beckmann, J.S. and Katsanis, N., 2012. KCTD13 is a major driver of mirrored neuroanatomical phenotypes of the 16p11.2 copy number variant. Nature. 485, 363-7. Gravina, G.L., Tortoreto, M., Mancini, A., Addis, A., Di Cesare, E., Lenzi, A., Landesman, Y., McCauley, D., Kauffman, M., Shacham, S., Zaffaroni, N. and Festuccia, C., 2014. XPO1/CRM1-selective inhibitors of nuclear export (SINE) reduce tumor spreading and improve overall survival in preclinical models of prostate cancer (PCa). J Hematol Oncol. 7, 46. Greally, J.M., Starr, D.J., Hwang, S., Song, L., Jaarola, M. and Zemel, S., 1998. The mouse H19 locus mediates a transition between imprinted and non-imprinted DNA replication patterns. Hum Mol Genet. 7, 91-5. 167

Green Finberg, Y., Kantor, B., Hershko, A.Y. and Razin, A., 2003. Characterization of the human Snrpn minimal promoter and cis elements within it. Gene. 304, 201-6. Greenberg, F., Crowder, W.E., Paschall, V., Colon-Linares, J., Lubianski, B. and Ledbetter, D.H., 1984. Familial DiGeorge syndrome and associated partial of chromosome 22. Hum Genet. 65, 317-9. Greenway, S.C., Pereira, A.C., Lin, J.C., DePalma, S.R., Israel, S.J., Mesquita, S.M., Ergul, E., Conta, J.H., Korn, J.M., McCarroll, S.A., Gorham, J.M., Gabriel, S., Altshuler, D.M., Quintanilla-Dieck Mde, L., Artunduaga, M.A., Eavey, R.D., Plenge, R.M., Shadick, N.A., Weinblatt, M.E., De Jager, P.L., Hafler, D.A., Breitbart, R.E., Seidman, J.G. and Seidman, C.E., 2009. De novo copy number variants identify new genes and loci in isolated sporadic tetralogy of Fallot. Nat Genet. 41, 931-5. Gregersen, P.K., Amos, C.I., Lee, A.T., Lu, Y., Remmers, E.F., Kastner, D.L., Seldin, M.F., Criswell, L.A., Plenge, R.M., Holers, V.M., Mikuls, T.R., Sokka, T., Moreland, L.W., Bridges, S.L., Jr., Xie, G., Begovich, A.B. and Siminovitch, K.A., 2009. REL, encoding a member of the NF-kappaB family of transcription factors, is a newly defined risk locus for rheumatoid arthritis. Nat Genet. 41, 820-3. Gribnau, J., Hochedlinger, K., Hata, K., Li, E. and Jaenisch, R., 2003. Asynchronous replication timing of imprinted loci is independent of DNA methylation, but consistent with differential subnuclear localization. Genes Dev. 17, 759-73. Grigoriadis, G., Vasanthakumar, A., Banerjee, A., Grumont, R., Overall, S., Gleeson, P., Shannon, F. and Gerondakis, S., 2011. c-Rel controls multiple discrete steps in the thymic development of Foxp3+ CD4 regulatory T cells. PLoS One. 6, e26851. Gu, W., Zhang, F. and Lupski, J.R., 2008. Mechanisms for human genomic rearrangements. Pathogenetics. 1, 4. Gurha, P. and Gupta, R., 2008. Archaeal Pus10 proteins can produce both pseudouridine 54 and 55 in tRNA. RNA. 14, 2521-7. Hamma, T. 
and Ferre-D'Amare, A.R., 2006. Pseudouridine synthases. Chem Biol. 13, 1125-35. Hancarova, M., Simandlova, M., Drabova, J., Mannik, K., Kurg, A. and Sedlacek, Z., 2013. A patient with de novo 0.45 Mb deletion of 2p16.1: the role of BCL11A, PAPOLG, REL, and FLJ16341 in the 2p15-p16.1 microdeletion syndrome. Am J Med Genet A. 161A, 865-70. Hanna, C.W., Peñaherrera, M.S., Saadeh, H., Andrews, S., McFadden, D.E., Kelsey, G. and Robinson, W.P., 2015. Pervasive polymorphic imprinted methylation in the human placenta. Submitted. Harris, J.C., 2005. Intellectual Disability : Understanding Its Development, Causes, Classification, Evaluation, and Treatment, Oxford University Press, USA, Cary, NC, USA. Harvard, C., Strong, E., Mercier, E., Colnaghi, R., Alcantara, D., Chow, E., Martell, S., Tyson, C., Hrynchak, M., McGillivray, B., Hamilton, S., Marles, S., Mhanni, A., Dawson, A.J., Pavlidis, P., Qiao, Y., Holden, J.J., Lewis, S.M., O'Driscoll, M. and Rajcan-Separovic, E., 2011. Understanding the impact of 1q21.1 copy number variant. Orphanet J Rare Dis. 6, 54. Hastings, P.J., Ira, G. and Lupski, J.R., 2009a. A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet. 5, e1000327.

168

Hastings, P.J., Lupski, J.R., Rosenberg, S.M. and Ira, G., 2009b. Mechanisms of change in gene copy number. Nat Rev Genet. 10, 551-64. Hayashi, A., Kasahara, T., Kametani, M., Toyota, T., Yoshikawa, T. and Kato, T., 2009. Aberrant endoplasmic reticulum stress response in lymphoblastoid cells from patients with bipolar disorder. Int J Neuropsychopharmacol. 12, 33-43. Henrichsen, C.N., Csardi, G., Zabot, M.T., Fusco, C., Bergmann, S., Merla, G. and Reymond, A., 2011. Using transcription modules to identify expression clusters perturbed in Williams-Beuren syndrome. PLoS Comput Biol. 7, e1001054. Hinney, A., Scherag, A., Jarick, I., Albayrak, O., Putter, C., Pechlivanis, S., Dauvermann, M.R., Beck, S., Weber, H., Scherag, S., Nguyen, T.T., Volckmar, A.L., Knoll, N., Faraone, S.V., Neale, B.M., Franke, B., Cichon, S., Hoffmann, P., Nothen, M.M., Schreiber, S., Jockel, K.H., Wichmann, H.E., Freitag, C., Lempp, T., Meyer, J., Gilsbach, S., Herpertz- Dahlmann, B., Sinzig, J., Lehmkuhl, G., Renner, T.J., Warnke, A., Romanos, M., Lesch, K.P., Reif, A., Schimmelmann, B.G., Hebebrand, J. and Psychiatric, G.C.A.s., 2011. Genome-wide association study in German patients with attention deficit/hyperactivity disorder. Am J Med Genet B Neuropsychiatr Genet. 156B, 888-97. Hiratani, I., Ryba, T., Itoh, M., Rathjen, J., Kulik, M., Papp, B., Fussner, E., Bazett-Jones, D.P., Plath, K., Dalton, S., Rathjen, P.D. and Gilbert, D.M., 2010. Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome Res. 20, 155-69. Hiratani, I., Ryba, T., Itoh, M., Yokochi, T., Schwaiger, M., Chang, C.W., Lyou, Y., Townes, T.M., Schubeler, D. and Gilbert, D.M., 2008. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol. 6, e245. Hirsch, B., Oseth, L., Cain, M., Trader, E., Pulkrabek, S., Lindgren, B., Luo, X., Clay, M., Miller, J., Confer, D., Weisdorf, D. and McCullough, J., 2011. 
Effects of granulocyte- colony stimulating factor on chromosome and replication asynchrony in healthy peripheral blood stem cell donors. Blood. 118, 2602-8. Hoischen, A., van Bon, B.W., Gilissen, C., Arts, P., van Lier, B., Steehouwer, M., de Vries, P., de Reuver, R., Wieskamp, N., Mortier, G., Devriendt, K., Amorim, M.Z., Revencu, N., Kidd, A., Barbosa, M., Turner, A., Smith, J., Oley, C., Henderson, A., Hayes, I.M., Thompson, E.M., Brunner, H.G., de Vries, B.B. and Veltman, J.A., 2010. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nat Genet. 42, 483-5. Huang, N., Lee, I., Marcotte, E.M. and Hurles, M.E., 2010. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154. Hucthagowder, V., Liu, T.C., Paciorkowski, A.R., Thio, L.L., Keller, M.S., Anderson, C.D., Herman, T., Dehner, L.P., Grange, D.K. and Kulkarni, S., 2012. Chromosome 2p15p16.1 microdeletion syndrome: 2.5 Mb deletion in a patient with renal anomalies, intractable seizures and a choledochal cyst. Eur J Med Genet. 55, 485-9. Iakoubov, L., Mossakowska, M., Szwed, M., Duan, Z., Sesti, F. and Puzianowska-Kuznicka, M., 2013. A common copy number variation (CNV) polymorphism in the CNTNAP4 gene: association with aging in females. PLoS One. 8, e79790. Isler, J.A., Skalet, A.H. and Alwine, J.C., 2005. Human cytomegalovirus infection activates and regulates the unfolded protein response. J Virol. 79, 6890-9. Itsara, A., Cooper, G.M., Baker, C., Girirajan, S., Li, J., Absher, D., Krauss, R.M., Myers, R.M., Ridker, P.M., Chasman, D.I., Mefford, H., Ying, P., Nickerson, D.A. and Eichler, E.E.,

169

2009. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet. 84, 148-61. Joardar, A., Jana, S., Fitzek, E., Gurha, P., Majumder, M., Chatterjee, K., Geisler, M. and Gupta, R., 2013. Role of forefinger and thumb loops in production of Psi54 and Psi55 in tRNAs by archaeal Pus10. RNA. 19, 1279-94. John, A., Brylka, H., Wiegreffe, C., Simon, R., Liu, P., Juttner, R., Crenshaw, E.B., 3rd, Luyten, F.P., Jenkins, N.A., Copeland, N.G., Birchmeier, C. and Britsch, S., 2012. Bcl11a is required for neuronal morphogenesis and sensory circuit formation in dorsal spinal cord development. Development. 139, 1831-41. Joyce, C.A., Dennis, N.R., Cooper, S. and Browne, C.E., 2001. Subtelomeric rearrangements: results from a study of selected and unselected probands with idiopathic mental retardation and control individuals by using high-resolution G-banding and FISH. Hum Genet. 109, 440-51. Kagotani, K., Takebayashi, S., Kohda, A., Taguchi, H., Paulsen, M., Walter, J., Reik, W. and Okumura, K., 2002. Replication timing properties within the mouse distal imprinting cluster. Biosci Biotechnol Biochem. 66, 1046-51. Kahn, B.B., Alquier, T., Carling, D. and Hardie, D.G., 2005. AMP-activated protein kinase: ancient energy gauge provides clues to modern understanding of metabolism. Cell Metab. 1, 15-25. Kakiuchi, C., Ishiwata, M., Umekage, T., Tochigi, M., Kohda, K., Sasaki, T. and Kato, T., 2004. Association of the XBP1-116C/G polymorphism with schizophrenia in the Japanese population. Psychiatry Clin Neurosci. 58, 438-40. Kakiuchi, C., Iwamoto, K., Ishiwata, M., Bundo, M., Kasahara, T., Kusumi, I., Tsujita, T., Okazaki, Y., Nanko, S., Kunugi, H., Sasaki, T. and Kato, T., 2003. Impaired feedback regulation of XBP1 as a genetic risk factor for bipolar disorder. Nat Genet. 35, 171-5. Kaltschmidt, B. and Kaltschmidt, C., 2009. NF-kappaB in the nervous system. Cold Spring Harb Perspect Biol. 1, a001271. 
Kaminsky, E.B., Kaul, V., Paschall, J., Church, D.M., Bunke, B., Kunig, D., Moreno-De-Luca, D., Moreno-De-Luca, A., Mulle, J.G., Warren, S.T., Richard, G., Compton, J.G., Fuller, A.E., Gliem, T.J., Huang, S., Collinson, M.N., Beal, S.J., Ackley, T., Pickering, D.L., Golden, D.M., Aston, E., Whitby, H., Shetty, S., Rossi, M.R., Rudd, M.K., South, S.T., Brothman, A.R., Sanger, W.G., Iyer, R.K., Crolla, J.A., Thorland, E.C., Aradhya, S., Ledbetter, D.H. and Martin, C.L., 2011. An evidence-based approach to establish the functional and clinical significance of copy number variants in intellectual and developmental disabilities. Genet Med. 13, 777-84. Kaufman, R.J., Scheuner, D., Schroder, M., Shen, X., Lee, K., Liu, C.Y. and Arnold, S.M., 2002. The unfolded protein response in nutrient sensing and differentiation. Nat Rev Mol Cell Biol. 3, 411-21. Kazeminasab, S., Esmaeilzadeh-Gharehdaghi, E., Oladnabi, M., Ohadi, M., Mirabzadeh, A. and Hosseinkhani, S., 2012. Aberrant expression of Activating Transcription Factor 6 (ATF6) in major psychiatric disorders. Psychiatry Res. 200, 1086-7. Kearney, H.M., Thorland, E.C., Brown, K.K., Quintero-Rivera, F., South, S.T. and Working Group of the American College of Medical Genetics Laboratory Quality Assurance, C., 2011. American College of Medical Genetics standards and guidelines for interpretation and reporting of postnatal constitutional copy number variants. Genet Med. 13, 680-5.

170

Keeney, J.G., Davis, J.M., Siegenthaler, J., Post, M.D., Nielsen, B.S., Hopkins, W.D. and Sikela, J.M., 2014a. DUF1220 protein domains drive proliferation in human neural stem cells and are associated with increased cortical volume in anthropoid primates. Brain Struct Funct. Keeney, J.G., Dumas, L. and Sikela, J.M., 2014b. The case for DUF1220 domain dosage as a primary contributor to anthropoid brain expansion. Front Hum Neurosci. 8, 427. King, R.D. and Lu, C., 2014. An investigation into eukaryotic pseudouridine synthases. J Bioinform Comput Biol. 12, 1450015. Kiss, A.M., Jady, B.E., Darzacq, X., Verheggen, C., Bertrand, E. and Kiss, T., 2002. A Cajal body-specific pseudouridylation guide RNA is composed of two box H/ACA snoRNA- like domains. Nucleic Acids Res. 30, 4643-9. Kitsberg, D., Selig, S., Brandeis, M., Simon, I., Keshet, I., Driscoll, D.J., Nicholls, R.D. and Cedar, H., 1993. Allele-specific replication timing of imprinted gene regions. Nature. 364, 459-63. Klopocki, E., Schulze, H., Strauss, G., Ott, C.E., Hall, J., Trotier, F., Fleischhauer, S., Greenhalgh, L., Newbury-Ecob, R.A., Neumann, L.M., Habenicht, R., Konig, R., Seemanova, E., Megarbane, A., Ropers, H.H., Ullmann, R., Horn, D. and Mundlos, S., 2007. Complex inheritance pattern resembling autosomal recessive inheritance involving a microdeletion in thrombocytopenia-absent radius syndrome. Am J Hum Genet. 80, 232- 40. Knoll, J.H., Cheng, S.D. and Lalande, M., 1994. Allele specificity of DNA replication timing in the Angelman/Prader-Willi syndrome imprinted chromosomal region. Nat Genet. 6, 41-6. 
Kohl, S., Zobor, D., Chiang, W.C., Weisschuh, N., Staller, J., Gonzalez Menendez, I., Chang, S., Beck, S.C., Garcia Garrido, M., Sothilingam, V., Seeliger, M.W., Stanzial, F., Benedicenti, F., Inzana, F., Heon, E., Vincent, A., Beis, J., Strom, T.M., Rudolph, G., Roosing, S., Hollander, A.I., Cremers, F.P., Lopez, I., Ren, H., Moore, A.T., Webster, A.R., Michaelides, M., Koenekoop, R.K., Zrenner, E., Kaufman, R.J., Tsang, S.H., Wissinger, B. and Lin, J.H., 2015. Mutations in the unfolded protein response regulator ATF6 cause the cone dysfunction disorder achromatopsia. Nat Genet. 47, 757-65. Koolen, D.A., Kramer, J.M., Neveling, K., Nillesen, W.M., Moore-Barton, H.L., Elmslie, F.V., Toutain, A., Amiel, J., Malan, V., Tsai, A.C., Cheung, S.W., Gilissen, C., Verwiel, E.T., Martens, S., Feuth, T., Bongers, E.M., de Vries, P., Scheffer, H., Vissers, L.E., de Brouwer, A.P., Brunner, H.G., Veltman, J.A., Schenck, A., Yntema, H.G. and de Vries, B.B., 2012. Mutations in the chromatin modifier gene KANSL1 cause the 17q21.31 microdeletion syndrome. Nat Genet. 44, 639-41. Koolen, D.A., Pfundt, R., Linda, K., Beunders, G., Veenstra-Knol, H.E., Conta, J.H., Fortuna, A.M., Gillessen-Kaesbach, G., Dugan, S., Halbach, S., Abdul-Rahman, O.A., Winesett, H.M., Chung, W.K., Dalton, M., Dimova, P.S., Mattina, T., Prescott, K., Zhang, H.Z., Saal, H.M., Hehir-Kwa, J.Y., Willemsen, M.H., Ockeloen, C.W., Jongmans, M.C., Van der Aa, N., Failla, P., Barone, C., Avola, E., Brooks, A.S., Kant, S.G., Gerkes, E.H., Firth, H.V., Ounap, K., Bird, L.M., Masser-Frye, D., Friedman, J.R., Sokunbi, M.A., Dixit, A., Splitt, M., Study, D.D.D., Kukolich, M.K., McGaughran, J., Coe, B.P., Florez, J., Nadif Kasri, N., Brunner, H.G., Thompson, E.M., Gecz, J., Romano, C., Eichler, E.E. and de Vries, B.B., 2015. The Koolen-de Vries syndrome: a phenotypic comparison of patients with a 17q21.31 microdeletion versus a KANSL1 sequence variant. Eur J Hum Genet. 171

Koolen, D.A., Sharp, A.J., Hurst, J.A., Firth, H.V., Knight, S.J., Goldenberg, A., Saugier-Veber, P., Pfundt, R., Vissers, L.E., Destree, A., Grisart, B., Rooms, L., Van der Aa, N., Field, M., Hackett, A., Bell, K., Nowaczyk, M.J., Mancini, G.M., Poddighe, P.J., Schwartz, C.E., Rossi, E., De Gregori, M., Antonacci-Fulton, L.L., McLellan, M.D., 2nd, Garrett, J.M., Wiechert, M.A., Miner, T.L., Crosby, S., Ciccone, R., Willatt, L., Rauch, A., Zenker, M., Aradhya, S., Manning, M.A., Strom, T.M., Wagenstaller, J., Krepischi- Santos, A.C., Vianna-Morgante, A.M., Rosenberg, C., Price, S.M., Stewart, H., Shaw- Smith, C., Brunner, H.G., Wilkie, A.O., Veltman, J.A., Zuffardi, O., Eichler, E.E. and de Vries, B.B., 2008. Clinical and molecular delineation of the 17q21.31 microdeletion syndrome. J Med Genet. 45, 710-20. Koolen, D.A., Vissers, L.E., Pfundt, R., de Leeuw, N., Knight, S.J., Regan, R., Kooy, R.F., Reyniers, E., Romano, C., Fichera, M., Schinzel, A., Baumer, A., Anderlid, B.M., Schoumans, J., Knoers, N.V., van Kessel, A.G., Sistermans, E.A., Veltman, J.A., Brunner, H.G. and de Vries, B.B., 2006. A new chromosome 17q21.31 microdeletion syndrome associated with a common inversion polymorphism. Nat Genet. 38, 999-1001. Koren, A., Handsaker, R.E., Kamitaki, N., Karlic, R., Ghosh, S., Polak, P., Eggan, K. and McCarroll, S.A., 2014. Genetic variation in human DNA replication timing. Cell. 159, 1015-26. Kowalski, J.R. and Juo, P., 2012. The role of deubiquitinating enzymes in synaptic function and nervous system diseases. Neural Plast. 2012, 892749. Krause, C., Rosewich, H., Woehler, A. and Gartner, J., 2013. Functional analysis of PEX13 mutation in a Zellweger syndrome spectrum patient reveals novel homooligomerization of PEX13 and its role in human biogenesis. Hum Mol Genet. 22, 3844-57. Krishnakumar, R. and Kraus, W.L., 2010. The PARP side of the nucleus: molecular actions, physiological outcomes, and clinical targets. Mol Cell. 39, 8-24. Kryukov, G.V., Pennacchio, L.A. 
and Sunyaev, S.R., 2007. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet. 80, 727-39. Kyriakopoulou, C.B., Nordvarg, H. and Virtanen, A., 2001. A novel nuclear human poly(A) polymerase (PAP), PAP gamma. J Biol Chem. 276, 33504-11. Lander, E.S. and Botstein, D., 1987. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 236, 1567-70. Lawson, H.A., Cheverud, J.M. and Wolf, J.B., 2013. Genomic imprinting and parent-of-origin effects on complex traits. Nat Rev Genet. 14, 609-17. Ledbetter, D.H., Mascarello, J.T., Riccardi, V.M., Harper, V.D., Airhart, S.D. and Strobel, R.J., 1982. Chromosome 15 abnormalities and the Prader-Willi syndrome: a follow-up report of 40 cases. Am J Hum Genet. 34, 278-85. Lee, A.H., Iwakoshi, N.N. and Glimcher, L.H., 2003. XBP-1 regulates a subset of endoplasmic reticulum resident chaperone genes in the unfolded protein response. Mol Cell Biol. 23, 7448-59. Lee, C. and Scherer, S.W., 2010. The clinical context of copy number variation in the human genome. Expert Rev Mol Med. 12, e8. Lee, D.Y. and Sugden, B., 2008. The LMP1 oncogene of EBV activates PERK and the unfolded protein response to drive its own synthesis. Blood. 111, 2280-9. Lejeune, J., Gautier, M. and Turpin, R., 1959. [Study of somatic chromosomes from 9 mongoloid children]. C R Hebd Seances Acad Sci. 248, 1721-2.

Lejeune, J., Lafourcade, J., Berger, R., Vialatta, J., Boeswillwald, M., Seringe, P. and Turpin, R., 1963. Trois cas de deletion partielle du bras court d'un chromosome 5. C. R. Acad. Sci (Paris). 257. Lese, C.M. and Ledbetter, D.H., 2001. Molecular cytogenetic analysis of telomere rearrangements. Curr Protoc Hum Genet. Chapter 8, Unit 8 11. Lettice, L.A., Daniels, S., Sweeney, E., Venkataraman, S., Devenney, P.S., Gautier, P., Morrison, H., Fantes, J., Hill, R.E. and FitzPatrick, D.R., 2011. Enhancer-adoption as a mechanism of human developmental disease. Hum Mutat. 32, 1492-9. Liang, J.S., Shimojima, K., Ohno, K., Sugiura, C., Une, Y., Ohno, K. and Yamamoto, T., 2009. A newly recognised microdeletion syndrome of 2p15-16.1 manifesting moderate developmental delay, autistic behaviour, short stature, microcephaly, and dysmorphic features: a new patient with 3.2 Mb deletion. J Med Genet. 46, 645-7. Lin, E., Balogh, R., Isaacs, B., Ouellette-Kuntz, H., Selick, A., Wilton, A.S., Cobigo, V. and Lunsky, Y., 2014. Strengths and Limitations of Health and Disability Support Administrative Databases for Population-Based Health Research in Intellectual and Developmental Disabilities. Journal of Policy and Practice in Intellectual Disabilities. 11, 235-244. Liu, X., Malenfant, P., Reesor, C., Lee, A., Hudson, M.L., Harvard, C., Qiao, Y., Persico, A.M., Cohen, I.L., Chudley, A.E., Forster-Gibson, C., Rajcan-Separovic, E., Lewis, M.E. and Holden, J.J., 2011. 2p15-p16.1 microdeletion syndrome: molecular characterization and association of the OTX1 and XPO1 genes with autism spectrum disorders. Eur J Hum Genet. 19, 1264-70. Liu, Y., Bjorkman, J., Urquhart, A., Wanders, R.J., Crane, D.I. and Gould, S.J., 1999. PEX13 is mutated in complementation group 13 of the peroxisome-biogenesis disorders. Am J Hum Genet. 65, 621-34. Livak, K.J. and Schmittgen, T.D., 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods. 
25, 402-8. Luckasson, R., Borthwick-Duffy, S., Buntix, W., Coulter, D., Craig, E., Reeve, A. et al., 2002. Mental Retardation: Definition, Classification, and Systems of Supports. (10th ed.), Washington, DC. Lui, T.T., Lacroix, C., Ahmed, S.M., Goldenberg, S.J., Leach, C.A., Daulat, A.M. and Angers, S., 2011. The ubiquitin-specific protease USP34 regulates axin stability and Wnt/beta-catenin signaling. Mol Cell Biol. 31, 2053-65. Luo, R., Sanders, S.J., Tian, Y., Voineagu, I., Huang, N., Chu, S.H., Klei, L., Cai, C., Ou, J., Lowe, J.K., Hurles, M.E., Devlin, B., State, M.W. and Geschwind, D.H., 2012. Genome-wide profiling reveals the functional impact of rare de novo and recurrent CNVs in autism spectrum disorders. Am J Hum Genet. 91, 38-55. MacDonald, J.R., Ziman, R., Yuen, R.K., Feuk, L. and Scherer, S.W., 2014. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 42, D986-92. Maine, E.M., 2008. Studying gene function in Caenorhabditis elegans using RNA-mediated interference. Brief Funct Genomic Proteomic. 7, 184-94. Maiuolo, J., Bulotta, S., Verderio, C., Benfante, R. and Borgese, N., 2011. Selective activation of the transcription factor ATF6 mediates endoplasmic reticulum proliferation triggered by a membrane protein. Proc Natl Acad Sci U S A. 108, 7832-7.

Mardis, E.R., 2008. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 9, 387-402. Matharu, N. and Ahituv, N., 2015. Minor Loops in Major Folds: Enhancer-Promoter Looping, Chromatin Restructuring, and Their Association with Transcriptional Regulation and Disease. PLoS Genet. 11, e1005640. Maulik, P.K., Mascarenhas, M.N., Mathers, C.D., Dua, T. and Saxena, S., 2011. Prevalence of intellectual disability: a meta-analysis of population-based studies. Res Dev Disabil. 32, 419-36. Maxwell, M., Bjorkman, J., Nguyen, T., Sharp, P., Finnie, J., Paterson, C., Tonks, I., Paton, B.C., Kay, G.F. and Crane, D.I., 2003. Pex13 inactivation in the mouse disrupts peroxisome biogenesis and leads to a Zellweger syndrome phenotype. Mol Cell Biol. 23, 5947-57. McDonald-McGinn, D.M., Fahiminiya, S., Revil, T., Nowakowska, B.A., Suhl, J., Bailey, A., Mlynarski, E., Lynch, D.R., Yan, A.C., Bilaniuk, L.T., Sullivan, K.E., Warren, S.T., Emanuel, B.S., Vermeesch, J.R., Zackai, E.H. and Jerome-Majewska, L.A., 2013. Hemizygous mutations in SNAP29 unmask autosomal recessive conditions and contribute to atypical findings in patients with 22q11.2DS. J Med Genet. 50, 80-90. McDonald-McGinn, D.M., Tonnesen, M.K., Laufer-Cahana, A., Finucane, B., Driscoll, D.A., Emanuel, B.S. and Zackai, E.H., 2001. Phenotype of the 22q11.2 deletion in individuals identified through an affected relative: cast a wide FISHing net! Genet Med. 3, 23-9. 
Mefford, H.C., Sharp, A.J., Baker, C., Itsara, A., Jiang, Z., Buysse, K., Huang, S., Maloney, V.K., Crolla, J.A., Baralle, D., Collins, A., Mercer, C., Norga, K., de Ravel, T., Devriendt, K., Bongers, E.M., de Leeuw, N., Reardon, W., Gimelli, S., Bena, F., Hennekam, R.C., Male, A., Gaunt, L., Clayton-Smith, J., Simonic, I., Park, S.M., Mehta, S.G., Nik-Zainal, S., Woods, C.G., Firth, H.V., Parkin, G., Fichera, M., Reitano, S., Lo Giudice, M., Li, K.E., Casuga, I., Broomer, A., Conrad, B., Schwerzmann, M., Raber, L., Gallati, S., Striano, P., Coppola, A., Tolmie, J.L., Tobias, E.S., Lilley, C., Armengol, L., Spysschaert, Y., Verloo, P., De Coene, A., Goossens, L., Mortier, G., Speleman, F., van Binsbergen, E., Nelen, M.R., Hochstenbach, R., Poot, M., Gallagher, L., Gill, M., McClellan, J., King, M.C., Regan, R., Skinner, C., Stevenson, R.E., Antonarakis, S.E., Chen, C., Estivill, X., Menten, B., Gimelli, G., Gribble, S., Schwartz, S., Sutcliffe, J.S., Walsh, T., Knight, S.J., Sebat, J., Romano, C., Schwartz, C.E., Veltman, J.A., de Vries, B.B., Vermeesch, J.R., Barber, J.C., Willatt, L., Tassabehji, M. and Eichler, E.E., 2008. Recurrent rearrangements of chromosome 1q21.1 and variable pediatric phenotypes. N Engl J Med. 359, 1685-99. Mehta, D., Iwamoto, K., Ueda, J., Bundo, M., Adati, N., Kojima, T. and Kato, T., 2014. Comprehensive survey of CNVs influencing gene expression in the human brain and its implications for pathophysiology. Neurosci Res. 79, 22-33. Menzel, S., Garner, C., Gut, I., Matsuda, F., Yamaguchi, M., Heath, S., Foglio, M., Zelenika, D., Boland, A., Rooks, H., Best, S., Spector, T.D., Farrall, M., Lathrop, M. and Thein, S.L., 2007. A QTL influencing F cell production maps to a gene encoding a zinc-finger protein on chromosome 2p15. Nat Genet. 39, 1197-9. Merla, G., Howald, C., Henrichsen, C.N., Lyle, R., Wyss, C., Zabot, M.T., Antonarakis, S.E. and Reymond, A., 2006. 
Submicroscopic deletion in patients with Williams-Beuren syndrome influences expression levels of the nonhemizygous flanking genes. Am J Hum Genet. 79, 332-41.

Metcalfe, A., Hippman, C., Pastuck, M. and Johnson, J.A., 2014. Beyond Trisomy 21: Additional Chromosomal Anomalies Detected through Routine Aneuploidy Screening. J Clin Med. 3, 388-415. Metzker, M.L., 2010. Sequencing technologies - the next generation. Nat Rev Genet. 11, 31-46. Miller, D.T., Adam, M.P., Aradhya, S., Biesecker, L.G., Brothman, A.R., Carter, N.P., Church, D.M., Crolla, J.A., Eichler, E.E., Epstein, C.J., Faucett, W.A., Feuk, L., Friedman, J.M., Hamosh, A., Jackson, L., Kaminsky, E.B., Kok, K., Krantz, I.D., Kuhn, R.M., Lee, C., Ostell, J.M., Rosenberg, C., Scherer, S.W., Spinner, N.B., Stavropoulos, D.J., Tepperberg, J.H., Thorland, E.C., Vermeesch, J.R., Waggoner, D.J., Watson, M.S., Martin, C.L. and Ledbetter, D.H., 2010. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 86, 749-64. MIM 190685. DOWN SYNDROME. Johns Hopkins University, Baltimore, MD. MIM 610443. KOOLEN-DE VRIES SYNDROME; KDVS. Johns Hopkins University, Baltimore, MD. MIM 612513. CHROMOSOME 2p16.1-p15 DELETION SYNDROME. Johns Hopkins University, Baltimore, MD. Minamino, T., Komuro, I. and Kitakaze, M., 2010. Endoplasmic reticulum stress as a therapeutic target in cardiovascular disease. Circ Res. 107, 1071-82. Momoi, T., Fujita, E., Senoo, H. and Momoi, M., 2010. Genetic factors and epigenetic factors for autism: endoplasmic reticulum stress and impaired synaptic function. Cell Biol Int. 34, 13-9. Moreno-Igoa, M., Hernandez-Charro, B., Bengoa-Alonso, A., Perez-Juana-Del-Casal, A., Romero-Ibarra, C., Nieva-Echebarria, B. and Ramos-Arroyo, M.A., 2015. KANSL1 gene disruption associated with the full clinical spectrum of 17q21.31 microdeletion syndrome. BMC Med Genet. 16, 68. Morison, I.M., Ramsay, J.P. and Spencer, H.G., 2005. A census of mammalian imprinting. Trends Genet. 21, 457-65. 
Mu, J.J., Wang, Y., Luo, H., Leng, M., Zhang, J., Yang, T., Besusso, D., Jung, S.Y. and Qin, J., 2007. A proteomic analysis of ataxia telangiectasia-mutated (ATM)/ATM-Rad3-related (ATR) substrates identifies the ubiquitin-proteasome system as a regulator for DNA damage checkpoints. J Biol Chem. 282, 17330-4. Muller, C.C., Nguyen, T.H., Ahlemeyer, B., Meshram, M., Santrampurwala, N., Cao, S., Sharp, P., Fietz, P.B., Baumgart-Vogt, E. and Crane, D.I., 2011. PEX13 deficiency in mouse brain as a model of Zellweger syndrome: abnormal cerebellum formation, reactive gliosis and oxidative stress. Dis Model Mech. 4, 104-19. Musante, L. and Ropers, H.H., 2014. Genetics of recessive cognitive disorders. Trends Genet. 30, 32-9. Nagase, T., Nakayama, M., Nakajima, D., Kikuno, R. and Ohara, O., 2001. Prediction of the coding sequences of unidentified human genes. XX. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro. DNA Res. 8, 85-95. Nagler, A., Cytron, S., Mashevich, M., Korenstein-Ilan, A. and Avivi, L., 2010. The aberrant asynchronous replication - characterizing lymphocytes of cancer patients - is erased following stem cell transplantation. BMC Cancer. 10, 230. Nakabayashi, K., Trujillo, A.M., Tayama, C., Camprubi, C., Yoshida, W., Lapunzina, P., Sanchez, A., Soejima, H., Aburatani, H., Nagae, G., Ogata, T., Hata, K. and Monk, D.,

2011. Methylation screening of reciprocal genome-wide UPDs identifies novel human- specific imprinted genes. Hum Mol Genet. 20, 3188-97. Ng, S.B., Buckingham, K.J., Lee, C., Bigham, A.W., Tabor, H.K., Dent, K.M., Huff, C.D., Shannon, P.T., Jabs, E.W., Nickerson, D.A., Shendure, J. and Bamshad, M.J., 2010. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 42, 30-5. Ng, S.B., Turner, E.H., Robertson, P.D., Flygare, S.D., Bigham, A.W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Eichler, E.E., Bamshad, M., Nickerson, D.A. and Shendure, J., 2009. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. Nogami, M., Kohda, A., Taguchi, H., Nakao, M., Ikemura, T. and Okumura, K., 2000. Relative locations of the centromere and imprinted SNRPN gene within chromosome 15 territories during the cell cycle in HL60 cells. J Cell Sci. 113 ( Pt 12), 2157-65. O'Bleness, M., Searles, V.B., Dickens, C.M., Astling, D., Albracht, D., Mak, A.C., Lai, Y.Y., Lin, C., Chu, C., Graves, T., Kwok, P.Y., Wilson, R.K. and Sikela, J.M., 2014. Finished sequence and assembly of the DUF1220-rich 1q21 region using a haploid human genome. BMC Genomics. 15, 387. O'Bleness, M.S., Dickens, C.M., Dumas, L.J., Kehrer-Sawatzki, H., Wyckoff, G.J. and Sikela, J.M., 2012. Evolutionary history and genome organization of DUF1220 protein domains. G3 (Bethesda). 2, 977-86. O'Roak, B.J., Deriziotis, P., Lee, C., Vives, L., Schwartz, J.J., Girirajan, S., Karakoc, E., Mackenzie, A.P., Ng, S.B., Baker, C., Rieder, M.J., Nickerson, D.A., Bernier, R., Fisher, S.E., Shendure, J. and Eichler, E.E., 2011. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet. 43, 585-9. 
O'Roak, B.J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B.P., Levy, R., Ko, A., Lee, C., Smith, J.D., Turner, E.H., Stanaway, I.B., Vernot, B., Malig, M., Baker, C., Reilly, B., Akey, J.M., Borenstein, E., Rieder, M.J., Nickerson, D.A., Bernier, R., Shendure, J. and Eichler, E.E., 2012. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 485, 246-50. Ofengand, J., Bakin, A., Wrzesinski, J., Nurse, K. and Lane, B.G., 1995. The pseudouridine residues of ribosomal RNA. Biochem Cell Biol. 73, 915-24. Ofengand, J., Malhotra, A., Remme, J., Gutgsell, N.S., Del Campo, M., Jean-Charles, S., Peil, L. and Kaya, Y., 2001. Pseudouridines and pseudouridine synthases of the ribosome. Cold Spring Harb Symp Quant Biol. 66, 147-59. Ouellette-Kuntz, H., Shooshtari, S., Balogh, R. and Martens, P., 2015. Understanding Information About Mortality Among People with Intellectual and Developmental Disabilities in Canada. J Appl Res Intellect Disabil. 28, 423-35. Panaretou, B., Siligardi, G., Meyer, P., Maloney, A., Sullivan, J.K., Singh, S., Millson, S.H., Clarke, P.A., Naaby-Hansen, S., Stein, R., Cramer, R., Mollapour, M., Workman, P., Piper, P.W., Pearl, L.H. and Prodromou, C., 2002. Activation of the ATPase activity of hsp90 by the stress-regulated cochaperone aha1. Mol Cell. 10, 1307-18. Pereira, S.G. and Oakley, F., 2008. Nuclear factor-kappaB1: regulation and function. Int J Biochem Cell Biol. 40, 1425-30. Peter, B., Matsushita, M., Oda, K. and Raskind, W., 2014. De novo microdeletion of BCL11A is associated with severe speech sound disorder. Am J Med Genet A. 164A, 2091-6.

Pfeiffer, R.A., 1980. Langer-Giedion syndrome and additional congenital malformations with interstitial deletion of the long arm of 46, XY, del 8 (q 13-22). Clin Genet. 18, 142-6. Philippe, O., Rio, M., Carioux, A., Plaza, J.M., Guigue, P., Molinari, F., Boddaert, N., Bole- Feysot, C., Nitschke, P., Smahi, A., Munnich, A. and Colleaux, L., 2009. Combination of linkage mapping and microarray-expression analysis identifies NF-kappaB signaling defect as a cause of autosomal-recessive mental retardation. Am J Hum Genet. 85, 903-8. Piccione, M., Piro, E., Serraino, F., Cavani, S., Ciccone, R., Malacarne, M., Pierluigi, M., Vitaloni, M., Zuffardi, O. and Corsello, G., 2012. Interstitial deletion of chromosome 2p15-16.1: report of two patients and critical review of current genotype-phenotype correlation. Eur J Med Genet. 55, 238-44. Poalas, K., Hatchi, E.M., Cordeiro, N., Dubois, S.M., Leclair, H.M., Leveau, C., Alexia, C., Gavard, J., Vazquez, A. and Bidere, N., 2013. Negative regulation of NF-kappaB signaling in T lymphocytes by the ubiquitin-specific protease USP34. Cell Commun Signal. 11, 25. Poduri, A., Evrony, G.D., Cai, X. and Walsh, C.A., 2013. Somatic mutation, genomic variation, and neurological disease. Science. 341, 1237758. Pollack, J.R., Perou, C.M., Alizadeh, A.A., Eisen, M.B., Pergamenschikov, A., Williams, C.F., Jeffrey, S.S., Botstein, D. and Brown, P.O., 1999. Genome-wide analysis of DNA copy- number changes using cDNA microarrays. Nat Genet. 23, 41-6. Popesco, M.C., Maclaren, E.J., Hopkins, J., Dumas, L., Cox, M., Meltesen, L., McGavran, L., Wyckoff, G.J. and Sikela, J.M., 2006. Human lineage-specific amplification, selection, and neuronal expression of DUF1220 domains. Science. 313, 1304-7. Prontera, P., Bernardini, L., Stangoni, G., Capalbo, A., Rogaia, D., Romani, R., Ardisia, C., Dallapiccola, B. and Donti, E., 2011. Deletion 2p15-16.1 syndrome: case report and review. Am J Med Genet A. 155A, 2473-8. 
Qiao, Y., Badduke, C., Mercier, E., Lewis, S.M., Pavlidis, P. and Rajcan-Separovic, E., 2013. miRNA and miRNA target genes in copy number variations occurring in individuals with intellectual disability. BMC Genomics. 14, 544. Qiao, Y., Harvard, C., Riendeau, N., Fawcett, C., Liu, X., Holden, J.J., Lewis, M.E. and Rajcan-Separovic, E., 2008. Putatively benign copy number variants in subjects with idiopathic autism spectrum disorder and/or intellectual disability. Cytogenet Genome Res. 123, 79-87. Qiao, Y., Harvard, C., Tyson, C., Liu, X., Fawcett, C., Pavlidis, P., Holden, J.J., Lewis, M.E. and Rajcan-Separovic, E., 2010. Outcome of array CGH analysis for 255 subjects with intellectual disability and search for candidate genes using bioinformatics. Hum Genet. 128, 179-94. Qiao, Y., Mercier, E., Dastan, J., Hurlburt, J., McGillivray, B., Chudley, A., Farrell, S., Lewis, S., Pavlidis, P. and Rajcan-Separovic, E., 2014. Copy Number Variant analysis in a deeply phenotyped cohort of individuals with Intellectual Disability. BMC Medical Genetics. In submission. Qiao, Y., Tyson, C., Hrynchak, M., Lopez-Rangel, E., Hildebrand, J., Martell, S., Fawcett, C., Kasmara, L., Calli, K., Harvard, C., Liu, X., Holden, J.J., Lewis, S.M. and Rajcan-Separovic, E., 2012. Clinical application of 2.7M Cytogenetics array for CNV detection in subjects with idiopathic autism and/or intellectual disability. Clin Genet. 83, 145-54.

Quesada, V., Diaz-Perales, A., Gutierrez-Fernandez, A., Garabaya, C., Cal, S. and Lopez-Otin, C., 2004. Cloning and enzymatic analysis of 22 novel human ubiquitin-specific proteases. Biochem Biophys Res Commun. 314, 54-62. R Core Team., 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Rahim, R.S., Meedeniya, A.C. and Crane, D.I., 2014. Central serotonergic neuron deficiency in a mouse model of Zellweger syndrome. Neuroscience. 274, 229-41. Rajcan-Separovic, E., Barcelo, J.M. and Korneluk, R.G., 1998. Fluorescence in situ hybridization analysis of the replication properties of the myotonic dystrophy protein kinase (DMPK) gene region. Cytogenet Cell Genet. 82, 247-50. Rajcan-Separovic, E., Harvard, C., Liu, X., McGillivray, B., Hall, J.G., Qiao, Y., Hurlburt, J., Hildebrand, J., Mickelson, E.C., Holden, J.J. and Lewis, M.E., 2007. Clinical and molecular cytogenetic characterisation of a newly recognised microdeletion syndrome involving 2p15-16.1. J Med Genet. 44, 269-76. Rao, S., Gerondakis, S., Woltring, D. and Shannon, M.F., 2003. c-Rel is required for chromatin remodeling across the IL-2 gene promoter. J Immunol. 170, 3724-31. Rauch, A., Wieczorek, D., Graf, E., Wieland, T., Endele, S., Schwarzmayr, T., Albrecht, B., Bartholdi, D., Beygo, J., Di Donato, N., Dufke, A., Cremer, K., Hempel, M., Horn, D., Hoyer, J., Joset, P., Ropke, A., Moog, U., Riess, A., Thiel, C.T., Tzschach, A., Wiesener, A., Wohlleber, E., Zweier, C., Ekici, A.B., Zink, A.M., Rump, A., Meisinger, C., Grallert, H., Sticht, H., Schenck, A., Engels, H., Rappold, G., Schrock, E., Wieacker, P., Riess, O., Meitinger, T., Reis, A. and Strom, T.M., 2012. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet. 380, 1674-82. Ravnan, J.B., Tepperberg, J.H., Papenhausen, P., Lamb, A.N., Hedrick, J., Eash, D., Ledbetter, D.H. and Martin, C.L., 2006. 
Subtelomere FISH analysis of 11 688 cases: an evaluation of the frequency and pattern of subtelomere rearrangements in individuals with developmental disabilities. J Med Genet. 43, 478-89. Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W., Cho, E.K., Dallaire, S., Freeman, J.L., Gonzalez, J.R., Gratacos, M., Huang, J., Kalaitzopoulos, D., Komura, D., MacDonald, J.R., Marshall, C.R., Mei, R., Montgomery, L., Nishimura, K., Okamura, K., Shen, F., Somerville, M.J., Tchinda, J., Valsesia, A., Woodwark, C., Yang, F., Zhang, J., Zerjal, T., Zhang, J., Armengol, L., Conrad, D.F., Estivill, X., Tyler-Smith, C., Carter, N.P., Aburatani, H., Lee, C., Jones, K.W., Scherer, S.W. and Hurles, M.E., 2006. Global variation in copy number in the human genome. Nature. 444, 444-54. Reichow, S.L., Hamma, T., Ferre-D'Amare, A.R. and Varani, G., 2007. The structure and function of small nucleolar ribonucleoproteins. Nucleic Acids Res. 35, 1452-64. Reik, W. and Walter, J., 2001. Genomic imprinting: parental influence on the genome. Nat Rev Genet. 2, 21-32. Reyes-Turcu, F.E., Ventii, K.H. and Wilkinson, K.D., 2009. Regulation and cellular roles of ubiquitin-specific deubiquitinating enzymes. Annu Rev Biochem. 78, 363-97. Reymond, A., Henrichsen, C.N., Harewood, L. and Merla, G., 2007. Side effects of genome structural changes. Curr Opin Genet Dev. 17, 381-6. Rhind, N. and Gilbert, D.M., 2013. DNA replication timing. Cold Spring Harb Perspect Biol. 5, a010132.

Robinson, P.N., Krawitz, P. and Mundlos, S., 2011. Strategies for exome and genome sequence data analysis in disease-gene discovery projects. Clin Genet. 80, 127-32. Ropers, H.H., 2008. Genetics of intellectual disability. Curr Opin Genet Dev. 18, 241-50. Rosenfeld, J.A., Coe, B.P., Eichler, E.E., Cuckle, H. and Shaffer, L.G., 2013. Estimates of penetrance for recurrent pathogenic copy-number variations. Genet Med. 15, 478-81. Ryba, T., Hiratani, I., Lu, J., Itoh, M., Kulik, M., Zhang, J., Schulz, T.C., Robins, A.J., Dalton, S. and Gilbert, D.M., 2010. Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761-70. Samonte, R.V. and Eichler, E.E., 2002. Segmental duplications and the evolution of the primate genome. Nat Rev Genet. 3, 65-72. Sankaran, V.G., Menne, T.F., Xu, J., Akie, T.E., Lettre, G., Van Handel, B., Mikkola, H.K., Hirschhorn, J.N., Cantor, A.B. and Orkin, S.H., 2008. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science. 322, 1839-42. Sankaran, V.G., Xu, J. and Orkin, S.H., 2010. Transcriptional silencing of fetal hemoglobin by BCL11A. Ann N Y Acad Sci. 1202, 64-8. Sarnico, I., Branca, C., Lanzillotta, A., Porrini, V., Benarese, M., Spano, P.F. and Pizzi, M., 2012. NF-kappaB and epigenetic mechanisms as integrative regulators of brain resilience to anoxic stress. Brain Res. 1476, 203-10. Schinzel, A., 1988. Microdeletion syndromes, balanced translocations, and gene mapping. J Med Genet. 25, 454-62. Schizophrenia Working Group of the Psychiatric Genomics, C., 2014. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 511, 421-7. Schmitz, C., Kinge, P. and Hutter, H., 2007. Axon guidance genes identified in a large-scale RNAi screen using the RNAi-hypersensitive Caenorhabditis elegans strain nre-1(hd20) lin-15b(hd126). Proc Natl Acad Sci U S A. 104, 834-9. 
Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W. and Selbach, M., 2011. Global quantification of mammalian gene expression control. Nature. 473, 337-42. Schwartz, S., Bernstein, D.A., Mumbach, M.R., Jovanovic, M., Herbst, R.H., Leon-Ricardo, B.X., Engreitz, J.M., Guttman, M., Satija, R., Lander, E.S., Fink, G. and Regev, A., 2014. Transcriptome-wide mapping reveals widespread dynamic-regulated pseudouridylation of ncRNA and mRNA. Cell. 159, 148-62. Selig, S., Okumura, K., Ward, D.C. and Cedar, H., 1992. Delineation of DNA replication time zones by fluorescence in situ hybridization. Embo J. 11, 1217-25. Shaffer, L.G., 2001. Diagnosis of microdeletion syndromes by fluorescence in situ hybridization (FISH). Curr Protoc Hum Genet. Chapter 8, Unit 8 10. Shaffer, L.G., Bejjani, B.A., Torchia, B., Kirkpatrick, S., Coppinger, J. and Ballif, B.C., 2007a. The identification of microdeletion syndromes and other chromosome abnormalities: cytogenetic methods of the past, new technologies for the future. Am J Med Genet C Semin Med Genet. 145C, 335-45. Shaffer, L.G., Theisen, A., Bejjani, B.A., Ballif, B.C., Aylsworth, A.S., Lim, C., McDonald, M., Ellison, J.W., Kostiner, D., Saitta, S. and Shaikh, T., 2007b. The discovery of microdeletion syndromes in the post-genomic era: review of the methodology and characterization of a new 1q41q42 microdeletion syndrome. Genet Med. 9, 607-16.

Shao, L., Shaw, C.A., Lu, X.Y., Sahoo, T., Bacino, C.A., Lalani, S.R., Stankiewicz, P., Yatsenko, S.A., Li, Y., Neill, S., Pursley, A.N., Chinault, A.C., Patel, A., Beaudet, A.L., Lupski, J.R. and Cheung, S.W., 2008. Identification of chromosome abnormalities in subtelomeric regions by microarray analysis: a study of 5,380 cases. Am J Med Genet A. 146A, 2242-51. Sharp, A.J., Hansen, S., Selzer, R.R., Cheng, Z., Regan, R., Hurst, J.A., Stewart, H., Price, S.M., Blair, E., Hennekam, R.C., Fitzpatrick, C.A., Segraves, R., Richmond, T.A., Guiver, C., Albertson, D.G., Pinkel, D., Eis, P.S., Schwartz, S., Knight, S.J. and Eichler, E.E., 2006. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat Genet. 38, 1038-42. Sharp, A.J., Mefford, H.C., Li, K., Baker, C., Skinner, C., Stevenson, R.E., Schroer, R.J., Novara, F., De Gregori, M., Ciccone, R., Broomer, A., Casuga, I., Wang, Y., Xiao, C., Barbacioru, C., Gimelli, G., Bernardina, B.D., Torniero, C., Giorda, R., Regan, R., Murday, V., Mansour, S., Fichera, M., Castiglia, L., Failla, P., Ventura, M., Jiang, Z., Cooper, G.M., Knight, S.J., Romano, C., Zuffardi, O., Chen, C., Schwartz, C.E. and Eichler, E.E., 2008. A recurrent 15q13.3 microdeletion syndrome associated with mental retardation and seizures. Nat Genet. 40, 322-8. Sheffield, N.C., Thurman, R.E., Song, L., Safi, A., Stamatoyannopoulos, J.A., Lenhard, B., Crawford, G.E. and Furey, T.S., 2013. Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res. 23, 777-88. Shendure, J. and Ji, H., 2008. Next-generation DNA sequencing. Nat Biotechnol. 26, 1135-45. Simon, I., Tenzen, T., Reubinoff, B.E., Hillman, D., McCarrey, J.R. and Cedar, H., 1999. Asynchronous replication of imprinted genes is established in the gametes and maintained during development. Nature. 401, 929-32. Slavotinek, A.M., 2008. 
Novel microdeletion syndromes detected by chromosome microarrays. Hum Genet. 124, 1-17. Smyth, G.K., 2004. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 3, Article3. Snijders, A.M., Nowak, N., Segraves, R., Blackwood, S., Brown, N., Conroy, J., Hamilton, G., Hindle, A.K., Huey, B., Kimura, K., Law, S., Myambo, K., Palmer, J., Ylstra, B., Yue, J.P., Gray, J.W., Jain, A.N., Pinkel, D. and Albertson, D.G., 2001. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 29, 263-4. So, J., Warsh, J.J. and Li, P.P., 2007. Impaired endoplasmic reticulum stress response in B- lymphoblasts from patients with bipolar-I disorder. Biol Psychiatry. 62, 141-7. Spielmann, M. and Klopocki, E., 2013. CNVs of noncoding cis-regulatory elements in human disease. Curr Opin Genet Dev. 23, 249-56. Stankiewicz, P. and Lupski, J.R., 2002. Genome architecture, rearrangements and genomic disorders. Trends Genet. 18, 74-82. Stankiewicz, P. and Lupski, J.R., 2010. Structural variation in the human genome and its role in disease. Annu Rev Med. 61, 437-55. Statistics Canada., 2013. Disability in Canada: Initial findings from the Canadian Survey on Disability. Statistics Canada. Strachan, T., Read, A.P. and Strachan, T., 2011. Human molecular genetics, 4th ed. Garland Science, New York.

Stranger, B.E., Nica, A.C., Forrest, M.S., Dimas, A., Bird, C.P., Beazley, C., Ingle, C.E., Dunning, M., Flicek, P., Koller, D., Montgomery, S., Tavare, S., Deloukas, P. and Dermitzakis, E.T., 2007. Population genomics of human gene expression. Nat Genet. 39, 1217-24.
Suzuki, Y., Shimozawa, N., Imamura, A., Fukuda, S., Zhang, Z., Orii, T. and Kondo, N., 2001. Clinical, biochemical and genetic aspects and neuronal migration in peroxisome biogenesis disorders. J Inherit Metab Dis. 24, 151-65.
Taft, R.J., Vanderver, A., Leventer, R.J., Damiani, S.A., Simons, C., Grimmond, S.M., Miller, D., Schmidt, J., Lockhart, P.J., Pope, K., Ru, K., Crawford, J., Rosser, T., de Coo, I.F., Juneja, M., Verma, I.C., Prabhakar, P., Blaser, S., Raiman, J., Pouwels, P.J., Bevova, M.R., Abbink, T.E., van der Knaap, M.S. and Wolf, N.I., 2013. Mutations in DARS cause hypomyelination with brain stem and spinal cord involvement and leg spasticity. Am J Hum Genet. 92, 774-80.
Thomas, F. and Kutay, U., 2003. Biogenesis and nuclear export of ribosomal subunits in higher eukaryotes depend on the CRM1 export pathway. J Cell Sci. 116, 2409-19.
Topper, S., Ober, C. and Das, S., 2011. Exome sequencing and the genetics of intellectual disability. Clin Genet. 80, 117-26.
Tsou, W.L., Sheedlo, M.J., Morrow, M.E., Blount, J.R., McGregor, K.M., Das, C. and Todi, S.V., 2012. Systematic analysis of the physiological importance of deubiquitinating enzymes. PLoS One. 7, e43112.
Tursun, B., Cochella, L., Carrera, I. and Hobert, O., 2009. A toolkit and robust pipeline for the generation of fosmid-based reporter genes in C. elegans. PLoS One. 4, e4625.
Untergasser, A., Nijveen, H., Rao, X., Bisseling, T., Geurts, R. and Leunissen, J.A., 2007. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35, W71-4.
Valencia, A., 2013. Next generation sequencing technologies in medical genetics, Springer, New York.
van Bokhoven, H., 2011. Genetic and epigenetic networks in intellectual disabilities. Annu Rev Genet. 45, 81-104.
Veltman, J.A. and Brunner, H.G., 2010. Understanding variable expressivity in microdeletion syndromes. Nat Genet. 42, 192-3.
Verdin, H., D'Haene, B., Beysen, D., Novikova, Y., Menten, B., Sante, T., Lapunzina, P., Nevado, J., Carvalho, C.M., Lupski, J.R. and De Baere, E., 2013. Microhomology-mediated mechanisms underlie non-recurrent disease-causing microdeletions of the FOXL2 gene or its regulatory domain. PLoS Genet. 9, e1003358.
Visel, A., Blow, M.J., Li, Z., Zhang, T., Akiyama, J.A., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., Chen, F., Afzal, V., Ren, B., Rubin, E.M. and Pennacchio, L.A., 2009. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 457, 854-8.
Visel, A., Minovitsky, S., Dubchak, I. and Pennacchio, L.A., 2007. VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88-92.
Vissers, L.E., de Ligt, J., Gilissen, C., Janssen, I., Steehouwer, M., de Vries, P., van Lier, B., Arts, P., Wieskamp, N., del Rosario, M., van Bon, B.W., Hoischen, A., de Vries, B.B., Brunner, H.G. and Veltman, J.A., 2010a. A de novo paradigm for mental retardation. Nat Genet. 42, 1109-12.
Vissers, L.E., de Vries, B.B. and Veltman, J.A., 2010b. Genomic microarrays in mental retardation: from copy number variation to gene, from research to diagnosis. J Med Genet. 47, 289-97.

Vissers, L.E., Gilissen, C. and Veltman, J.A., 2015. Genetic studies in intellectual disability and related disorders. Nat Rev Genet.
Vissers, L.E. and Stankiewicz, P., 2012. Microdeletion and microduplication syndromes. Methods Mol Biol. 838, 29-75.
Vissers, L.E., van Ravenswaaij, C.M., Admiraal, R., Hurst, J.A., de Vries, B.B., Janssen, I.M., van der Vliet, W.A., Huys, E.H., de Jong, P.J., Hamel, B.C., Schoenmakers, E.F., Brunner, H.G., Veltman, J.A. and van Kessel, A.G., 2004. Mutations in a new member of the chromodomain gene family cause CHARGE syndrome. Nat Genet. 36, 955-7.
Vogel, C. and Marcotte, E.M., 2012. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet. 13, 227-32.
Wang, H., Wang, X., Ke, Z.J., Comer, A.L., Xu, M., Frank, J.A., Zhang, Z., Shi, X. and Luo, J., 2015. Tunicamycin-induced unfolded protein response in the developing mouse brain. Toxicol Appl Pharmacol. 283, 157-67.
Wang, J., Duncan, D., Shi, Z. and Zhang, B., 2013. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 41, W77-83.
Wang, W., Budhu, A., Forgues, M. and Wang, X.W., 2005. Temporal and spatial control of nucleophosmin by the Ran-Crm1 complex in centrosome duplication. Nat Cell Biol. 7, 823-30.
Watanabe, M., Fukuda, M., Yoshida, M., Yanagida, M. and Nishida, E., 1999. Involvement of CRM1, a nuclear export receptor, in mRNA export in mammalian cells and fission yeast. Genes Cells. 4, 291-7.
Weaver, J.R., Susiarjo, M. and Bartolomei, M.S., 2009. Imprinting and epigenetic changes in the early embryo. Mamm Genome. 20, 532-43.
Weterings, E. and van Gent, D.C., 2004. The mechanism of non-homologous end-joining: a synopsis of synapsis. DNA Repair (Amst). 3, 1425-35.
White, L.M., Rogan, P.K., Nicholls, R.D., Wu, B.L., Korf, B. and Knoll, J.H., 1996. Allele-specific replication of 15q11-q13 loci: a diagnostic test for detection of . Am J Hum Genet. 59, 423-30.
Wiegreffe, C., Simon, R., Peschkes, K., Kling, C., Strehle, M., Cheng, J., Srivatsa, S., Liu, P., Jenkins, N.A., Copeland, N.G., Tarabykin, V. and Britsch, S., 2015. Bcl11a (Ctip1) Controls Migration of Cortical Projection Neurons through Regulation of Sema3c. Neuron. 87, 311-25.
Williamson C.M., Blake A., Thomas S., Beechey C.V., Hancock J., Cattanach B.M. and J., P., 2013. World Wide Web Site - Mouse Imprinting Data and References. MRC Harwell, Oxfordshire.
World Health Organization., 1995. The world health report : report of the Director-General. World Health Organization, Geneva, pp. v.
World Health Organization., 2013. WHO methods and data sources for global burden of disease estimates 2000-2011. World Health Organization, Geneva.
Wright, C.F., Fitzgerald, T.W., Jones, W.D., Clayton, S., McRae, J.F., van Kogelenberg, M., King, D.A., Ambridge, K., Barrett, D.M., Bayzetinova, T., Bevan, A.P., Bragin, E., Chatzimichali, E.A., Gribble, S., Jones, P., Krishnappa, N., Mason, L.E., Miller, R., Morley, K.I., Parthiban, V., Prigmore, E., Rajan, D., Sifrim, A., Swaminathan, G.J., Tivey, A.R., Middleton, A., Parker, M., Carter, N.P., Barrett, J.C., Hurles, M.E., FitzPatrick, D.R., Firth, H.V. and study, D.D.D., 2015. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet. 385, 1305-14.
Wu, J., Rutkowski, D.T., Dubois, M., Swathirajan, J., Saunders, T., Wang, J., Song, B., Yau, G.D. and Kaufman, R.J., 2007. ATF6alpha optimizes long-term endoplasmic reticulum function to protect cells from chronic stress. Dev Cell. 13, 351-64.
Xu, D., Grishin, N.V. and Chook, Y.M., 2012. NESdb: a database of NES-containing CRM1 cargoes. Mol Biol Cell. 23, 3673-6.
Yamamoto, K., Sato, T., Matsui, T., Sato, M., Okada, T., Yoshida, H., Harada, A. and Mori, K., 2007. Transcriptional induction of mammalian ER quality control proteins is mediated by single or combined action of ATF6alpha and XBP1. Dev Cell. 13, 365-76.
Yang, Q., Nausch, L.W., Martin, G., Keller, W. and Doublie, S., 2014. Crystal structure of human poly(A) polymerase gamma reveals a conserved catalytic core for canonical poly(A) polymerases. J Mol Biol. 426, 43-50.
Yassaee, V.R., Hashemi-Gorji, F., Soltani, Z. and Poorhosseini, S.M., 2014. A new approach for molecular diagnosis of TAR syndrome. Clin Biochem. 47, 835-9.
Ye, T., Lipska, B.K., Tao, R., Hyde, T.M., Wang, L., Li, C., Choi, K.H., Straub, R.E., Kleinman, J.E. and Weinberger, D.R., 2012. Analysis of copy number variations in brain DNA from patients with schizophrenia and other psychiatric disorders. Biol Psychiatry. 72, 651-4.
Yeshaya, J., Amir, I., Rimon, A., Freedman, J., Shohat, M. and Avivi, L., 2009. Microdeletion syndromes disclose replication timing alterations of genes unrelated to the missing DNA. Mol Cytogenet. 2, 11.
Yeshaya, J., Shalgi, R., Shohat, M. and Avivi, L., 1998. Replication timing of the various FMR1 alleles detected by FISH: inferences regarding their transcriptional status. Hum Genet. 102, 6-14.
Yoshida, H., Haze, K., Yanagi, H., Yura, T. and Mori, K., 1998. Identification of the cis-acting endoplasmic reticulum stress response element responsible for transcriptional induction of mammalian glucose-regulated proteins. Involvement of basic transcription factors. J Biol Chem. 273, 33741-9.
Yoshida, H., Okada, T., Haze, K., Yanagi, H., Yura, T., Negishi, M. and Mori, K., 2000. ATF6 activated by proteolysis binds in the presence of NF-Y (CBF) directly to the cis-acting element responsible for the mammalian unfolded protein response. Mol Cell Biol. 20, 6755-67.
Yoshikawa, A., Kamide, T., Hashida, K., Ta, H.M., Inahata, Y., Takarada-Iemata, M., Hattori, T., Mori, K., Takahashi, R., Matsuyama, T., Hayashi, Y., Kitao, Y. and Hori, O., 2015. Deletion of Atf6alpha impairs astroglial activation and enhances neuronal death following brain ischemia in mice. J Neurochem. 132, 342-53.
Yuen, R.K., Jiang, R., Penaherrera, M.S., McFadden, D.E. and Robinson, W.P., 2011. Genome-wide mapping of imprinted differentially methylated regions by DNA methylation profiling of human placentas from triploidies. Epigenetics Chromatin. 4, 10.
Zhang, B., Kirov, S. and Snoddy, J., 2005. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33, W741-8.
Zhang, F., Khajavi, M., Connolly, A.M., Towne, C.F., Batish, S.D. and Lupski, J.R., 2009. The DNA replication FoSTeS/MMBIR mechanism can generate genomic, genic and exonic complex rearrangements in humans. Nat Genet. 41, 849-53.


Zhang, L., Nosak, C., Sollazzo, P., Odisho, T. and Volchuk, A., 2014. IRE1 inhibition perturbs the unfolded protein response in a pancreatic beta-cell line expressing mutant proinsulin, but does not sensitize the cells to apoptosis. BMC Cell Biol. 15, 29.
Zheng, Y., Gery, S., Sun, H., Shacham, S., Kauffman, M. and Koeffler, H.P., 2014. KPT-330 inhibitor of XPO1-mediated nuclear export has anti-proliferative activity in hepatocellular carcinoma. Cancer Chemother Pharmacol. 74, 487-95.


Appendices

Appendix A Supplementary Tables and Figures for Chapter 2

Supplementary Table 2.1 Summary of Clinical Features in Previously Published and Newly Recruited Individuals with 2p15-16.1 Microdeletions

Columns, left to right:
A) Published cases (15): Florisson et al. 2013 (1); Rajcan-Separovic et al. 2007 (2); Prontera et al. 2011; Rajcan-Separovic et al. 2007 (1); de Leeuw et al. 2008; Florisson et al. 2013 (2); Felix et al. 2010; Liang et al. 2009; Piccione et al. 2012 (2); Piccione et al. 2012 (1); Hucthagowder et al. 2012; Peter et al. 2014; Hancarova et al. 2013; Chabchoub et al. 2008; Fannemel et al. 2014; No. of cases with phenotype; % of cases with phenotype
B) New cases (8): No. 1; No. 2; No. 3; No. 4; No. 5; No. 6; No. 7; No. 8; No. of cases with phenotype; % of cases with phenotype
C) All published and new cases (23): No. of cases with phenotype; % of cases with phenotype

Row layout: phenotype: marks for published cases | No. and % (published) | marks for new cases | No. and % (new) | No. and % (all). Blank cells are omitted.

General information
Age at last reported examination: 4 yrs 6 yrs 9 yrs 8 yrs 32 yrs 13 yrs 4 yrs 4 yrs 6 m 7 m 4 yrs 2 yrs 11 yrs 11 yrs 16 yrs 21 yrs 4 yrs 20 m 3 yrs 22 m 16 yrs 5 yrs 11 m 12 yrs 12 yrs
Gender: M M F F M F F F M F F M F M M | M 7, F 8 | M M M F M M M M | M 7, F 1 | M 14, F 9
Del Size (Mb): 6.7461 7.8918 3.5278 6.1122 3.4512 6.6777 3.3497 3.1441 2.5050 0.6430 2.4724 0.2030 0.4380 0.5833 0.2327 | 9.5744 2.0126 5.3624 0.9710 4.5927 0.3595 2.6672 0.7947

1 Growth
Prenatal
Intrauterine growth retardation: – + + – + + + – + – – | 6 40.0% | + – + + – | 3 37.5% | 9 39.1%
Postnatal
Postnatal growth retardation: + + – + + + + – | 6 40.0% | + – + | 2 25.0% | 8 34.8%
Feeding problems: – + + + + – + – + + + | 8 53.3% | + + + + + + + | 7 87.5% | 15 65.2%
Measurement
Height centile <3rd (+): – + + – + – – + + – + 97th – | 6 40.0% | – – + – + + – | 3 37.5% | 9 39.1%
Weight centile <3rd (+): – + – – + + + – + + + – | 6 40.0% | 3rd – – + + – | 2 25.0% | 8 34.8%
OFC centile <3rd (++); 5th-10th (+): + + + – + | 12 80.0% | ++ – ++ – | 6 75.0% | 18 78.3% | second line of cell: ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++

2 CNS anomalies
Cognitive deficit
ID: + + + + + + + + + + + + + + + | 15 100.0% | + + + + + + + | 7 87.5% | 22 95.7%
DD: + + + + + + + + + + + + + + + | 15 100.0% | + + + + + + + + | 8 100.0% | 23 100.0%
Delayed language skills: + + + + + + + + NA + + + + + + | 14 93.3% | + + + + NM + + + | 7 87.5% | 21 91.3%
Behaviour features
AUTISM: + – + – + – – – + – – | 4 26.7% | NA – – | 0 0.0% | 4 17.4%
Attention deficit behaviour: + + + – – + + – – – | 5 33.3% | – – + – + | 2 25.0% | 7 30.4%
Other abnormal behaviour: – – + + | 2 13.3% | + – + + + + | 5 62.5% | 7 30.4%
Neuromotor deficits
Hypotonia: + + + + + + | 6 40.0% | + + + + + + | 6 75.0% | 12 52.2%
Spasticity legs: – + + + – + + – | 5 33.3% | – – + + | 2 25.0% | 7 30.4%
Other: + + | 2 13.3% | | 0 0.0% | 2 8.7%
Neurostructural (Brain) abnormalities
Structure brain abnormalities (neuroimaging): + + – + – – + – + – – – | 5 33.3% | + – | 1 12.5% | 6 26.1%
Cortical dysplasia on cranial MRI: + + – + – – + – + – – – | 5 33.3% | | 0 0.0% | 5 21.7%
Neurological abnormalities
Seizure: + – + – + – | 3 20.0% | – – – – | 0 0.0% | 3 13.0%
Abnormal EEG: + – + – + + – | 4 26.7% | – | 0 0.0% | 4 17.4%
Vision-: + + – + + – – – + + + | 7 46.7% | – – + + | 2 25.0% | 9 39.1%
Vision-Optic nerve hypoplasia: – + – + + – – + – – – – – – – | 4 26.7% | – – – – | 0 0.0% | 4 17.4%
Vision-Disturbed vision: – + + + + – + + + – | 7 46.7% | – – – – – + | 1 12.5% | 8 34.8%
Hearing loss: – + – – – + – – + – – – – + | 4 26.7% | – – – – – – – | 0 0.0% | 4 17.4%
Others: + + + + + + + | 7 46.7% | – + – | 1 12.5% | 8 34.8%

3 Craniofacial abnormalities
Head
Microcephaly-congenital: + + + + + + + + – + + – + – | 11 73.3% | + + + – + + + – | 6 75.0% | 17 73.9%
Microcephaly-with simplified gyral pattern: + + + – – + – – | 4 26.7% | | 0 0.0% | 4 17.4%
Bitemporal narrowing: + + + + + + – + + – – + | 9 60.0% | + + + + | 4 50.0% | 13 56.5%
Receding short forehead: – + – + + + – + – – – | 5 33.3% | + + – | 2 25.0% | 7 30.4%
Metopic prominence or Metopic craniosynostosis: + + +/- + + – | 4 26.7% | + + + + | 4 50.0% | 8 34.8%
Other head shape abnormality: + + + + + + – + + | 8 53.3% | + + – – + + | 4 50.0% | 12 52.2%

3 Craniofacial abnormalities (continued)
Eyes
Epicanthal folds: + + + + + + + + + + + – | 11 73.3% | – + + – + + | 4 50.0% | 15 65.2%
Short palpebral fissures: + + + + + – + + – – – + – | 8 53.3% | + + – | 1 12.5% | 9 39.1%
Down slanting palpebral fissure: – + – + + – – – + – + + + | 7 46.7% | + + + + + | 5 62.5% | 12 52.2%
Ptosis: + + + + + + + + – – – + + | 10 66.7% | + + + + – | 4 50.0% | 14 60.9%
Telecanthus: + + + + + + + + – + + + + – | 12 80.0% | + + – + | 3 37.5% | 15 65.2%
Hypertelorism: + + + + + + – + + + | 9 60.0% | + + – + | 2 25.0% | 11 47.8%
Bilateral tear duct obstruction or absence: | 0 0.0% | + – + | 2 25.0% | 2 8.7%
Long, straight eyelashes: – + – + + + + – + | 6 40.0% | + + + – + | 4 50.0% | 10 43.5%
Long, thin eyebrows: – + – – – – + | 2 13.3% | + + – | 2 25.0% | 4 17.4%
Synophrys: + + | 2 13.3% | – + | 1 12.5% | 3 13.0%
Nose
Broad/high nasal root: + + + + + + + + – – + + + + | 12 80.0% | + + + + + | 5 62.5% | 17 73.9%
Prominent nasal tip: – + + + – – – + + + + | 7 46.7% | + – – + | 2 25.0% | 9 39.1%
Ears
Large ears: + + + + + – + – – – – – + | 7 46.7% | + + – – | 2 25.0% | 9 39.1%
Dysplastic ears: + + – + + + + – | 6 40.0% | + – + + + | 4 50.0% | 10 43.5%
Low set ears: + + + + + + – | 6 40.0% | – – | 0 0.0% | 6 26.1%
Mouth
Smooth and long philtrum: + + + + + + + + + + + + | 12 80.0% | + – + + – | 3 37.5% | 15 65.2%
Smooth upper vermillion border: – + + + + – – – – – + + + | 7 46.7% | + + + | 3 37.5% | 10 43.5%
Protruding tongue: | 0 0.0% | – + | 1 12.5% | 1 4.3%
Everted lower lip: + + + + + – – + – + + + – | 9 60.0% | + + – | 2 25.0% | 11 47.8%
Thin upper lip: + + + + | 4 26.7% | + – + | 2 25.0% | 6 26.1%
High narrow palate (or palate abnormalities): – + + + + – – – + – + + + + | 9 60.0% | + + – + + + | 5 62.5% | 14 60.9%
Retrognathia: + + – – + – + + – + – – + | 7 46.7% | + + – + | 3 37.5% | 10 43.5%
Maxillary: + + | 2 13.3% | + + | 2 25.0% | 4 17.4%
Cleft lip/palate: | 0 0.0% | + + | 2 25.0% | 2 8.7%

4 Ectodermal abnormalities
Teeth
Dental (malocclusion): + + + + | 4 26.7% | + + + | 3 37.5% | 7 30.4%
Hair
Hair: + + | 2 13.3% | + + + | 3 37.5% | 5 21.7%
Skin
Skin: | 0 0.0% | – + + | 2 25.0% | 2 8.7%
Immune
Frequent upper respiratory infections: – – – + + – – – – – + – | 3 20.0% | – – – + + | 2 25.0% | 5 21.7%
Frequent ear infections: + | 1 6.7% | – + + + + | 4 50.0% | 5 21.7%

5 Thoracic abnormalities
Widened internipple distance: – + + + + – + – – + – | 6 40.0% | – – + – | 1 12.5% | 7 30.4%
Extra nipple: – + – – – – – – + | 2 13.3% | – – – – | 0 0.0% | 2 8.7%
Thoracic abnormalities (pectus, hernia, reflux): + – + + | 3 20.0% | + – + + + + | 5 62.5% | 8 34.8%
Curvature of the spine: + + + + + | 5 33.3% | + – + – + | 3 37.5% | 8 34.8%
Cardiac defect: – – – – – + – | 1 6.7% | – – + – – | 1 12.5% | 2 8.7%
Laryngomalacia: – + – – – – – – | 1 6.7% | – – – – | 0 0.0% | 1 4.3%
Stomach: + + | 2 13.3% | + – + + | 3 37.5% | 5 21.7%

6 Genital abnormalities
Testes: + + + | 3 20.0% | – + + + | 3 37.5% | 6 26.1%
Endocrine: + – + | 2 13.3% | – – | 0 0.0% | 2 8.7%
Hypogonadism: – + – – – – – – + – – – | 2 13.3% | – – – | 0 0.0% | 2 8.7%
Other genital abnormality: + + + – | 3 20.0% | – + – | 1 12.5% | 4 17.4%

7 Urinary system abnormality
Hydronephrosis: – + – + + – – – – – + – | 4 26.7% | – – – – | 0 0.0% | 4 17.4%
Kidney: – + + – – + – – | 3 20.0% | – – – | 0 0.0% | 3 13.0%
Other urinary system abnormality: + – | 1 6.7% | – – + | 1 12.5% | 2 8.7%

8 Digital abnormalities
Camptodactyly: + + – + – + + – + – – – – | 6 40.0% | + – + + + – + | 5 62.5% | 11 47.8%
Metatarsus abductus: – + – + – – + + – – + – | 5 33.3% | + – – | 1 12.5% | 6 26.1%
Other digital anomalies: + + + + + – + + + | 8 53.3% | + – + + + | 4 50.0% | 12 52.2%

9 De Vries Score*: 6 8 9 9 8 6 8 9 8 5 8 7 7 9 4 3 7 3


This table contains all reported phenotypes for published (15) and newly recruited (8) individuals with 2p15-16.1 microdeletions. Number of cases with each phenotype and the corresponding frequency of the phenotypes are reported for A) cases published to April 2014 (n=15), B) newly recruited cases (n=8), and C) combined cases (n=23). Percentages above 50% are shown in red while percentages between 25-49.9% are shown in green. Phenotypes not included in counts are those not reported for an individual (left blank) or are specified as not assessed (NA), not mentioned (NM), or are listed as borderline (±).
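The frequencies and colour categories described in this legend can be sketched as follows. This is a minimal illustration, not the procedure used to build the table: the function names are hypothetical, the denominators are assumed to be the group sizes (15, 8 and 23), as the tabulated values suggest (for example 8/15 = 53.3% for feeding problems in published cases), and a value of exactly 50.0% is left unhighlighted because the legend does not assign it to either colour.

```python
def phenotype_frequency(n_with: int, n_total: int) -> float:
    """Percentage of cases with a phenotype, rounded to one decimal place."""
    return round(100.0 * n_with / n_total, 1)


def highlight(pct: float) -> str:
    """Colour category from the legend (assumption: exactly 50.0% is not highlighted)."""
    if pct > 50.0:
        return "red"
    if 25.0 <= pct < 50.0:
        return "green"
    return "none"


# Feeding problems: 8 of 15 published, 7 of 8 new, 15 of 23 combined cases.
for group, n_with, n_total in [("published", 8, 15), ("new", 7, 8), ("all", 15, 23)]:
    pct = phenotype_frequency(n_with, n_total)
    print(group, pct, highlight(pct))
```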


Supplementary Table 2.2 Summary of Breakpoints and Additional Genomic Information for 2p15-16.1 Microdeletion Carriers

Case | Chr | Start | End | Size | Gender | Inheritance | Platform | Additional reported genomic findings

A. Published Cases
Florisson et al. 2013 (1) | 2 | 55,616,146 | 62,362,249 | 6,746,103 min del | Male | de novo, parental origin ND | Affymetrix 250K Nsp1 SNP array, fine mapping by qPCR | None reported
 | | 55,580,038 | 62,416,010 | 6,835,972 bal | | | |
Rajcan-Separovic et al. 2007 (2) | 2 | 55,627,639 | 63,519,476 | 7,891,837 | Male | de novo, paternal allele deleted | Affymetrix Genome-Wide Human SNP Array 6.0 (*) | Maternal del at 16p13.11 (1.1 Mb)
Prontera et al. 2011 | 2 | 56,853,162 | 60,380,981 | 3,527,819 | Female | de novo, parental origin ND | Affymetrix Genome-Wide Human SNP Array 6.0 | Paracentric inversion of chromosome 7 and an apparently balanced translocation between chromosome 1 and 7, involving the inverted chromosome 7 [46,XX,der(7)inv(7)(q21.1q32.1)t(1;7)(q23q32.1)]
Rajcan-Separovic et al. 2007 (1) | 2 | 56,919,993 | 63,032,165 | 6,112,172 | Female | de novo, parental allele NI | Affymetrix Genome-Wide Human SNP Array 6.0 (*) | Paternal del at Xp22.31 (1.4 Mb)
de Leeuw et al. 2008 | 2 | 58,216,217 | 61,667,426 | 3,451,209 min del | Male | de novo, parental origin ND | Fine mapping by qPCR in Florisson paper (*) | Mosaic del 2p15-16.1 (20/30 cells observed with deletion)
 | | 58,024,114 | 61,873,699 | 3,849,585 bal | | | |
Florisson et al. 2013 (2) | 2 | 58,714,795 | 65,392,528 | 6,677,733 min del | Female | de novo, parental origin ND | Affymetrix 250K Nsp1 SNP array, fine mapping by qPCR | None reported
 | | 58,685,038 | 65,440,018 | 6,754,980 bal | | | |
Felix et al. 2010 | 2 | 59,139,200 | 62,488,871 | 3,349,671 | Female | de novo, paternal allele deleted | Affymetrix Genome-Wide Human SNP Array 6.0 | None reported
Liang et al. 2009 | 2 | 59,241,620 | 62,385,716 | 3,144,096 | Female | de novo, paternal allele deleted | Agilent Human Genome CGH Microarray Kit 105A | Polymorphic inversion of chr 9
Piccione et al. 2012 (2) | 2 | 60,257,496 | 62,762,496 | 2,505,000 | Male | de novo, parental origin ND | Agilent Human Genome CGH 244A array | de novo Xq28 del (29 kb)
Piccione et al. 2012 (1) | 2 | 60,603,496 | 61,246,496 | 643,000 | Female | de novo, paternal allele deleted | Agilent Human Genome CGH 44K array | Paternal 6q12 del (930 kb)
Hucthagowder et al. 2012 | 2 | 60,672,255 | 63,144,695 | 2,472,440 | Female | de novo, parental origin ND | Affymetrix Genome-Wide Human SNP Array 6.0 | None reported
Peter et al. 2014 | 2 | 60,689,299 | 60,830,491 | 141,192 min del (203,000 reported) | Male | de novo, parental origin ND | EmArray Cyto6000 v.2 (Emory University) | 2q13 del (343 kb) and 6p25.3 del (80 kb), unknown origin
Hancarova et al. 2013 | 2 | 60,689,977 | 61,127,979 | 438,002 | | de novo, paternal allele deleted | Illumina Human CytoSNP-12 BeadChip array, fine mapping of proximal bpt by MLPA | Large run of homozygosity (ROH) on proximal side of 2p deletion, extends to ~61,809,113 bp (supplemental figure SI), negative testing for fragile X and Rett syndromes (MECP2)
Chabchoub et al. 2008 | 2 | 61,203,258 | 61,786,583 | 583,325 | Male | de novo, parental origin ND | Affymetrix CytoScan 750K Array, hg19 (*) | None Reported, Negative testing for (FBN1) and Williams-Beuren (ELN)
Fannemel et al. 2014 | 2 | 61,500,346 | 61,733,075 | 232,729 | Male | de novo, parental origin ND | Agilent 180K SurePrint G3 Human CGH | Fragile site at 12q13.2 (benign variant), maternal 9p24.3 dup (493,399 kb), maternal 9p24.3 dup (408,965 kb), paternal 17q25.3 dup (656,833 kb)

B. New Cases
No. 1 | 2 | 55,676,099 | 65,250,541 | 9,574,442 | Male | de novo, parental origin ND | Affymetrix CytoScan 750K Array, hg19 | No additional pathogenic CNVs Reported
No. 2 | 2 | 57,606,726 | 59,619,316 | 2,012,590 | Male | de novo, parental origin ND | Affymetrix CytoScan 750K Array, hg19 | de novo 12p11.21-q11 del (6.5 Mb), de novo 2p16.1 BCL11A intronic del (17 kb), 2p16.1 intergenic del (22.55 kb)
No. 3 | 2 | 59,017,244 | 64,379,673 | 5,362,429 | Male | de novo, paternal allele deleted | Affymetrix Cytogenetics Whole-Genome 2.7M Array | No additional pathogenic CNVs Reported
No. 4 | 2 | 60,650,589 | 61,621,631 | 971,042 | Female | de novo, parental origin ND | Affymetrix CytoScan 750K Array, hg19 | No additional pathogenic CNVs Reported
No. 5 | 2 | 61,060,687 | 65,653,379 | 4,592,692 | Male | de novo, parental origin ND | Signature Genomics SignatureChipOS™ | No additional pathogenic CNVs Reported
No. 6 | 2 | 61,438,499 | 61,797,959 | 359,460 | Male | de novo, parental origin ND | Affymetrix Genome-Wide Human SNP Array 6.0 | No additional pathogenic CNVs Reported
No. 7 | 2 | 61,585,906 | 64,253,124 | 2,667,218 | Male | de novo, maternal allele deleted | Affymetrix Cytogenetics Whole-Genome 2.7M Array | No additional pathogenic CNVs Reported
No. 8 | 2 | 61,739,766 | 62,534,498 | 794,732 | Male | de novo, maternal allele deleted | Affymetrix Cytogenetics Whole-Genome 2.7M Array | No additional pathogenic CNVs Reported

This table contains a summary of the genomic information for 2p15-16.1 microdeletion cases for A) published cases to April 2014 (n=15) and B) newly recruited cases (n=8). Breakpoints for all 2p15-16.1 microdeletions (n=23) have been converted to hg19 and are updated to include the newest information where possible (*). Platforms used to detect the 2p15-16.1 CNV are listed along with additional genomic findings. In cases where a new array has been run, the highest resolution array result is reported. In additional cases, fine mapping qPCR results reported by Florisson et al. 2012 (Table 1 and Table SI) are used to update the original report. Where possible, the inheritance (deleted parental allele) is reported. Cases where parental origin has not been determined are listed as ND, and cases where testing was not informative are listed as NI.
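As a consistency check on the breakpoints, the reported sizes in this table appear to equal End minus Start in hg19 coordinates. The sketch below illustrates that arithmetic for three rows taken from the table; the function name and the dictionary layout are hypothetical and are not part of the original analysis.

```python
def deletion_size(start: int, end: int) -> int:
    """Deletion size in bp, taken as the difference between the two breakpoints."""
    return end - start


# (start, end, size as listed in the table), hg19 coordinates on chromosome 2
cases = {
    "Florisson et al. 2013 (1)": (55_616_146, 62_362_249, 6_746_103),
    "Felix et al. 2010": (59_139_200, 62_488_871, 3_349_671),
    "No. 1": (55_676_099, 65_250_541, 9_574_442),
}

for name, (start, end, listed) in cases.items():
    computed = deletion_size(start, end)
    print(name, computed, computed == listed)
```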


Supplementary Table 2.3 Genes in the 6.5 Mb de novo 12p Deletion in New Case No. 2

RefSeq Gene ID | Coding/non-coding | OMIM Gene | OMIM Phenotype
DENND5B | mRNA | |
DENND5B-AS1 | ncRNA | |
METTL20 | mRNA | Yes | N/A
AMN1 | mRNA | |
H3F3C | mRNA | Yes | N/A
KIAA1551 | mRNA | |
RNU6-78P | ncRNA | |
BICD1 | mRNA | |
FGD4 | mRNA | Yes | Charcot-Marie-Tooth disease, type 4H
DNM1L | mRNA | Yes | Encephalopathy, lethal, due to defective mitochondrial peroxisomal fission
YARS2 | mRNA | Yes | Myopathy, lactic acidosis, and sideroblastic anemia 2
PKP2 | mRNA | Yes | Arrhythmogenic right ventricular dysplasia 9
SYT10 | mRNA | |
ALG10 | mRNA | Yes | Long QT syndrome, acquired, reduced susceptibility to

This table contains a list of RefSeq genes contained within the 6.5 Mb de novo 12p11.21-q11 deletion (chr12:31495280-37969194) reported for new Case No. 2. Known Online Mendelian Inheritance in Man (OMIM) genes are listed along with phenotypes when applicable.


Supplementary Table 2.4 Fold Change for Genes within the 2p15-16.1 Microdeletion Region

Gene ID | Chr Pos | Strand | Probe ID | Probe Coordinates | Rajcan-Separovic et al. 2007 (2) | Rajcan-Separovic et al. 2007 (1) | Case No. 3 | Case No. 7 | Case No. 8 | Male Ref #1 | Male Ref #2 | Female Ref #1
CCDC104 | 2p16.1d | + | ILMN_2103014 | 55,624,994-55,625,009; 55,625,547-55,625,580 | 1.030 | 0.946 | 0.981 | 0.993 | 0.973 | 0.9881 | 0.9691 | 1.0444
SMEK2 | 2p16.1d | - | ILMN_1661650 | 55,629,253-55,629,302 | 0.672 | 0.881 | 1.069 | 1.130 | 1.084 | 1.0257 | 1.0257 | 0.9504
PNPT1 | 2p16.1d | - | ILMN_2051408 | 55,715,981-55,716,030 | 0.873 | 1.097 | 1.209 | 1.278 | 1.350 | 0.8909 | 1.2086 | 0.9287
EFEMP1 | 2p16.1d | - | ILMN_1735877 | 55,946,706-55,946,755 | 1.001 | 0.913 | 1.036 | 0.971 | 1.052 | 0.9801 | 0.9747 | 1.0468
EFEMP1 | 2p16.1d | - | ILMN_2350634 | 55,951,505-55,951,554 | 1.019 | 1.063 | 1.030 | 0.973 | 0.984 | 0.9311 | 1.0098 | 1.0636
EFEMP1 | 2p16.1d | - | ILMN_1673880 | 55,998,906-55,998,906; 56,002,999-56,003,047 | 0.984 | 0.987 | 0.954 | 0.949 | 0.960 | 0.9604 | 0.9819 | 1.0604
CCDC85A | 2p16.1d-p16.1c | + | ILMN_2361614 | 56,465,611-56,465,660 | 1.039 | 1.016 | 0.981 | 0.947 | 0.935 | 0.9257 | 1.0379 | 1.0408
CCDC85A | 2p16.1d-p16.1c | + | ILMN_1669982 | 56,466,634-56,466,683 | 0.910 | 0.876 | 0.992 | 0.832 | 0.877 | 0.9339 | 1.0507 | 1.0191
VRK2 | 2p16.1b | + | ILMN_1750088 | 58,227,028-58,227,077 | 1.125 | 0.648 | 0.984 | 1.187 | 1.020 | 1.1152 | 0.9975 | 0.8990
FANCL | 2p16.1b | - | ILMN_2188909 | 58,239,913-58,239,962 | 0.909 | 1.106 | 1.045 | 0.938 | 1.157 | 0.9947 | 1.0030 | 1.0023
BCL11A | 2p16.1a | - | ILMN_2255133 | 60,537,857-60,537,906 | 0.761 | 0.426 | 1.267 | 1.120 | 1.457 | 0.9000 | 0.9952 | 1.1165
BCL11A | 2p16.1a | - | ILMN_2342271 | 60,626,653-60,626,702 | 0.738 | 0.641 | 0.900 | 0.816 | 1.258 | 0.8053 | 0.8401 | 1.4780
BCL11A | 2p16.1a | - | ILMN_1659800 | 60,626,889-60,626,938 | 0.740 | 0.703 | 0.873 | 0.815 | 1.091 | 0.7551 | 1.0350 | 1.2796
PAPOLG | 2p16.1a | + | ILMN_1786275 | 60,879,402-60,879,451 | 0.910 | 0.861 | 0.831 | 0.954 | 0.889 | 0.9648 | 1.0471 | 0.9899
REL | 2p16.1a | + | ILMN_2124064 | 61,002,816-61,002,865 | 0.726 | 0.668 | 0.680 | 1.077 | 1.063 | 0.9588 | 1.1562 | 0.9021
PUS10 | 2p16.1a | - | ILMN_1810591 | 61,023,038-61,023,072; 61,025,648-61,025,662 | 0.903 | 1.016 | 0.947 | 1.068 | 1.028 | 0.9555 | 0.9913 | 1.0558
PEX13 | 2p16.1a-p15d | + | ILMN_2113957 | 61,129,776-61,129,825 | 1.009 | 0.961 | 0.957 | 0.953 | 0.994 | 0.9722 | 1.0514 | 0.9783
KIAA1841 | 2p15d | + | ILMN_1735063 | 61,218,272-61,218,321 | 0.827 | 0.917 | 0.992 | 0.893 | 0.863 | 0.9328 | 1.0437 | 1.0272
AHSA2 | 2p15d | + | ILMN_1798308 | 61,267,121-61,267,136; 61,267,256-61,267,289 | 0.646 | 0.470 | 0.509 | 1.211 | 1.543 | 0.7488 | 1.6084 | 0.8303
USP34 | 2p15d | - | ILMN_1739454 | 61,268,865-61,268,914 | 0.498 | 0.503 | 0.619 | 0.587 | 1.112 | 1.0377 | 0.9682 | 0.9954
XPO1 | 2p15d | - | ILMN_1725121 | 61,558,962-61,559,011 | 0.989 | 0.885 | 1.002 | 1.176 | 1.059 | 0.9223 | 1.1199 | 0.9682
FLJ13305 | 2p15c | - | ILMN_2120072 | 61,916,739-61,916,748; 61,920,060-61,920,099 | 0.896 | 1.020 | 0.974 | 0.956 | 0.912 | 1.0291 | 0.9837 | 0.9878
CCT4 | 2p15c | - | ILMN_1776073 | 61,948,906-61,948,955 | 0.917 | 0.845 | 0.912 | 0.948 | 0.857 | 1.0994 | 0.9901 | 0.9187
COMMD1 | 2p15c | + | ILMN_1761242 | 62,216,595-62,216,644 | 0.698 | 0.628 | 0.657 | 0.601 | 0.535 | 1.1251 | 0.8706 | 1.0210
B3GNT2 | 2p15c | + | ILMN_1711102 | 62,303,804-62,303,853 | 0.869 | 0.929 | 0.747 | 0.801 | 0.793 | 0.8880 | 1.0102 | 1.1147
TMEM17 | 2p15b | - | ILMN_2210386 | 62,581,366-62,581,415 | 0.648 | 1.498 | 1.360 | 1.991 | 2.134 | 0.7186 | 1.8704 | 0.7440
EHBP1 | 2p15b | + | ILMN_1803348 | 63,126,795-63,126,844 | 0.808 | 0.629 | 0.760 | 0.826 | 1.153 | 1.0417 | 0.9842 | 0.9754
OTX1 | 2p15b | + | ILMN_1691180 | 63,137,638-63,137,687 | 0.843 | 0.896 | 0.898 | 0.889 | 0.948 | 1.0913 | 0.9781 | 0.9369
LOC51057 | 2p15b-p15a | - | ILMN_2360929 | 63,202,286-63,202,335 | 1.027 | 0.816 | 0.918 | 0.925 | 0.903 | 1.1019 | 0.9208 | 0.9855
LOC51057 | 2p15b-p15a | - | ILMN_1717010 | 63,462,593-63,462,642 | 0.969 | 1.101 | 1.041 | 0.997 | 0.981 | 1.0091 | 1.0740 | 0.9227
MDH1 | 2p15a | + | ILMN_1656913 | 63,686,024-63,686,031; 63,686,577-63,686,618 | 1.209 | 1.059 | 0.709 | 0.714 | 1.278 | 1.2599 | 0.8198 | 0.9682
LOC388955 | 2p15a | - | ILMN_2163315 | 63,702,949-63,702,998 | 1.061 | 0.995 | 0.989 | 0.999 | 0.951 | 0.9720 | 1.0830 | 0.9500
UGP2 | 2p15a | + | ILMN_2284181 | 63,922,712-63,922,761 | 1.090 | 0.997 | 1.003 | 0.999 | 0.982 | 0.9908 | 0.9860 | 1.0236
UGP2 | 2p15a | + | ILMN_2389151 | 63,968,128-63,968,177 | 1.138 | 1.084 | 0.550 | 0.565 | 1.092 | 0.9571 | 1.0546 | 0.9908
UGP2 | 2p15a | + | ILMN_2389155 | 63,970,755-63,970,804 | 1.026 | 1.026 | 0.535 | 0.676 | 1.186 | 0.9772 | 1.1381 | 0.8992
VPS54 | 2p15a-p14c | - | ILMN_1761086 | 63,973,413-63,973,462 | 1.186 | 0.875 | 0.809 | 0.849 | 1.078 | 1.1142 | 0.9835 | 0.9126
VPS54 | 2p15a-p14c | - | ILMN_2386967 | 64,000,589-64,000,633 | 0.999 | 1.060 | 0.971 | 0.971 | 1.112 | 0.9335 | 1.1580 | 0.9251
PELI1 | 2p14c | - | ILMN_1679268 | 64,174,168-64,174,217 | 0.903 | 0.580 | 0.318 | 0.685 | 0.704 | 0.9682 | 1.4077 | 0.7337

Relative expression ratios for genes from the 2p15-16.1 deletion region are shown in the table above. Expression values for genes located within a deletion are shown on a green background. Genes with expression values reduced by more than 30% compared to normal controls are shown in red. Copy number sensitive genes, or genes that show a reduced expression when deleted (REL, AHSA2, USP34 and COMMD1), are also shown in red.
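The "reduced by more than 30%" rule in this legend can be illustrated with a short sketch. It assumes that a relative expression ratio of 1.0 corresponds to the normal control level, so that a reduction of more than 30% means a ratio below 0.70; that cut-off and the helper names are assumptions made for illustration, and the ratios are copied from the USP34 and XPO1 rows of the table.

```python
REDUCTION_THRESHOLD = 0.30  # "reduced by more than 30%" (assumed relative to a control ratio of 1.0)


def is_reduced(ratio: float, threshold: float = REDUCTION_THRESHOLD) -> bool:
    """True when the relative expression ratio is more than `threshold` below 1.0."""
    return ratio < 1.0 - threshold


# Relative expression ratios for the five deletion carriers (values from the table).
samples = ["Rajcan-Separovic 2007 (2)", "Rajcan-Separovic 2007 (1)",
           "Case No. 3", "Case No. 7", "Case No. 8"]
ratios = {
    "USP34": [0.498, 0.503, 0.619, 0.587, 1.112],
    "XPO1": [0.989, 0.885, 1.002, 1.176, 1.059],
}

for gene, values in ratios.items():
    flagged = [s for s, r in zip(samples, values) if is_reduced(r)]
    print(gene, flagged)
```

With these values, USP34 is flagged in four of the five carriers and XPO1 in none.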


Supplementary Table 2.5 RMSK Elements in Regions Flanking 2p15-16.1 Deletions

Deletion Flank (Chr, Start, Stop, Case_Flank) and RMSK Element (Family, Class, Name, Start, End, Strand, Size in bp)
chr2 55675599 55676099 Case_1_distal L1 LINE L1MC4a 55675474 55676356 - 882
chr2 59619316 59619816 Case_2_proximal ERVL-MaLR LTR THE1D 59619754 59620128 - 374
chr2 64379673 64380173 Case_3_proximal L1 LINE L1MEd 64379813 64380363 - 550
chr2 64379673 64380173 Case_3_proximal L1 LINE L1M5 64379554 64379772 - 218
chr2 61621631 61622131 Case_4_proximal Alu SINE AluJb 61621499 61621693 + 194
chr2 61060188 61060688 Case_5_distal Alu SINE AluJb 61060472 61060559 + 87
chr2 61060188 61060688 Case_5_distal MIR SINE MIR 61060605 61060709 + 104
chr2 61438000 61438500 Case_6_distal Alu SINE AluJb 61437854 61438072 - 218
chr2 61797959 61798459 Case_6_proximal Alu SINE AluSp 61798046 61798342 - 296
chr2 61797959 61798459 Case_6_proximal Alu SINE FRAM 61797931 61797967 - 36
chr2 61797959 61798459 Case_6_proximal L1 LINE L1MC4 61797984 61798046 + 62
chr2 61797959 61798459 Case_6_proximal L1 LINE L1MC4 61798342 61798364 + 22
chr2 61797959 61798459 Case_6_proximal Alu SINE AluSg4 61798369 61798668 + 299
chr2 61585407 61585907 Case_7_distal L1 LINE L1MB4 61585365 61585565 + 200
chr2 61585407 61585907 Case_7_distal L1 LINE L1MB4 61585567 61586788 + 1221
chr2 64253124 64253624 Case_7_proximal L2 LINE L2a 64253622 64253756 + 134
chr2 64253124 64253624 Case_7_proximal L2 LINE L2a 64253123 64253196 + 73
chr2 64253124 64253624 Case_7_proximal L2 LINE L2a 64253483 64253546 + 63
chr2 61739267 61739767 Case_8_distal L1 LINE L1M5 61739735 61739895 + 160
chr2 55615646 55616146 Florisson-2013-1_distal L2 LINE L2a 55615767 55615850 + 83
chr2 55615646 55616146 Florisson-2013-1_distal Low_complexity Low_complexity AT_rich 55616074 55616123 + 49
chr2 62362249 62362749 Florisson-2013-1_proximal MIR SINE MIRb 62362209 62362422 + 213
chr2 62362249 62362749 Florisson-2013-1_proximal Alu SINE AluY 62362489 62362784 - 295
chr2 63519476 63519976 Rajcan-Separovic-2007-2_proximal L1 LINE L1M4c 63519861 63521356 + 1495
chr2 56852663 56853163 Prontera-2010_distal ERVL LTR LTR33A 56852798 56853220 - 422
chr2 60380981 60381481 Prontera-2010_proximal ERVL-MaLR LTR MLT1F2 60380678 60381217 - 539
chr2 60380981 60381481 Prontera-2010_proximal L2 LINE L2a 60381300 60381435 - 135
chr2 56919494 56919994 Rajcan-Separovic-2007-1_distal L1 LINE L1PREC2 56919183 56920080 - 897
chr2 63032165 63032665 Rajcan-Separovic-2007-1_proximal L1 LINE L1ME4a 63032167 63032640 + 473
chr2 63032165 63032665 Rajcan-Separovic-2007-1_proximal hAT-Charlie DNA MER1B 63031826 63032167 + 341
chr2 63032165 63032665 Rajcan-Separovic-2007-1_proximal Alu SINE AluSz 63032659 63032949 - 290
chr2 58215717 58216217 deLeeuw-2008_distal ERVL LTR MLT2B2 58215515 58216042 + 527
chr2 61667426 61667926 deLeeuw-2008_proximal Alu SINE AluSq2 61667653 61667952 - 299
chr2 58714295 58714795 Florisson-2013-2_distal ERV1 LTR LTR27 58714137 58714616 - 479
chr2 59138700 59139200 Felix-2010_distal Simple_repeat Simple_repeat (TG)n 59138982 59139028 + 46
chr2 62488871 62489371 Felix-2010_proximal Alu SINE AluSx3 62488796 62489058 + 262
chr2 62488871 62489371 Felix-2010_proximal ERV1 LTR LTR49-int 62489058 62489255 - 197
chr2 62385716 62386216 Liang-2009_proximal L1 LINE L1M4b 62385941 62386419 + 478
chr2 60256997 60257497 Piccone-2012-2_distal MIR SINE MIRb 60257339 60257530 - 191
chr2 62762496 62762996 Piccone-2012-2_proximal ERV1 LTR LTR12F 62762587 62763076 - 489
chr2 60602997 60603497 Piccone-2012-1_distal ERVL LTR LTR16A 60603042 60603467 - 425
chr2 61246496 61246996 Piccone-2012-1_proximal L1 LINE L1M5 61246365 61247111 - 746
chr2 60671756 60672256 Hucthagowder-2012_distal L2 LINE L2a 60672083 60672212 + 129
chr2 63144695 63145195 Hucthagowder-2012_proximal Alu SINE AluSz 63145033 63145331 - 298
chr2 63144695 63145195 Hucthagowder-2012_proximal ERVL LTR LTR40c 63144876 63144974 - 98
chr2 63144695 63145195 Hucthagowder-2012_proximal MIR
SINE MIRc 63144639 63144711 - 72 chr2 60689478 60689978 Hancarova-2012_distal MIR SINE MIR3 60689893 60689972 + 79 chr2 61127979 61128479 Hancarova-2012_proximal Alu SINE AluSq2 61128279 61128595 - 316 chr2 61202758 61203258 Chabchoub-2008_distal Alu SINE AluSx1 61202886 61203196 - 310 chr2 61202758 61203258 Chabchoub-2008_distal MIR SINE MIRb 61203230 61203342 + 112 chr2 61786583 61787083 Chabchoub-2008_proximal Alu SINE AluJr 61786377 61786593 - 216 chr2 61786583 61787083 Chabchoub-2008_proximal Alu SINE AluSx3 61786653 61786948 + 295 196

Deletion Flank RMSK Element Chr Start Stop Case_Flank Family Class Name Start End strand Size (bp) chr2 61786583 61787083 Chabchoub-2008_proximal Alu SINE AluY 61786950 61787253 + 303 chr2 61499846 61500346 Fannemel-2014_distal L2 LINE L2 61499566 61499856 290 chr2 61733075 61733575 Fannemel-2014_proximal Alu SINE AluSz 61733275 61733575 300 chr2 61733075 61733575 Fannemel-2014_proximal Simple_repeat Simple_repeat (TAAA)n 61733225 61733249 24

This table lists the repeat elements identified by RepeatMasker (RMSK) that overlap the regions flanking 2p15-16.1 microdeletions (500 bp upstream and downstream). Genomic positions of the region flanking each deletion (proximal or distal) are provided along with the case identifier. A total of 56 RMSK elements are listed with their corresponding Family, Class, Name, Start, End, strand, and size (bp). The UCSC rmsk track includes up to ten different classes of repeats: short interspersed nuclear elements (SINE), which include Alu elements; long interspersed nuclear elements (LINE); long terminal repeat elements (LTR), which include retroposons; DNA repeat elements (DNA); simple repeats (microsatellites); low complexity repeats; satellite repeats; RNA repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA); other repeats, which include class RC (Rolling Circle); and a category labeled unknown. The genome build used is Human Feb. 2009 (GRCh37/hg19).
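As an illustration of how a repeat element can be checked against a 500 bp flanking window, the following minimal Python sketch computes the element size (End minus Start, which matches the Size (bp) column) and the number of base pairs shared with the flank. The names `Interval`, `overlap_bp` and `size_bp` are hypothetical, and the sketch assumes half-open coordinates; it is not the procedure used to generate the table.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates are assumed to be half-open [start, end)."""
    chrom: str
    start: int
    end: int


def size_bp(iv: Interval) -> int:
    """Length of the interval in base pairs (End - Start)."""
    return iv.end - iv.start


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if they do not overlap)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


# First row of the table: Case_1 distal flank and the L1MC4a element.
flank = Interval("chr2", 55675599, 55676099)
element = Interval("chr2", 55675474, 55676356)

print(size_bp(element))             # element size in bp
print(overlap_bp(flank, element))   # bp of the flank covered by the element
```

An element is reported for a flank whenever the overlap is greater than zero, so an element may extend well beyond the 500 bp window, as in this first row.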

Supplementary Figure 2.1 Microsatellite Selection for Parent of Origin Studies in 2p15-16.1 Deletion Carriers

The selected microsatellite (24xTA, circled) and its chromosomal coordinates are shown relative to its position on chromosome 2 (ideogram, red line) and within the XPO1 gene. Forward (F) and reverse (R) primer sequences are displayed.

Appendix B Supplementary Tables and Figures for Chapter 3

Supplementary Table 3.1 Genes (107) Involved in the UPR Pathway with Expression Ratios in Individuals from Family A and C

1q21.1 Illumina Expression Array (Fold Change)

No. Gene Synonym(s) Probe A1 A2 A3 D1 D2 Female Control 1 Female Control 2 Male Control 1

ds ATF6

1 ERO1LB ERO1b ILMN_2121816 1.83528 0.95727 0.87236 0.80664 0.97942 1.03958 1.07624 0.89379
2 HYOU1 ORP150 ILMN_2141790 0.7476 0.67752 0.7373 0.73424 0.76968 0.93519 0.99608 1.07351
3 HSP90B1 GRP94 ILMN_2096116 0.48858 0.81038 1.14605 0.66742 0.57302 0.84479 0.91806 1.28937
4 PDIA4 ERp72 ILMN_1815261 0.72247 0.85678 1.16959 0.70515 0.63994 0.79223 0.9924 1.27191
5 DERL3 Derlin-3 AVERAGED 0.99718 1.02427 1.15285 0.95909 0.92802 1.00642 1.02122 0.97355
6 SEL1L ILMN_1726496 0.8762 0.76755 1.69741 0.90836 1.00998 0.88536 0.9236 1.22292
7 SYVN1 HRD1 AVERAGED 0.85832 1.17557 1.43841 0.97553 0.77301 0.85004 1.19047 0.9935
8 SDF2L1 ILMN_1749213 0.32384 0.91595 1.00928 0.44545 0.48074 0.8255 0.86654 1.39797
9 CRELD2 ILMN_1748707 0.4383 1.21419 1.18921 0.64618 0.40053 0.77916 0.95264 1.34723
10 PIGA AVERAGED 1.29135 1.05049 1.00356 1.18365 1.1928 0.97872 0.99649 1.02813
11 ATP2A2 SERCA2 AVERAGED 1.18067 0.88846 0.95018 0.91517 0.99601 0.93666 0.93677 1.14532
12 ORMDL2 ILMN_1774708 1.21363 0.98305 0.97829 0.99195 1.05142 1.1106 0.93325 0.96482
13 GALNT3 ILMN_1671039 1.19196 1.15535 1.42932 1.30345 1.19278 1.05287 1.13866 0.83412
14 HOXA1 AVERAGED 1.00388 1.02065 1.04392 0.99892 1.07755 0.99715 1.02549 0.98108
15 FAR1 Mlstd2 ILMN_2143250 0.86894 0.9522 0.98441 0.92488 0.88352 1.01279 1.02196 0.96616
16 NUCB2 nucleobindin 2 ILMN_1655913 0.97874 0.62764 0.96527 0.96527 0.54074 1.1321 0.75891 1.16393
17 CALM1 calmodulin 1 ILMN_1778242 0.86254 0.94387 1.81085 0.7457 1.03288 0.82169 1.04729 1.16205
18 CEACAM20 ILMN_1774781 0.97019 1.0044 0.99126 1.10216 0.93844 0.99815 1.0485 0.9555
19 CCDC134 ILMN_1809883 0.79959 0.92552 0.85817 0.70874 0.85402 0.90083 0.97761 1.1355
20 HRG ILMN_2193936 1.0149 1.0355 1.14578 1.03766 1.01701 0.93974 0.94825 1.1222
21 LIN52 ILMN_1797055 1.07699 1.02172 1.30586 0.99516 1.19334 1.02527 1.07549 0.90689
22 TBCCD1 ILMN_2076415 0.8827 0.96393 0.92402 0.95066 0.9132 0.98692 0.95066 1.06585

23 C1orf9 Dd25 AVERAGED 1.28622 1.07784 1.17292 1.09156 0.99803 0.97841 1.0306 1.00263
24 DDIT3 ILMN_1676984 1.11626 2.21709 0.87762 1.09632 0.80367 0.98055 1.41291 0.7218
25 EHD4 ILMN_1720083 0.97265 1.03526 1.37554 1.19748 1.77769 0.95926 0.80107 1.30134
26 EMD ILMN_1801421 0.45271 0.9908 0.772 0.86254 0.7457 1.04006 1.0693 0.89917
27 FA2H ILMN_1791531 0.94759 1.03838 0.96282 0.90083 1.09835 0.99815 0.99954 1.00231
28 HMGCS1 ILMN_1797728 2.11893 0.52316 1.51922 2.57279 2.64512 1.42734 0.75436 0.92873
29 HRC ILMN_1738773 0.96594 0.97739 0.98966 0.95661 1.02243 0.90563 1.02101 1.08147
30 HSD11B2 ILMN_1813350 0.90417 1.04875 1.02574 0.98191 0.97581 0.92317 1.09101 0.99286
31 ICMT ILMN_1723021 0.82989 0.75733 0.88332 0.69496 0.77647 0.96126 1.04464 0.99585
32 KDELR3 AVERAGED 0.90795 0.67299 0.58774 0.84194 0.57887 1.54592 0.75955 0.8669
33 C14orf147 MGC112883 ILMN_1699676 0.70973 1.07276 0.91404 0.89772 0.94628 0.95749 1.00162 1.04271
34 NR4A2 ILMN_2339955 1.45565 0.95904 1.0745 1.31555 1.1799 0.95904 1.01795 1.02432
35 OS9 ILMN_2361807 1.04174 0.81057 1.04102 1.05629 0.98282 0.85619 1.01396 1.15189
36 SLC7A11 ILMN_1655229 1.02954 0.82702 0.8944 0.85382 0.92019 0.94606 1.232 0.85797
37 TMEM50B ILMN_2047599 0.73086 1.11703 1.46172 1.04512 1.17419 0.88618 0.95044 1.18729
38 CALR CRT ILMN_1736256 0.96371 1.04729 0.89296 0.7006 0.71532 1.04729 0.86254 1.10701

ds XBP1

1 DNAJB9 erDJ4 ILMN_1773742 0.68618 1.38191 1.17013 0.86854 0.58873 0.81038 1.28046 0.96371
2 SERP1 RAMP4 ILMN_1706817 1.32134 1.03168 1.3491 1.15509 0.88638 0.85619 1.02172 1.14314
3 XBP1 AVERAGED 0.55384 0.65143 1.95507 0.59121 0.66748 0.75096 1.10725 1.20375
4 MGAT2 AVERAGED 1.09359 0.95963 1.27363 0.95839 1.02437 0.98279 0.9779 1.04915
5 MAP3K8 EST ILMN_1741159 1.09429 1.25701 0.8179 1.32869 1.05702 1.1487 1.04972 0.82932
6 SULT1E1 EST ILMN_1704163 0.84109 0.98101 1.1829 0.97965 1.03766 0.92424 1.04343 1.03694
7 MGP ILMN_2071809 1.05117 1.10496 0.98966 1.00835 1.00696 1.02811 0.98146 0.99103
8 SFRS1 ILMN_1795341 0.84479 0.83316 1.34412 0.83316 0.97041 0.85658 0.91806 1.27162
9 GCS1 MOGS ILMN_1727642 1.184 0.83085 0.9511 0.88006 0.75559 1.01302 1.15082 0.85777

13 ICAM1 Intercellular adhesion molecule ILMN_1812226 1.02385 0.92723 1.10114 1.02456 1.24143 1.05263 0.93174 1.0196
14 IFRD1 Interferon-related developmental regulator 1 AVERAGED 1.39229 1.15512 0.99639 1.19821 1.20863 0.98275 1.04118 0.98836
15 UBE2E1 Ubiquitin-conjugating enzyme E2E 1 AVERAGED 0.99949 1.22827 1.11896 1.09684 1.04207 0.91447 1.16984 0.94805
16 PDIA3 GRP58, Phospholipase C ILMN_1669753 0.93174 1.04899 1.0203 0.97536 0.98078 1.00486 0.94671 1.05117
17 PDIA6 P5, PDI-P5 ILMN_1680626 0.58385 0.98396 0.75716 0.66465 0.62402 0.99769 1.03288 0.97041

ds ATF6&XBP1

1 DNAJC3 p58IPK ILMN_1659843 0.9876 1.01255 0.77378 0.96393 0.68682 0.96527 1.01607 1.0196
2 DNAJB11 Erdj3, HEDJ ILMN_1753243 0.55607 0.8971 1.05214 0.54841 0.63434 0.83702 1.05214 1.1355
3 HSPA5 BiP ILMN_1773865 0.48498 0.79333 1.29235 0.59543 0.64216 0.80218 1.00069 1.24574
4 EDEM1 edem ILMN_1779828 1.17013 0.99769 1.00463 0.69096 0.89917 0.91806 0.97041 1.12246
5 HERPUD1 Herp ILMN_2374159 0.75611 0.68618 0.97041 0.81038 0.56907 1.07674 0.81038 1.14605
6 ARMET MANF ILMN_2183510 0.3503 0.94387 1.16205 0.50933 0.51644 0.81038 0.97041 1.27162

IRE1 - upregulated & downregulated

1 BHLHA15 MIST1; BHLHB8 ILMN_1673590 0.8870 1.0629 1.1696 0.9227 0.9781 0.9461 1.0777 0.9808
2 FKBP11 FKBP19 ILMN_1787345 0.3618 0.7756 1.2255 0.6213 0.4424 0.7649 1.1199 1.1674
3 TMED3 p26; P24B; C15orf22 ILMN_1719316 0.9931 1.1408 0.5471 1.3013 0.7320 1.1892 1.0070 0.8351
4 SMIM14 C4orf34 ILMN_2224907 0.5935 1.7080 1.5182 1.0413 0.7649 0.7580 1.2331 1.0698
5 ZNF25 Zfp9; KOX19 ILMN_1808765 1.2263 1.0507 0.8130 1.1339 0.9947 1.1014 1.0305 0.8811
6 EDEM2 C20orf31; C20orf49; bA4204.1 ILMN_1711909 0.5195 1.0009 0.9940 0.5915 0.5375 0.8511 1.1261 1.0434

7 SELM SEPM ILMN_1651429 0.6454 0.6590 0.5642 0.5989 0.5721 1.2793 0.8335 0.9378
8 OAT OKT; GACR; HOGA; OATASE ILMN_2068747 0.7686 1.0255 0.5001 0.5685 0.6450 1.0142 0.9715 1.0149
9 CALU ILMN_1727194 1.0293 0.9485 0.7720 0.9504 0.7992 1.2489 0.9624 0.8320
10 PPAPDC1B DPPL1; HTPAP ILMN_1675406 0.9326 1.2306 1.3327 1.1682 1.0362 1.0312 0.9662 1.0037
11 GGCT GGC; GCTG; CRF21; C7orf24 ILMN_2101526 1.6586 0.9395 0.8123 1.2226 0.8236 0.8351 1.2746 0.9395
12 SEC23B CDAII; CDAN2; CDA-II; HEMPAS AVERAGED 0.7633 0.8676 0.9613 0.8793 0.8355 1.0333 0.9076 1.0705
13 AMIGO3 AMIGO-3 ILMN_1738233 1.0203 0.9781 1.0147 1.0007 1.0168 1.0077 1.0875 0.9126
14 TMEM66 XTP3; SARAF; FOAP-7; HSPC035 ILMN_1780141 1.3348 1.1701 1.5764 1.1947 1.0693 0.8566 1.0473 1.1147
15 RINT1 RINT-1 ILMN_1784584 1.0918 0.9361 0.8337 1.0236 1.0144 1.1548 0.9393 0.9219
16 UAP1 AGX; AGX1; AGX2; SPAG2 ILMN_1742461 1.1080 0.9266 1.4123 1.1157 0.8247 1.0345 0.8919 1.0837
17 PDIA2 PDI; PDA2; PDIP; PDIR ILMN_1804444 0.9833 1.0656 0.9867 0.9413 0.9010 0.9887 1.0524 0.9610
18 TM6SF1 ILMN_1750961 0.9439 1.3156 0.4406 0.6120 0.8337 0.9105 1.3247 0.8291
19 TXNDC11 EFP1 ILMN_1771862 0.8505 0.9562 1.7080 0.7394 0.7358 0.8799 0.9947 1.1426
20 ISG20 CD25; HEM45 ILMN_1659913 0.8160 0.9374 2.0946 0.8048 1.0187 0.7457 1.2454 1.0767
21 GGCX VKCFD1 ILMN_1758232 1.0163 1.0609 0.9919 0.9450 0.9648 1.0676 1.0405 0.9002
22 SEC61A1 SEC61; HSEC61; SEC61A ILMN_1659564 0.4988 1.1620 1.3348 0.8217 0.6358 0.7937 1.2030 1.0473
23 SLC39A7 KE4; HKE4; ZIP7; RING5; H2-KE4; D6S115E; D6S2244E AVERAGED 1.0252 0.9028 1.0301 1.1592 0.9594 0.9908 1.0693 0.9443
24 FGB ILMN_2114972 1.1006 0.9830 0.7897 0.8301 1.0016 0.9824 1.0507 0.9688

25 TMEM39A ILMN_1770373 1.0154 1.0666 1.1408 0.9395 0.8456 0.9304 1.0958 0.9808
26 ARFGAP3 ILMN_2227800 1.1903 0.9851 1.2348 0.8469 1.0072 0.9769 1.0220 1.0016
27 ZCCHC12 SIZN; SIZN1; PNMA7A ILMN_1679984 1.0986 0.9617 1.0322 1.0857 1.0165 1.0165 1.0025 0.9812
28 SLC30A7 ZNT7; ZnT-7; ZnTL2 ILMN_1789999 1.0958 1.0733 1.4162 1.0295 1.1909 0.9176 0.9673 1.1266
29 SSR1 TRAPA ILMN_1750693 1.0305 0.9954 1.1837 0.8312 0.7702 0.9033 0.9287 1.1920
30 OSTC DC2 ILMN_2056167 1.2283 1.4012 1.2198 0.9977 0.9840 1.0187 0.8390 1.1701
31 FAM114A1 Noxp20 ILMN_1694070 1.1352 0.9304 0.8276 1.0295 0.8938 1.1631 0.9063 0.9487
32 SEC11C SPC21; SPCS4C; SEC11L3 ILMN_1701681 0.5297 0.8971 1.1121 0.5757 0.7388 0.9954 0.8312 1.2086
33 TMED9 p25; GMP25; HSGP25L2G ILMN_1743655 0.7596 0.9817 1.1199 0.8665 0.6127 0.9352 1.1837 0.9033
34 HLA-G MHC-G ILMN_1656670 0.7203 0.6271 1.5227 0.7405 0.7509 1.0842 1.3074 0.7055
35 GMDS GMD; SDR3E1 ILMN_1711227 0.7104 1.3441 1.2454 1.0693 0.7720 0.9571 0.9571 1.0918
36 GMPPB AVERAGED 0.7384 0.8871 1.0057 0.8332 0.8863 0.9455 0.9461 1.1391
37 TMEM120A NET29; TMPIT ILMN_1654516 0.7992 0.5597 0.4845 0.7551 0.5616 0.8954 2.7638 0.4041
38 MAGT1 IAP; XMEN; MRX95; OST3B; PRO0756; bA217H1.1; RP11-217H1.1 ILMN_1721349 1.6133 1.4142 1.2226 1.4142 1.2658 1.0570 1.1487 0.8236
39 PAOX PAO AVERAGED 0.8703 0.9201 1.1987 0.8796 1.0246 0.9146 1.0746 1.0195
40 ERCC6L2 SR278; RAD26L; C9orf102 ILMN_1753190 0.8983 0.7676 1.2323 1.1788 1.0565 0.8878 0.9450 1.1920
41 GCG GLP1; GLP2; GRPP ILMN_1724396 1.0288 0.9747 0.9606 1.0718 1.0105 1.1471 0.9727 0.8963
42 BCAM AU; LU; CD239; MSK19 AVERAGED 0.9715 1.0480 0.9335 0.9317 0.9619 1.0306 0.9845 0.9882
43 KCNIP1 VABP; KCHIP1 AVERAGED 0.9951 1.0595 1.0374 0.9878 1.0388 0.9787 0.9951 1.0280

44 NEDD1 GCP-WD; TUBGCP7 ILMN_2206151 1.5900 0.8730 1.1655 1.0681 1.3794 1.0733 0.9938 0.9376
45 MAOB ILMN_1727360 0.8661 0.9017 0.9617 1.1629 0.9328 0.9458 1.0039 1.0531

Other

1 WFS1 WFRS, WFS, WFSL ILMN_1759023 0.3911 1.0060 0.8465 0.5240 0.4574 0.9684 1.0236 1.0088

Genes with expression ratios >1.28 are shown in blue font with blue fill, and genes with expression ratios <0.72 are shown in pink font with pink fill. ds = downstream.
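Because the colour coding does not survive in a plain-text copy of the table, the two cut-offs stated in the footnote can be re-applied directly to the fold-change values. The short Python sketch below does this for one row (HSP90B1); the function name `flag_ratio` and the labels "high", "low" and "within" are hypothetical and chosen only for illustration.

```python
# Cut-offs taken from the table footnote.
HIGH_CUTOFF = 1.28  # ratios above this were highlighted in blue
LOW_CUTOFF = 0.72   # ratios below this were highlighted in pink


def flag_ratio(ratio: float) -> str:
    """Label a fold-change ratio relative to the two footnote cut-offs."""
    if ratio > HIGH_CUTOFF:
        return "high"
    if ratio < LOW_CUTOFF:
        return "low"
    return "within"


# Fold changes for HSP90B1 (probe ILMN_2096116) in the five CNV carriers.
hsp90b1 = {"A1": 0.48858, "A2": 0.81038, "A3": 1.14605, "D1": 0.66742, "D2": 0.57302}

flags = {subject: flag_ratio(ratio) for subject, ratio in hsp90b1.items()}
print(flags)
```

The comparisons are strict, so a ratio exactly equal to 1.28 or 0.72 is labelled "within", following the ">" and "<" wording of the footnote.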

Supplementary Table 3.2 Genomic Characteristics of 1q21.1 CNVs and Associated Phenotypes in CNV Carriers

Family Subject Deletion/ Breakpoints Breakpoints FISH Size (kb) Genes Prenatal findings/ Mild- Language Growth delay Craniofacial Hands and Feet Heart Eye Neurological Behavioural Organs/ Duplicat- Original Affymetrix clones newborn period Moderate delays dysmorphisms Abnormalities Anomalies anomalies Systems ion detection 2.7M Intellectual defects method Disability (ID) A 1-proband 1q21.1 145,110,000 - 144,966,524 - RP11- 1,329.67 LOC728989, Pregnancy Yes Severe Short stature; Microcephaly, , None None Headaches Anger and Grade 4 Deletion 146,190,000 & 146,296,189 433J22 PRKAB2, complicated by articulation at age 10 Doliocephaly, prominent associated frustration right PDIA3P, hyperemesis and disorder; height and upslanting fingertip pads, with associated kidney FMO5, pre-eclampsia; spoke first weight at 3rd palpebral broad thumbs, photophobia with reflux CHD1L, BCL9, mother words and head fissures, high hypermobile and nausea difficulty to with a ACP6, GJA5, hospitalized for between circumference and prominent joints, especially express duplex GJA8, hypertension, mild 14-18 at 10th forehead, of mid-phalangeal himself ureter, GPR89B, edema and months; percentile bilateral and metatarsal tortuous GPR89C, thrombocytopenia. 
put two epicanthal joints ureter, PDZK1P1, Proband born by C words folds, mild dilated NBPF11 section due to together at retrognathia; renal maternal age 3 long eyelashes, pelvis hypertension upslanting palpebral fissures 2- 1q21.1 FISH 144,966,524 - RP11- 1,329.67 LOC728989, Born at 37 weeks, Learning No Short stature; mild posterior Hyperextensibility Mitral None Memory Panic Possible proband's Deletion 146,296,189 433J22 PRKAB2, and required difficulties; at 40 years rotation of at mid-phalangeal valve problems, attacks, Meckel's mother PDIA3P, hospital/ repeated height at 5 both ears joints and elbows anomaly, difficulty Restless leg diver- FMO5, incubator care for grade 2; percentile, ventricular expressing syndrome ticulum CHD1L, BCL9, one month required weight at 75 septal thought, removed ACP6, GJA5, learning percentile, defect. migraines at 16yrs. GJA8, assistance head Re- GPR89B, until grade circumference current GPR89C, 10 at 25th otitis PDZK1P1, percentile media NBPF11 requiring tympano- stomy tubes,

3- 1q21.1 FISH 144,975,442 - RP11- 1,322.02 LOC728989, unknown No; No known Short stature; None None None None Migraines None None proband's Deletion 146,297,463 433J22 PRKAB2, finished delay at 59 years maternal PDIA3P, 11/12 <3 rd grand- FMO5, grades percentile mother CHD1L, BCL9, ACP6, GJA5, GJA8, GPR89B, GPR89C, PDZK1P1, NBPF11

Family Subject Deletion/ Breakpoints Breakpoints FISH Size (kb) Genes Prenatal findings/ Mild- Language Growth delay Craniofacial Hands and Feet Heart Eye Neurological Behavioural Organs/ Duplicat- Original Affymetrix clones newborn period Moderate delays dysmorphisms Abnormalities Anomalies anomalies Systems ion detection 2.7M Intellectual defects method Disability (ID) B 1-proband 1q21.1 144,798,337- 144,967,161 - RP11- 1,329.03 LOC728989, Toxemia, maternal Yes; Grade Significant Short stature; High and Small hands, None Strabismus Schizophrenia ADHD Mild Deletion 146,290,832** 146,296,189 242B17; PRKAB2, smoking; born at 6 - at 26 years prominent short fingers hearing RP11- PDIA3P, 37 weeks, 3178g; education; expressive height 3rd and forehead, low loss, 533N14 FMO5, induced VD with PIQ~VIQ at and head set simple ears, recurrent CHD1L, BCL9, forceps after age 26 receptive; circumference mild ear ACP6, GJA5, spontaneous ROM; years delays by 50th hypertelorism, infect- GJA8, spina bifida occulta 5 years; percentile, narrow ions GPR89B, L4-5; laminectomy VIQ

Family Subject Deletion/ Breakpoints Breakpoints FISH Size (kb) Genes Prenatal findings/ Mild- Language Growth delay Craniofacial Hands and Feet Heart Eye Neurological Behavioural Organs/ Duplicat- Original Affymetrix clones newborn period Moderate delays dysmorphisms Abnormalities Anomalies anomalies Systems ion detection 2.7M Intellectual defects method Disability (ID) B 3 LOC728989, proband's PRKAB2, brother PDIA3P, cont. FMO5, CHD1L, BCL9, ACP6, GJA5, GJA8, GPR89B, GPR89C, PDZK1P1, NBPF11 (del C 1-proband 1q21.1 144,510,700 - 145,056,290 - RP11- 1,241.17 PRKAB2, Normal pregnancy Yes First At 5 years, Appearance of Persistent toe None None Normal MRI Mild ADHD, Mal- Duplica- 146,294,854 * 146,297,463 433J22 PDIA3P, sentences height, weight macrocephaly, and fingertip not treated rotated tion FMO5, at 3 years and HC was at ear pit, pads, mild bowel, CHD1L, BCL9, of age the 25th, 50th prominence of hypotonia consti- ACP6, GJA5, and 98 the forehead, pation; GJA8, percentile; midfacial normal GPR89B, hypoplasia, renal GPR89C, short upturned ultra- PDZK1P1, nose sound NBPF11 2- 1q21.1 FISH 144,963,437 - RP11- 1,332.54 LOC728989, unknown Learning None None None None None None None Mild ADHD None proband's Duplicat- 146,295,976 433J22 PRKAB2, challenges, as a child, father ion PDIA3P, had to not treated FMO5, work hard CHD1L, BCL9, to pass ACP6, GJA5, grades, GJA8, completed GPR89B, grade 12 GPR89C, and had PDZK1P1, some NBPF11 college training

Original array platforms: Nimblegen 385K Whole-Genome Tiling array (§), Agilent 105K whole genome oligonucleotide array (*), Signature Genomics SignatureChipWG™ 1.1 (**). SVD = spontaneous vaginal delivery; PIQ = Performance IQ; VIQ = Verbal IQ; RDS = respiratory distress syndrome.

Supplementary Table 3.3 Top 100 Genes from Expression/Copy Number Correlation Analysis

Gene ID Gene Name Chr Cytoband Strand Illumina Probe Accession Protein Product p value Correlation

CHD1L chromodomain helicase DNA binding protein 1-like (CHD1L), mRNA. 1 1q21.1c + ILMN_1786016 NM_004284.3 NP_004275.3 2.42E-05 Positive

TRIM6 tripartite motif-containing 6 (TRIM6), transcript variant 2, mRNA. 11 11p15.4c + ILMN_1656910 NM_058166.3 NP_477514.1 5.53E-05 Positive

FMNL2 formin-like 2 (FMNL2), mRNA. 2 2q23.3d + ILMN_1730491 NM_052905.3 NP_443137.2 7.31E-05 Negative

C1orf2 chromosome 1 open reading frame 2 (C1orf2), transcript variant 1, mRNA. 1 1q22a - ILMN_1795026 NM_006589.2 NP_006580.2 2.02E-04 Positive

RPL10 ribosomal protein L10 (RPL10), mRNA. X Xq28g + ILMN_2084182 NM_006013.2 NP_006004.1 2.41E-04 Positive

HLA-DQA1 PREDICTED: major histocompatibility complex, class II, DQ alpha 1, transcript variant 10 (HLA-DQA1), mRNA. 6p21.32b ILMN_1808405 XM_936128.2 XP_941221.1 2.73E-04 Positive

ZNF559 zinc finger protein 559 (ZNF559), mRNA. 19 19p13.2c + ILMN_1677785 NM_032497.1 NP_115886.1 2.89E-04 Positive

EIF1AX eukaryotic translation initiation factor 1A, X-linked (EIF1AX), mRNA. X Xp22.12b - ILMN_1813240 NM_001412.3 NP_001403.1 3.44E-04 Negative

UBD ubiquitin D (UBD), mRNA. 6 6p22.1a - ILMN_1678841 NM_006398.2 NP_006389.1 3.66E-04 Negative

PRKAB2 protein kinase, AMP-activated, beta 2 non-catalytic subunit (PRKAB2), mRNA. 1 1q21.1c - ILMN_1786021 NM_005399.3 NP_005390.1 5.97E-04 Positive

SAV1 salvador homolog 1 (Drosophila) (SAV1), mRNA. 14 14q22.1b - ILMN_2050654 NM_021818.2 NP_068590.1 6.25E-04 Negative

SIGLEC10 sialic acid binding Ig-like lectin 10 (SIGLEC10), mRNA. 19 19q13.33d - ILMN_1655549 NM_033130.2 NP_149121.2 7.08E-04 Positive

RSPH1 radial spoke head 1 homolog (Chlamydomonas) (RSPH1), mRNA. 21 21q22.3b - ILMN_1684571 NM_080860.2 NP_543136.1 7.08E-04 Positive

C14orf124 chromosome 14 open reading frame 124 (C14orf124), mRNA. 14 14q12a - ILMN_1771629 NM_020195.1 NP_064580.1 7.57E-04 Positive

PPP2R5A protein phosphatase 2, regulatory subunit B', alpha isoform (PPP2R5A), mRNA. 1 1q32.3b + ILMN_1738784 NM_006243.2 NP_006234.1 8.30E-04 Positive

LYRM1 LYR motif containing 1 (LYRM1), mRNA. 16 16p12.2c + ILMN_1749244 NM_020424.2 NP_065157.1 8.51E-04 Positive

SLC10A7 solute carrier family 10 (sodium/bile acid cotransporter family), member 7 (SLC10A7), transcript variant 2, mRNA. 4 4q31.22b - ILMN_1732489 NM_001029998.2 NP_001025169.1 9.00E-04 Negative

GPR63 G protein-coupled receptor 63 (GPR63), mRNA. 6 6q16.1f - ILMN_1653648 NM_030784.1 NP_110411.1 9.17E-04 Positive

BCAS1 breast carcinoma amplified sequence 1 (BCAS1), mRNA. 20 20q13.2c - ILMN_1733042 NM_003657.1 NP_003648.1 9.42E-04 Positive

VCAN versican (VCAN), mRNA. 5 5q14.3a + ILMN_1687301 NM_004385.2 NP_004376.2 9.49E-04 Positive

SH2B2 SH2B adaptor protein 2 (SH2B2), mRNA. 7 7q22.1e + ILMN_1669833 NM_020979.2 NP_066189.2 9.86E-04 Negative

RAP1GDS1 RAP1, GTP-GDP dissociation stimulator 1 (RAP1GDS1), mRNA. 4 4q23a + ILMN_2106167 NM_021159.3 NP_066982.2 1.28E-03 Negative

OAS1 2',5'-oligoadenylate synthetase 1, 40/46kDa (OAS1), transcript variant 1, mRNA. 12 12q24.13b + ILMN_1672606 NM_016816.2 NP_058132.2 1.33E-03 Positive

BLK B lymphoid tyrosine kinase (BLK), mRNA. 8 8p23.1b + ILMN_1668277 NM_001715.2 NP_001706.2 1.34E-03 Negative

IDUA iduronidase, alpha-L- (IDUA), mRNA. 4 4p16.3c + ILMN_1703041 NM_000203.3 NP_000194.2 1.35E-03 Negative

GMEB2 glucocorticoid modulatory element binding protein 2 (GMEB2), mRNA. 20 20q13.33e - ILMN_1683204 NM_012384.2 NP_036516.1 1.38E-03 Positive

BET1 blocked early in transport 1 homolog (S. cerevisiae) (BET1), mRNA. 7 7q21.3a - ILMN_1684042 NM_005868.4 NP_005859.1 1.43E-03 Negative

DSCR6 Down syndrome critical region gene 6 (DSCR6), mRNA. 21 21q22.13a + ILMN_1709257 NM_018962.1 NP_061835.1 1.43E-03 Positive

DLG3 discs, large homolog 3 (neuroendocrine-dlg, Drosophila) (DLG3), transcript variant 1, mRNA. X Xq13.1c + ILMN_2336728 NM_021120.2 NP_066943.2 1.47E-03 Positive

GPR89A G protein-coupled receptor 89A (GPR89A), mRNA. 1 1q21.1d + ILMN_2116594 NM_016334.2 NP_057418.1 1.55E-03 Positive

RPL23AP13 ribosomal protein L23a pseudogene 13 (RPL23AP13), non-coding RNA. 2 2p16.2a + ILMN_2124757 NR_002229.1 1.64E-03 Negative

MRFAP1 Mof4 family associated protein 1 (MRFAP1), mRNA. 4 4p16.1f + ILMN_2055165 NM_033296.1 NP_150638.1 1.74E-03 Positive

BTBD12 BTB (POZ) domain containing 12 (BTBD12), mRNA. 16 16p13.3c - ILMN_1732885 NM_032444.2 NP_115820.2 1.74E-03 Positive

TUBB4Q tubulin, beta polypeptide 4, member Q (TUBB4Q), mRNA. 4 4q35.2d - ILMN_1750100 NM_020040.3 NP_064424.3 1.75E-03 Positive

ZNF626 zinc finger protein 626 (ZNF626), transcript variant 2, mRNA. 19 19p12d - ILMN_1794823 NM_145297.3 NP_660340.1 1.81E-03 Positive

COL24A1 collagen, type XXIV, alpha 1 (COL24A1), mRNA. 1 1p22.3d - ILMN_1810996 NM_152890.4 NP_690850.1 1.85E-03 Positive

ACAP3 ArfGAP with coiled-coil, ankyrin repeat and PH domains 3 (ACAP3), mRNA. 1 1p36.33a - ILMN_1743847 NM_030649.1 NP_085152.1 1.88E-03 Negative

EFHA1 EF-hand domain family, member A1 (EFHA1), mRNA. 13 13q12.11c - ILMN_1738346 NM_152726.1 NP_689939.1 1.91E-03 Negative

SH3BGRL3 SH3 domain binding glutamic acid-rich protein like 3 (SH3BGRL3), mRNA. 1 1p36.11b + ILMN_1737163 NM_031286.3 NP_112576.1 1.92E-03 Negative

FYN FYN oncogene related to SRC, FGR, YES (FYN), transcript variant 2, mRNA. 6 6q21i - ILMN_1781207 NM_153047.1 NP_694592.1 1.98E-03 Negative

LAMP2 lysosomal-associated membrane protein 2 (LAMP2), transcript variant LAMP2A, mRNA. X Xq24d - ILMN_2243687 NM_002294.1 NP_002285.1 2.00E-03 Negative

BCL9 B-cell CLL/lymphoma 9 (BCL9), mRNA. 1 1q21.1c + ILMN_1704452 NM_004326.2 NP_004317.2 2.01E-03 Positive

CPXM1 carboxypeptidase X (M14 family), member 1 (CPXM1), mRNA. 20 20p13c - ILMN_1712046 NM_019609.3 NP_062555.1 2.13E-03 Positive

TUBB6 tubulin, beta 6 (TUBB6), mRNA. 18 18p11.21e + ILMN_1702636 NM_032525.1 NP_115914.1 2.21E-03 Positive

C4BPB complement component 4 binding protein, beta (C4BPB), transcript variant 3, mRNA. 1 1q32.2a + ILMN_1694588 NM_001017365.1 NP_001017365.1 2.27E-03 Positive

COMMD1 copper metabolism (Murr1) domain containing 1 (COMMD1), mRNA. 2 2p15c + ILMN_1761242 NM_152516.2 NP_689729.1 2.41E-03 Positive

KIAA0247 KIAA0247 (KIAA0247), mRNA. 14 14q24.1e + ILMN_2226917 NM_014734.2 NP_055549.1 2.65E-03 Negative

NUBP1 nucleotide binding protein 1 (MinD homolog, E. coli) (NUBP1), mRNA. 16 16p13.13d + ILMN_1689342 NM_002484.2 NP_002475.2 2.69E-03 Positive

SMG7 Smg-7 homolog, nonsense mediated mRNA decay factor (C. elegans) (SMG7), transcript variant 4, mRNA. 1 1q25.3e + ILMN_2368597 NM_201569.1 NP_963863.1 2.98E-03 Negative

CRYZ crystallin, zeta (quinone reductase) (CRYZ), mRNA. 1 1p31.1h - ILMN_1672389 NM_001889.2 NP_001880.2 3.07E-03 Positive

VAMP5 vesicle-associated membrane protein 5 (myobrevin) (VAMP5), mRNA. 2 2p11.2f + ILMN_1809467 NM_006634.2 NP_006625.1 3.17E-03 Negative

MMP7 matrix metallopeptidase 7 (matrilysin, uterine) (MMP7), mRNA. 11 11q22.2a - ILMN_2192072 NM_002423.3 NP_002414.1 3.22E-03 Positive

ACAD10 acyl-Coenzyme A dehydrogenase family, member 10 (ACAD10), mRNA. 12 12q24.12b + ILMN_1687303 NM_025247.4 NP_079523.3 3.23E-03 Positive

LDHC lactate dehydrogenase C (LDHC), transcript variant 2, mRNA. 11 11p15.1c + ILMN_2326324 NM_017448.1 NP_059144.1 3.29E-03 Negative

PIP5K1C phosphatidylinositol-4-phosphate 5-kinase, type I, gamma (PIP5K1C), mRNA. 19 19p13.3e - ILMN_1668514 NM_012398.1 NP_036530.1 3.29E-03 Negative

XRN2 5'-3' exoribonuclease 2 (XRN2), mRNA. 20 20p11.22b + ILMN_2196479 NM_012255.3 NP_036387.2 3.30E-03 Negative

JMJD1B jumonji domain containing 1B (JMJD1B), mRNA. 5 5q31.2c + ILMN_1706539 NM_016604.3 NP_057688.2 3.32E-03 Positive

KIF3B kinesin family member 3B (KIF3B), mRNA. 20 20q11.21b + ILMN_2081398 NM_004798.2 NP_004789.1 3.44E-03 Negative

RASSF6 Ras association (RalGDS/AF-6) domain family member 6 (RASSF6), transcript variant 1, mRNA. 4 4q13.3d - ILMN_1657381 NM_177532.3 NP_803876.1 3.45E-03 Positive

PDIA3P protein disulfide isomerase family A, member 3 pseudogene (PDIA3P), non-coding RNA. 1 1q21.1c + ILMN_2075436 NR_002305.1 3.48E-03 Negative

UCP2 uncoupling protein 2 (mitochondrial, proton carrier) (UCP2), nuclear gene encoding mitochondrial protein, mRNA. 11 11q13.4b - ILMN_1685625 NM_003355.2 NP_003346.2 3.52E-03 Negative

ITM2A integral membrane protein 2A (ITM2A), mRNA. X Xq21.1b - ILMN_2076602 NM_004867.3 NP_004858.1 3.75E-03 Negative

GPR56 G protein-coupled receptor 56 (GPR56), transcript variant 1, mRNA. 16 16q13d + ILMN_1697228 NM_005682.4 NP_005673.3 3.80E-03 Negative

ILDR1 immunoglobulin-like domain containing receptor 1 (ILDR1), mRNA. 3 3q13.33c - ILMN_2043079 NM_175924.2 NP_787120.1 3.91E-03 Negative

SUMF2 sulfatase modifying factor 2 (SUMF2), transcript variant 4, mRNA. 7 7p11.2b + ILMN_1685371 NM_001042470.1 NP_001035935.2 3.95E-03 Negative

FLJ22662 hypothetical protein FLJ22662 (FLJ22662), mRNA. 12 12p13.1a - ILMN_1707286 NM_024829.4 NP_079105.3 3.99E-03 Positive

EIF2C1 eukaryotic translation initiation factor 2C, 1 (EIF2C1), mRNA. 1 1p34.3e-p34.3d + ILMN_1671326 NM_012199.2 NP_036331.1 4.20E-03 Positive

COL4A4 collagen, type IV, alpha 4 (COL4A4), mRNA. 2 2q36.3b - ILMN_1778308 NM_000092.4 NP_000083.3 4.25E-03 Negative

LRRC8B leucine rich repeat containing 8 family, member B (LRRC8B), mRNA. 1 1p22.2c + ILMN_1712128 NM_015350.1 NP_056165.1 4.35E-03 Negative

TRAPPC6A trafficking protein particle complex 6A (TRAPPC6A), mRNA. 19 19q13.32a - ILMN_1775703 NM_024108.1 NP_077013.1 4.38E-03 Positive

B3GNT4 UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 4 (B3GNT4), mRNA. 12 12q24.31c + ILMN_1771260 NM_030765.2 NP_110392.1 4.40E-03 Negative

TCL1B T-cell leukemia/lymphoma 1B (TCL1B), transcript variant 1, mRNA. 14 14q32.13b + ILMN_2382309 NM_004918.2 NP_004909.1 4.51E-03 Positive

GIMAP2 GTPase, IMAP family member 2 (GIMAP2), mRNA. 7 7q36.1c + ILMN_2135272 NM_015660.2 NP_056475.1 4.58E-03 Negative

CTTN cortactin (CTTN), transcript variant 2, mRNA. 11 11q13.3c-q13.3d + ILMN_2393712 NM_138565.1 NP_612632.1 4.66E-03 Positive

TARS threonyl-tRNA synthetase (TARS), mRNA. 5 5p13.3a + ILMN_1685480 NM_152295.3 NP_689508.3 4.72E-03 Negative

MYH10 myosin, heavy chain 10, non-muscle (MYH10), mRNA. 17 17p13.1c - ILMN_1815154 NM_005964.1 NP_005955.1 4.77E-03 Positive

FAM20C family with sequence similarity 20, member C (FAM20C), mRNA. 7p22.3d ILMN_1712684 NM_020223.2 NP_064608.2 4.87E-03 Negative

OMG oligodendrocyte myelin glycoprotein (OMG), mRNA. 17 17q11.2d - ILMN_1739235 NM_002544.3 NP_002535.3 4.88E-03 Positive

MYO1D myosin ID (MYO1D), mRNA. 17 17q11.2e - ILMN_1805999 NM_015194.1 NP_056009.1 5.12E-03 Negative

H2AFY H2A histone family, member Y (H2AFY), transcript variant 2, mRNA. 5 5q31.1f - ILMN_2275437 NM_004893.2 NP_004884.1 5.12E-03 Negative

OSBPL2 oxysterol binding protein-like 2 (OSBPL2), transcript variant 2, mRNA. 20 20q13.33c + ILMN_1656482 NM_144498.1 NP_653081.1 5.13E-03 Negative

ACP6 acid phosphatase 6, lysophosphatidic (ACP6), mRNA. 1 1q21.1c - ILMN_2234343 NM_016361.2 NP_057445.2 5.14E-03 Positive

DCTN6 dynactin 6 (DCTN6), mRNA. 8 8p12e + ILMN_2204983 NM_006571.2 NP_006562.1 5.23E-03 Negative

BIRC3 baculoviral IAP repeat-containing 3 (BIRC3), transcript variant 1, mRNA. 11 11q22.2a + ILMN_1776181 NM_001165.3 NP_001156.1 5.31E-03 Positive


IQCB1 IQ motif containing B1 (IQCB1), transcript variant 3, mRNA. 3 3q13.33c - ILMN_2316104 NM_001023571.1 NP_001018865.1 5.44E-03 Negative

TBRG4 transforming growth factor beta regulator 4 (TBRG4), transcript variant 2, mRNA. 7 7p13c - ILMN_2414848 NM_030900.2 NP_112162.1 5.53E-03 Positive

RTKN rhotekin (RTKN), transcript variant 2, mRNA. 2 2p13.1b - ILMN_1680591 NM_033046.2 NP_149035.1 5.54E-03 Positive

ATMIN ATM interactor (ATMIN), mRNA. 16 16q23.2b + ILMN_2223720 NM_015251.2 NP_056066.2 5.66E-03 Negative

UBE2D4 ubiquitin-conjugating enzyme E2D 4 (putative) (UBE2D4), mRNA. 7 7p13e + ILMN_1707084 NM_015983.2 NP_057067.1 5.85E-03 Positive

TBX2 T-box 2 (TBX2), mRNA. 17 17q23.2b + ILMN_1792256 NM_005994.3 NP_005985.3 5.92E-03 Negative

ACSM3 acyl-CoA synthetase medium-chain family member 3 (ACSM3), transcript variant 1, mRNA. 16 16p12.2c + ILMN_1685952 NM_005622.3 NP_005613.2 5.94E-03 Positive

ITGB1 integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12) (ITGB1), transcript variant 1A, mRNA. 10 10p11.22b - ILMN_1723467 NM_002211.2 NP_002202.2 6.02E-03 Negative

FCRLB Fc receptor-like B (FCRLB), mRNA. 1 1q23.3b + ILMN_1782015 NM_001002901.2 NP_001002901.1 6.20E-03 Negative

TMEM30A transmembrane protein 30A (TMEM30A), mRNA. 6 6q14.1a - ILMN_1735680 NM_018247.2 NP_060717.1 6.49E-03 Positive

DPP4 dipeptidyl-peptidase 4 (CD26, adenosine deaminase complexing protein 2) (DPP4), mRNA. 2 2q24.2d - ILMN_1692535 NM_001935.3 NP_001926.2 6.56E-03 Negative

PBLD phenazine biosynthesis-like protein domain containing (PBLD), transcript variant 2, mRNA. 10 10q21.3d - ILMN_2304404 NM_001033083.1 NP_001028255.1 6.81E-03 Positive


COX10 COX10 homolog, cytochrome c oxidase assembly protein, heme A: farnesyltransferase (yeast) (COX10), nuclear gene encoding mitochondrial protein, mRNA. 17 17p12b + ILMN_1670901 NM_001303.2 NP_001294.2 6.82E-03 Positive

CAMK1 calcium/calmodulin-dependent protein kinase I (CAMK1), mRNA. 3 3p25.3c - ILMN_2140990 NM_003656.3 NP_003647.1 6.88E-03 Negative

PGCP plasma glutamate carboxypeptidase (PGCP), mRNA. 8 8q22.1d-q22.1e + ILMN_2058795 NM_016134.2 NP_057218.1 6.91E-03 Negative

Genes in the most commonly deleted 1q21.1 CNV region are highlighted in grey.


Supplementary Figure 3.1 Correlation of Expression and Copy Number for Probes from All Chromosomes

This figure shows the negative log10 p values from the correlation of gene expression with 1q21.1 copy number for all probes across all chromosomes.
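As a minimal illustration of the scale used in this figure, the sketch below converts correlation p values to negative log10 values. The function name `neg_log10` is a hypothetical helper, and the three example p values are copied from the preceding supplementary table; the correlation test itself is not reproduced here.

```python
import math


def neg_log10(p_value):
    """Return -log10(p) for a p value in the interval (0, 1]."""
    if not 0 < p_value <= 1:
        raise ValueError("p value must be in (0, 1]")
    return -math.log10(p_value)


# Example p values copied from the preceding table (gene symbol -> p value).
p_values = {"EIF2C1": 4.20e-03, "ACP6": 5.14e-03, "PGCP": 6.91e-03}

# Smaller p values map to larger values on the -log10 scale.
scores = {gene: neg_log10(p) for gene, p in p_values.items()}
for gene, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{gene}\t{score:.2f}")
```

On this scale a p value of 0.001 corresponds to a value of 3, so probes with stronger evidence of correlation appear higher on the plot.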


Appendix C Supplementary Tables and Figures for Chapter 4

Supplementary Table 4.1 Closest Known or Predicted Imprinted Gene(s) to CNVs from Individuals with ID

Lab ID | Status/Origin | Expressed Allele (parent transmitting CNV) | Chr Position | Chr Start | Chr Stop | Gene/Prioritized Gene from CNV | Source
- | Predicted | Paternal | 1p34 | 24,161,566 | 24,204,820 | FUCA1 | GeneImprint
06-01AG | de novo | - | 1p36 dup | 26,575,529 | 27,392,834 | ARID1A, FAM46B, NR0B2, RPS6KA1, SLC9A1 | Array CGH
- | Predicted | Paternal | 1p35-p32 | 40,213,902 | 40,264,532 | BMP8B | GeneImprint
07-53AG | Familial | - | 1p34.1dup | 45,014,474 | 45,200,817 | EIF2B3, PLK3, PTCH2, RPS8 | Array CGH
- | Predicted | Paternal | 1p22 | 92,930,317 | 92,962,432 | GFI1 | GeneImprint
- | Common | - | 1p21.1 | 103,956,050 | 104,113,271 | AMY1A, AMY1B, AMY1C, AMY2A | Array CGH
- | Predicted | Paternal | 1p13.3 | 108,037,749 | 108,058,249 | NDUFA4P1 | GeneImprint
08-22AG | Familial | - | 1q21 dup | 144,510,700 | 146,294,854 | ACP6, FMO5, GJA5, GJA8, PRKAB2 | Array CGH
04-18NG | Familial | - | 1q21.1 dup | 145,110,000 | 146,190,000 | ACP6, BCL9, CHD1L, FMO5, GJA5, GJA8, GPR89B, PRKAB2 | Array CGH
- | Predicted | Maternal | 1q23 | 161,484,035 | 161,506,686 | HSPA6 | GeneImprint
- | Common | - | 1q31.3 | 195,011,144 | 195,104,407 | CFHR1, CFHR3 | Array CGH
- | Predicted | Maternal | 1q32.2 | 214,521,010 | 214,734,641 | PTPN14 | GeneImprint
- | Predicted | Paternal | 1q42.13 | 228,385,860 | 228,576,574 | OBSCN | GeneImprint
06-67NG | Familial | - | 1q44 dup | 244,230,000 | 244,530,000 | SMYD3 | Array CGH
- | Common | - | 1q44 | 246,442,498 | 246,806,773 | OR2G6, OR2M4, OR2M7, OR2T1, OR2T12, OR2T2, OR2T29, OR2T3, OR2T33, OR2T34, OR2T4, OR2T5, OR2T6 | Array CGH
- | Predicted | Paternal | 1q44 | 247,994,229 | 248,015,197 | OR11L1 | GeneImprint
- | Predicted | Maternal | 2p21 | 44,056,102 | 44,115,604 | ABCG8 | GeneImprint
03-02SG, 03-57SG | de novo | - | 2p15.1-16 del | 55,499,483 | 63,368,196 | EFEMP1, OTX1, PEX13, PNPT1, REL, XPO1 | Array CGH
- | Predicted | Paternal | 2p16.1 | 56,401,257 | 56,623,308 | CCDC85A | GeneImprint
- | Predicted | Maternal | 2p13 | 63,267,964 | 63,294,313 | OTX1 | GeneImprint
- | Predicted | Maternal | 2p13 | 71,117,719 | 71,170,575 | VAX2 | GeneImprint
06-108AG | de novo | - | 2p13 del | 72,140,702 | 72,924,626 | CYP26B1, EXOC6B | Array CGH
06-40AG | Familial | - | 2p12 dup | 78,428,495 | 79,496,295 | REG1A, REG1B, REG3A, REG3G | Array CGH
- | Imprinted | - | - | 80,382,514 | 80,384,998 | LRRTM1 | igc.otago
- | Common | - | 2p11.2 | 89,370,000 | 89,910,000 | IGKC, IGKV1D-42 | Array CGH
07-21AG | de novo | - | 2q23.1 del | 148,595,892 | 150,837,093 | C2orf25, EPC2, KIF5C, LYPD6, LYPD6B, MBD5 | Array CGH
- | Predicted | Paternal | 2q37.1 | 233,402,778 | 233,425,225 | TIGD1 | GeneImprint
07-32AG | Familial | - | 3p26.2 del | 4,206,159 | 4,434,847 | SETMAR, SUMF1 | Array CGH
- | Predicted | Maternal | 3q24 | 147,117,180 | 147,144,505 | ZIC1 | GeneImprint
- | Common | - | 3q26.1 | 163,940,972 | 164,109,000 | No gene | Array CGH
- | Predicted | Paternal | 3q28 | 193,843,933 | 193,866,395 | HES1 | GeneImprint
- | Common | - | 4q13.2 | 68,901,010 | 69,166,014 | TMPRSS11B, UGT2B17 | Array CGH
- | Imprinted | Paternal | 4q22.1 | 89,607,065 | 89,629,022 | NAP1L5 | GeneImprint
- | Imprinted | - | - | 89,836,089 | 89,838,046 | NAP1L5 (DRLM) | igc.otago
05-MDL-8AG | Familial | - | 4q32 dup | 160,736,514 | 162,776,644 | FSTL5 | Array CGH
05-01AG | Familial | - | 4q35 dup | 187,178,973 | 187,370,217 | TLR3, CYP4V2, FAM149A, FLJ38576 | Array CGH
- | Predicted | Maternal | 5p15 | 5,130,442 | 5,330,411 | ADAMTS16 | GeneImprint
04-14AG | Familial | - | 5p12 del | 12,517,732 | 12,802,101 | No gene | Array CGH
- | Predicted | Paternal | 5p15.2 | 19,463,140 | 19,849,352 | CDH18 | GeneImprint
05-05NG | de novo | - | 5q14.1 dup | 80,370,000 | 90,150,000 | ACOT12, CCNH, HAPLN1, MEF2C, RASA1, VCAN, XRCC4 | Array CGH
- | Predicted | Maternal | 5q31.1 | 131,399,484 | 131,421,858 | CSF2 | GeneImprint
- | Predicted | Maternal | 6p21.3 | 32,352,512 | 32,384,899 | BTNL2 | GeneImprint
- | Common | - | 6p21.32 | 32,519,735 | 32,673,183 | HLA-DRB1, HLA-DRB5, HLA-DRB9 | Array CGH
- | Imprinted | Maternal | 7q22 | 96,477,643 | 96,502,078 | DLX5 | GeneImprint
04-38NG | Familial | - | 7q31.1 dup | 113,010,000 | 113,430,000 | PPP1R3A | Array CGH
07-01AG | Familial | - | 7q31.1 dup | 117,187,756 | 118,908,220 | ANKRD7, CTTNBP2, LSM8 | Array CGH
- | Imprinted | Maternal | 7q32 | 129,922,973 | 129,974,019 | CPA4 | GeneImprint
- | Imprinted | Paternal | 7q32 | 130,116,045 | 130,156,132 | MEST | GeneImprint
- | Imprinted | Paternal | 7q32 | 130,116,897 | 130,141,012 | MESTIT1 | GeneImprint
- | - | Paternal | 7q32 | 130,136,079 | 130,363,597 | COPG2 (Conflicting Data) | GeneImprint
- | Imprinted | Maternal | 7q32.3 | 130,407,477 | 130,428,859 | KLF14 | GeneImprint
- | Predicted | Maternal | 7q35 | 150,746,656 | 150,783,613 | SLC4A2 | GeneImprint
- | Predicted | Maternal | 7q35 | 150,763,707 | 150,787,950 | FASTK | GeneImprint
02-06SG | de novo | - | 7q36.3-7qter del | 154,930,000 | 158,600,000 | EN2, MNX1, PTPRN2, SHH, UBE3C, VIPR2 | Array CGH
- | Imprinted | Paternal | 8p23 | 1,439,568 | 1,666,641 | DLGAP2 | GeneImprint
- | Common | - | 8p23.1 | 7,261,218 | 8,132,339 | DEFB103A, DEFB104B, DEFB106A | Array CGH
- | Predicted | Paternal | 8p11 | 30,843,320 | 30,901,230 | PURG | GeneImprint
- | Common | - | 8p11.23 | 39,341,324 | 39,499,952 | ADAM5P | Array CGH
- | Predicted | Paternal | 8q12.3 | 63,151,500 | 63,913,627 | NKAIN3 | GeneImprint
- | Predicted | Paternal | 9q21.13 | 74,516,422 | 74,598,372 | C9orf85 | GeneImprint
05-49NG | de novo | - | 9q21.1 dup | 78,630,000 | 81,090,000 | GNA14, GNAQ, PSAT1, VPS13A | Array CGH
- | Predicted | Maternal | 9q21.32 | 84,593,686 | 84,620,170 | FLJ46321 | GeneImprint
- | Imprinted | - | 9q31.1 | 107,533,282 | 107,700,435 | ABCA1 | GeneImprint
- | Common | - | 9q32 | 114,865,612 | 114,972,775 | SLC31A2, FKBP15 | Array CGH
- | Predicted | Maternal | 9q34 | 129,366,747 | 129,473,310 | LMX1B | GeneImprint
- | Predicted | Paternal | 9q34.3 | 138,377,025 | 138,401,760 | C9orf116 | GeneImprint
06-MDL-3SG | de novo | - | 9qter del | 139,523,178 | 140,007,383 | CACNA1B, EHMT1, PNPLA7, WDR85, ZMYND19 | Array CGH
- | Predicted | Paternal | 9q34.3 | 139,547,376 | 139,577,129 | EGFL7 | GeneImprint
- | Predicted | Maternal | 9q34.3 | 139,733,255 | 139,755,489 | PHPT1 | GeneImprint
07-27AG | de novo | - | 10p12.2 del | 26,714,221 | 29,234,923 | ABI1, ACBD5, ANKRD26, BAMBI, PDSS1, YME1L1 | Array CGH
- | Common | - | 10q11.22 | 45,455,157 | 47,735,672 | ANXA8, FAM21B, PPYR1 | Array CGH
07-21AG | de novo | - | 10q21.1del | 57,249,135 | 58,942,307 | ZWINT | Array CGH
- | - | Maternal | 10q22.2 | 67,669,724 | 69,465,948 | CTNNA3 (Provisional Data) | GeneImprint
08-40AG | Familial | - | 10q22 dup | 74,661,716 | 74,809,824 | MRPS16, ANXA7, TTC18, KIAA0974 | Array CGH
05-36NG | Familial | - | 10q dup | 76,353,000 | 77,439,000 | COMTD1, DUPD1, DUSP13, MYST4, SAMD8, VDAC2, ZNF503 | Array CGH
02-27SG | de novo | - | 11q12.3-13.1dup | 62,270,860 | 64,896,754 | PRDX5, PYGM, SF1 | Array CGH
- | Predicted | Maternal | 11q12 | 65,782,631 | 65,811,538 | RAB1B | GeneImprint
- | Conflicting Data | Paternal | 11q23 | 111,452,831 | 111,481,726 | SDHD | GeneImprint
04-48SG | de novo | - | 11q24 dup | 123,150,000 | 128,190,000 | CHEK1, NRGN, ST3GAL4 | Array CGH
05-40SG | de novo | - | 11q24 del | 124,290,000 | 129,030,000 | CHEK1, ETS1, KCNJ5, RICS, ST3GAL4 | Array CGH
04-48SG | de novo | - | 11qter del | 128,250,000 | 134,425,035 | ACAD8, ADAMTS8, APLP2, B3GAT1, JAM3, OPCML | Array CGH
- | Predicted | Paternal | 11q25 | 131,230,370 | 132,216,715 | NTM | GeneImprint
03-24NG | Familial | - | 11q dup | 133,845,000 | 134,449,035 | LOC729305 | Array CGH
03-MDL-24AG | Familial | - | 12p13 dup | 5,952,451 | 6,280,226 | CD9, VWF | Array CGH
- | Imprinted | Maternal | 12p13.31 | 7,266,279 | 7,291,465 | RBP5 | GeneImprint
- | Predicted | Maternal | 13q21.1 | 56,603,052 | 56,626,073 | FLJ40296 | GeneImprint
- | Common | - | 13q21.1 | 56,650,362 | 56,729,961 | No gene | Array CGH
- | Common | - | 14q11.1 | 18,090,000 | 19,497,223 | OR11H12, OR4K2, OR4K5, OR4M1, OR4N2 | Array CGH
- | Predicted | Paternal | 14q13 | 29,226,286 | 29,248,870 | FOXG1 | GeneImprint
- | Predicted | Paternal | 14q22.1 | 53,313,988 | 53,427,814 | FERMT2 | GeneImprint
05-MDL-7AG | Familial | - | 14q23 del | 72,245,680 | 72,400,074 | DPF3 | Array CGH
- | Common | - | 15q11.2 | 18,261,000 | 20,317,192 | HERC2P3, OR11K1P, POTEB, Q8N9W7 | Array CGH
- | Imprinted | - | - | 21,361,547 | 21,364,259 | MKRN3 (ZNF127) | igc.otago
- | Imprinted | - | - | 21,439,789 | 21,444,086 | MAGEL2 (NDNL1) | igc.otago
- | Imprinted | Paternal | 15q11.2 | 21,471,646 | 21,493,542 | NDN | GeneImprint
- | Imprinted | - | - | 30,341,799 | 30,342,179 | H73492 (provisional) | igc.otago
06-54AG | Familial | - | 15q21 dup | 42,843,447 | 43,520,794 | DUOX1, DUOX2, GATM, SLC28A2, SORD | Array CGH
04-36SG | de novo | - | 16p13.3 del | 1 | 1,094,450 | AXIN1, HBA1, MPG, SOX8, STUB1 | Array CGH
- | Predicted | Paternal | 16p13.3 | 1,021,807 | 1,046,978 | SOX8 | GeneImprint
06-32AG | Familial | - | 16p11.2 dup | 29,500,084 | 30,027,413 | ALDOA, CDIPT, MAZ, PPP4C, QPRT | Array CGH
- | Common | - | 16p11.2 | 31,804,684 | 33,758,282 | SLC6A10P, ZNF267 | Array CGH
- | Predicted | Maternal | 16q22.1 | 67,681,414 | 67,704,717 | ACD | GeneImprint
06-14AG | de novo | - | 16q22 del | 69,620,645 | 73,055,007 | ATBF1, CHST4, DHODH, DHX38, HPR, KIAA0174, TAT | Array CGH
- | Predicted | Maternal | 17p13.1 | 7,748,383 | 7,769,416 | TMEM88 | GeneImprint
04-14AG | Familial | - | 17p dup | 10,557,304 | 11,130,994 | TMEM220 | Array CGH
- | Predicted | Paternal | 17q11 | 23,567,715 | 23,589,212 | PYY2 | GeneImprint
06-99AG | de novo | - | 17q12 del | 31,503,207 | 33,323,172 | AATF, ACACA, DDX52, LHX1, TADA2L, TCF2 | Array CGH
06-88AG | de novo | - | 17q21del | 41,011,330 | 41,700,962 | CRHR1, IMP5, KIAA1267, MAPT, RPS26P8 | Array CGH
- | Common | - | 17q21.31 | 41,550,000 | 41,730,000 | KIAA1267, ARL17P1, LRRC37A2 | Array CGH
- | Predicted | Maternal | 17q21 | 46,610,016 | 46,632,392 | HOXB2 | GeneImprint
- | Predicted | Maternal | 17q21.3 | 46,616,231 | 46,661,809 | HOXB3 | GeneImprint
- | Predicted | Paternal | 17q25.3 | 76,690,599 | 76,712,249 | LOC100131170 | GeneImprint
08-38AG | de novo | - | 17q25.3 dup | 77,660,313 | 78,154,619 | C17orf101, CD7, CSNK1D, FOXK2, HEXDC, SLC16A3, UTS2R | Array CGH
06-40AG | Familial | - | 18p11 dup | 9,985,330 | 10,890,656 | APCDD1, C18orf30, C18orf58, FAM38B, NAPG | Array CGH
- | Predicted | Paternal | 18q12.1 | 29,837,462 | 30,060,446 | FAM59A | GeneImprint
- | Predicted | Maternal | 19p13 | 271,042 | 301,434 | PPAP2C | GeneImprint
05-37SG | de novo | - | 19p13 dup | 1,709,657 | 5,057,195 | AES, EEF2, OAZ1, S1PR4, SIRT6, THOP1 | Array CGH
- | Common | - | 19p13.2 | 8,743,041 | 8,864,039 | ZNF558, MBD3L1 | Array CGH
- | Predicted | Paternal | 19p12 | 21,323,726 | 21,370,097 | ZNF738 | GeneImprint
- | Imprinted | - | - | 41,569,734 | 41,603,949 | L3MBTL | igc.otago
06-01AG | Familial | - | 20q dup | 41,768,223 | 41,798,860 | FAM112A, MYBL2 | Array CGH
- | Imprinted | Paternal | 20q13.12 | 42,133,052 | 42,180,534 | L3MBTL | GeneImprint
- | Imprinted | Paternal | 20q13.32 | 57,383,972 | 57,435,957 | GNASAS | GeneImprint
- | Imprinted | Isoform Dependent | 20q13.3 | 57,404,794 | 57,496,249 | GNAS | GeneImprint
06-139AG | de novo | - | 20q13.33 dup | 58,699,464 | 62,363,774 | CHRNA4, LAMA5, OPRL1 | Array CGH
- | Predicted | Maternal | 20q13.33 | 61,417,804 | 61,441,944 | C20orf20 | GeneImprint
- | Predicted | Maternal | 20q13.3 | 61,438,413 | 61,482,510 | COL9A3 | GeneImprint
06-117AG | Familial | - | 21q22 dup | 34,648,298 | 34,821,005 | C21orf51, DSCR1, KCNE1, KCNE2 | Array CGH
- | Predicted | Paternal | 21q22.2 | 38,061,990 | 38,132,509 | SIM2 | GeneImprint

This table contains the familial, de novo, and common CNVs in individuals with ID reported by Qiao et al. (2010) and the closest predicted/known imprinted gene(s) from two web sources: GeneImprint [http://www.geneimprint.org/] and the Catalogue of Parent of Origin Effects [http://igc.otago.ac.nz/home.html]. CNVs from our ID cohort are shown in bold.
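The notion of the "closest" imprinted gene to a CNV can be illustrated with a simple interval-distance calculation. The sketch below is not the procedure used to build the table, which is not described here. It assumes that all coordinates lie on the same chromosome and genome build, and the helper names `gap_bp` and `rank_by_distance` are hypothetical. The example values are chromosome 2 rows from Supplementary Table 4.1.

```python
def gap_bp(start_a, end_a, start_b, end_b):
    """Distance in bp between two intervals; 0 if they overlap or touch."""
    return max(0, max(start_a, start_b) - min(end_a, end_b))


def rank_by_distance(cnv, genes):
    """Return (distance, gene) pairs sorted from nearest to farthest."""
    cnv_start, cnv_end = cnv
    ranked = [(gap_bp(cnv_start, cnv_end, start, end), name)
              for name, start, end in genes]
    return sorted(ranked)


# 2p15-16.1 deletion (Lab IDs 03-02SG, 03-57SG), start and stop from the table.
cnv_2p15_16 = (55_499_483, 63_368_196)

# Predicted imprinted genes on chromosome 2 from the same table.
imprinted_chr2 = [
    ("ABCG8", 44_056_102, 44_115_604),
    ("CCDC85A", 56_401_257, 56_623_308),
    ("VAX2", 71_117_719, 71_170_575),
]

ranked = rank_by_distance(cnv_2p15_16, imprinted_chr2)
for distance, gene in ranked:
    print(f"{gene}\t{distance:,} bp")
```

A distance of 0 means the gene lies within or overlaps the CNV interval, as is the case for CCDC85A in this example.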


Supplementary Table 4.2 Probes Used for Replication Timing FISH Assays

A) Labeled BAC probes from selected CNVs in probands with ID
Chr Clone ID | Band | Genomic position (hg 18) | Size (bp) | Overlapping Genes | Comments
RP11-1059K1 | 1q21.1 | chr1:145546253-145719902 | 173,649 | BCL9, ACP6, GJA5 | Query probe
RP11-242B17 | 1q21.1 | chr1:145119363-145288686 | 169,323 | FMO5, CHD1L | Query probe
RP11-433J22 | 1q21.3 | chr1:145574142-145761156 | 187,014 | ACP6, GJA5 | Query probe
RP11-1022H22 | 17q25.3 | chr17:77722579-77943450 | 220,871 | CCDC57, SLC16A3, MIR6787, CSNK1D, CD7, SECTM1, TEX19, UTS2R, OGFOD3 | Query probe
RP11-598A16 | 17q25.3 | chr17:77789501-78016849 | 227,348 | SLC16A3, CSNK1D, CD7, SECTM1, TEX19, UTS2R, OGFOD3, HEXDC, C17orf62, NARF | Query probe

B) Labeled BAC probes from regions with known synchronous & asynchronous replication timing
Chr Clone ID | Band | Genomic position (hg 18) | Size (bp) | Overlapping Genes | Comments
RP11-885H1 | 11p13 | chr11:31780840-31819866 | 39,026 | PAX6, DKFZp686K1684, PAUPAR | Control Probe - synchronous
RP11-996I3 | 13q14.2 | chr13:47825852-48030922 | 205,070 | RB1, LPAR6, RCBTB2 | Control Probe - synchronous
RP11-171C8 | 15q11.2 | chr15:22908051-23068800 | 160,749 | IPW, PWAR1, SNORD115, PWAR4 | Control Probe - asynchronous, ~130 kb distal to SNRPN
RP11-441B20 | 15q11.2 | chr15:22905050-23073407 | 168,357 | IPW, PWAR1, SNORD115, PWAR4 | Control Probe - asynchronous, ~130 kb distal to SNRPN

C) Commercial probes
Chr Probe ID | Band | Genomic position (hg 18) | Size (bp) | Gene/Color | Comments
Cytocell SNRPN | 15q11.2 | chr15:22641793-22811951 | 170,158 | SNRPN (SR) | Control Probe - asynchronous
Cytocell SNRPN | 15qter | chr15:99916007-100021043 | 105,036 | 15q telomere (SG) | Control Probe - synchronous

This table contains details of the probes used in the replication timing assays. Probe name, chromosomal position, overlap with known genes, and whether the probe is in a control or queried region are listed for A) labeled BAC probes from selected CNVs in probands with ID and B) labeled BAC probes from regions with known synchronous and asynchronous replication timing. C) The positions of the commercial probes used in this study (Cytocell) are provided for comparison.
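The probe sizes listed in this table equal the difference between the stop and start coordinates of each genomic position. The sketch below, with hypothetical helper names `parse_position` and `probe_size`, parses positions written as `chr:start-stop` and checks three rows copied from the table against their listed sizes.

```python
import re

# Matches positions such as "chr1:145546253-145719902".
POSITION = re.compile(r"^(chr[0-9XY]+):(\d+)-(\d+)$")


def parse_position(text):
    """Split 'chr:start-stop' into (chromosome, start, stop)."""
    match = POSITION.match(text)
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def probe_size(text):
    """Size in bp, computed as stop minus start."""
    _, start, stop = parse_position(text)
    return stop - start


# Clone ID -> (genomic position, listed size), copied from the table.
probes = {
    "RP11-1059K1": ("chr1:145546253-145719902", 173_649),
    "RP11-242B17": ("chr1:145119363-145288686", 169_323),
    "RP11-885H1": ("chr11:31780840-31819866", 39_026),
}

# Collect any probe whose listed size disagrees with its coordinates.
mismatches = [name for name, (position, size) in probes.items()
              if probe_size(position) != size]
print(mismatches)
```

A check of this kind can flag transcription errors in coordinates or sizes when the table is updated to a different genome build.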


Supplementary Figure 4.1 Position of Commercial and Non-commercial Probes Overlapping the 15q11.2 Imprinted Region

This figure shows the genomic location (hg18) of the selected BACs (RP11-441B20 and RP11-171C8) and commercially available probes in the 15q11.2 imprinted region, displayed in the UCSC genome browser along with known genes (RefSeq) in the region. CC=Cytocell, LSI=Vysis. The position and color of the labeled genomic regions in the CC_SNRPN probe are shown on the ideogram. The Vysis LSI SNRPN probe used by other groups for FISH-RT in previous studies (Hirsch et al., 2011; Nagler et al., 2010; Yeshaya et al., 2009) is shown for comparison. This tri-color probe is made up of 3 probes: a 125,073 bp spectrum red probe at 15q11.2 (chr15:22716001-22841074) that overlaps the SNRPN region (LSI_SNRPN), a 238,715 bp spectrum green control probe at 15q24.1 (chr15:71877721-72116436) (not shown), and an aqua centromeric probe used to identify chromosome 15 (not shown).
