Characterizing Genomic Duplication in Autism Spectrum Disorder

by

Edward James Higginbotham

A thesis submitted in conformity with the requirements for the degree of Master of Science
Graduate Department of Molecular Genetics
University of Toronto

© Copyright by Edward James Higginbotham 2020

Abstract

Characterizing Genomic Duplication in Autism Spectrum Disorder
Edward James Higginbotham
Master of Science
Graduate Department of Molecular Genetics
University of Toronto
2020

Duplication, the gain of additional copies of genomic material relative to its ancestral diploid state, has yet to be fully appreciated for its role in human traits and disease. Challenges include accurately genotyping, annotating, and characterizing the properties of duplications, and resolving duplication mechanisms. Whole genome sequencing should, in principle, enable accurate detection of duplications in a single experiment. This thesis uses whole genome sequencing to catalogue disease-relevant duplications in the genomes of 2,739 individuals with Autism Spectrum Disorder (ASD) who enrolled in the Autism Speaks MSSNG Project. Fine-mapping the breakpoint junctions of 259 ASD-relevant duplications identified 34 (13.1%) variants with complex genomic structures as well as tandem (193/259, 74.5%) and NAHR-mediated (6/259, 2.3%) duplications. As whole genome sequencing-based studies expand in scale and reach, a continued focus on generating high-quality, standardized duplication data will be a prerequisite for addressing their associated biological mechanisms.

Acknowledgements

I thank Dr. Stephen Scherer for his leadership par excellence, his generosity, and for giving me a chance. I am grateful for his investment and the opportunities afforded me, from which I have learned and benefited. I would next thank Drs. Brett Trost, Susan Walker, and Mehdi Zarrei for their considerate guidance and contributions made towards my training. A person could do far worse for friends and mentors. Drs. Fritz Roth and Mikko Taipale are role models and they have been gracious and good-humoured. Ada Chan, Lia D’Abate, and Charlotte Nguyen are good, kind friends and I am fortunate to have worked alongside all three.

I give my thanks to Beverly Apresto, Dr. Lisa Bradley, Dr. Janet Buchanan, Elaine Chang, Wilson Chung, Dr. Bank Engchuan, Dr. Muhammed Faheem, Joanne Herbrick, Jen Howe, Barbara Kellam, Sylvia Lamoureux, Timothy Lau, Dr. Hin Lee, Dr. Si Lok, Miranda Lorenti, Patricia Lu, Jeff MacDonald, Dr. Roozbeh Manshaei, Dr. Christian Marshall, Dr. Karin Miron, Rohan Patel, Dr. Andrew Paterson, Dr. Tara Paton, Dr. Giovanna Pellecchia, Dr. Sergio Pereira, MyLinh Pham, Sanjeev Pullenayegum, Dr. Miriam Reuter, Dr. Marsha Speevak, Dr. James Stavropoulos, Bhooma Thiruvahindrapuram, Dr. Zhouzhi Wang, Dr. John Wei, Joseph Whitney, Dr. Richard Wintle, and Dr. Ryan Yuen. All have been generous with their time and help. I thank Dr. Bridget Fernandez, who provided valuable phenotype information. I also thank Jingle Candelario-MacDonald, Guillermo Casallo, Emily Cornelius, and Marnita Manalo for the hard work they have contributed to my thesis.

I last thank Dr. Eve J. Higginbotham, MS, MD, for her kind words at the outset of my MSc. Our chance encounter is most appreciated, and a reminder that every tree is a good tree.

Finally, I am grateful to my parents, Ted and Louise, to each of my forebears, and to my spiritually-closest of their descendants: Dr. Stewart Higginbotham, DVM, Dr. Alexa Higginbotham, Charlotte Higginbotham, Nathaniel Rose, Sarah Rose, the lovely Nora York, and the Edwards family of Winnipeg, Manitoba. I love you all.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures
List of Appendices
List of Abbreviations
1 Introduction
1.1 Duplications in human evolution and disease
1.2 Complex duplications in human disease
1.3 Mechanisms of CNV formation
1.4 Autism Spectrum Disorder
1.5.1 The genetics of Autism Spectrum Disorder
1.5.2 Rare variation contributes significantly to ASD risk
1.6 The impact of CNVs on gene expression
1.7 Project rationale
2 Methods
2.1 Study subjects
2.2 CNV detection from WGS data
2.3 ASD gene lists
2.4 Characterization of duplication structures
2.5 Annotation of breakpoint sequences
2.6 Validation of predicted breakpoint junctions
2.7 Lymphoblastoid cell-line culture
2.8 Targeted gene expression analysis
2.9 PCR-based validation of a SUPT16H-CHD8 fusion transcript
3 Results
Chapter 1
3.1 Cataloging complex duplication in the human genome
3.1.1 Characteristics of genome-wide complex duplications
3.1.2 Interchromosomal dispersed duplications
3.1.3 The functional impact of complex duplications
3.1.4 Transmission patterns of complex duplications
3.1.5 Properties of genome-wide control duplications with complex structures
3.1.6 The mechanistic impact of ASD-relevant duplications
3.1.7 Breakpoint context and mechanisms of CNV formation
Chapter 2
3.2.1 Functional characterization of five ASD-relevant duplications
3.2.2 Characterization of a SUPT16H-CHD8 fusion transcript
4 Discussion
4.1 Functional analysis of gene expression
4.2 CNV detection from WGS
5 Future Directions and Impact
5.1 Duplication mechanism/applicability to other disease
5.2 Data access and accessibility
5.3 Limitations
6 Appendix
7 Bibliography

List of Tables

Table 1: Examples of triplosensitive genes in human disease
Table 2: Functional mechanisms of select exonic duplications in human disease
Table 3: Examples of disease-associated duplications that result in position effect
Table 4: Examples of disease-associated duplications as founder mutations
Table 5: Primer sequences used for breakpoint junction assays
Table 6: Standard PCR protocol for breakpoint junction validation
Table 7: TaqMan assays used in targeted gene expression profiling
Table 8: Subjects harbouring known ASD-relevant CNVs
Table 9: Size distribution of ASD-relevant duplications
Table 10: Characteristics of complex rearrangements identified at ASD-relevant loci
Table 11: Complex duplications impacting ASD-relevant genes
Table 12: The impact of ASD-relevant duplications on gene structure
Table 13: Tandem duplications identified at highly-penetrant ASD risk loci
Table 14: Sequence context of breakpoint sequences
Table 15: The mutational mechanisms of complex duplications
Table 16: Mechanisms of formation inferred from ASD-relevant tandem and NAHR-mediated duplication breakpoints
Table 17: Summary of associated gene expression changes in duplication carriers
Table 18: Genes impacted by candidate duplications

List of Figures

Figure 1: Recombination-based and replication-based mechanisms of CNV formation
Figure 2: Characterization of structural variants using paired-end read sequencing
Figure 3: Transmission patterns of complex duplications identified in SPX and MPX ASD families
Figure 4: Duplication breakpoint junctions validated by PCR-based assays and Sanger sequencing
Figure 5: Gene expression alterations associated with duplications at 7q36.1, 16p13.3 and 19q13.32
Figure 6: Gene expression alterations associated with a 2.85 Mb duplication at 4q25
Figure 7: Characterization of a novel SUPT16H-CHD8 fusion gene

List of Appendices

Appendix 1: List of known ASD-relevant CNVs
Appendix 2: Complex duplications impacting ASD-relevant genes
Appendix 3: Families segregating complex ASD-relevant duplications
Appendix 4: Families harboring tandem duplications identified at ASD risk loci

List of Abbreviations

A-EJ      Alternative end-joining
ABA       Applied behavioural analysis
ADHD      Attention-deficit hyperactivity disorder
ADI-R     Autism Diagnostic Interview-Revised
ADOS      Autism Diagnostic Observation Schedule
AR        Autosomal recessive
ASD       Autism spectrum disorder
BAM       Binary alignment map
BIR       Break-induced replication
BP        Bipolar disorder
CAM       Cell adhesion molecule
CD        Conduct disorder
CF        Cystic fibrosis
CMA       Chromosomal microarray analysis
CNV       Copy number variation
DD        Developmental delay
ddPCR     Droplet digital polymerase chain reaction
DSM       Diagnostic and Statistical Manual of Mental Disorders
EGF       Epidermal growth factor
FoSTeS    Fork-stalling and template switching
GalNAcT   UDP-N-acetyl-alpha-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase
GEF       Guanine nucleotide exchange factor
IBS       Irritable bowel syndrome
ID        Intellectual disability
IGV       Integrative Genomics Viewer
LCL       Lymphoblastoid cell line
LCR       Low-copy repeat
LD        Learning disability
LINE      Long interspersed element
LTR       Long terminal repeat
MCA       Multiple congenital anomalies
MMBIR     Microhomology-mediated break-induced replication
MPX       Multiplex / multiple incidence family
MRI       Magnetic resonance imaging
NAHR      Non-allelic homologous recombination
NDD       Neurodevelopmental disorder
NHEJ      Nonhomologous end-joining
OCD       Obsessive-compulsive disorder
ODD       Oppositional defiant disorder
PCR       Polymerase chain reaction
PDD-NOS   Pervasive developmental disorder not otherwise specified
PSD       Postsynaptic density
qPCR      Quantitative polymerase chain reaction
RLCR      Repetitive and low-complexity region
RTK       Receptor tyrosine kinase
SCZ       Schizophrenia
SDR       Short-chain dehydrogenase/reductase
SINE      Short interspersed element
SNP       Single nucleotide polymorphism
SNV       Single nucleotide variant
SPX       Simplex / single incidence family
ssDNA     Single-stranded DNA
SV        Structural variation
TCAG      The Centre for Applied Genomics
TF        Transcription factor
WES       Whole exome sequencing
WGS       Whole