Human-Specific Duplicate Genes: New Frontiers for Disease and Evolution
Xander Nuttle

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2015

Reading Committee:
Evan E. Eichler, Chair
Chris T. Amemiya
Christine Queitsch

Program Authorized to Offer Degree: Genome Sciences

©Copyright 2015 Xander Nuttle

Abstract

Chair of the Supervisory Committee: Professor Evan E. Eichler, Department of Genome Sciences

Gene duplication is a fundamental force contributing to the evolution of novel traits, genomic diversity among species and individuals, and disease. In this dissertation, I characterize the evolutionary history, diversity, functional potential, and disease relevance of gene families that emerged specifically along the lineage leading to human.

I leveraged a haploid clone library to resolve the sequence and structure of four human SRGAP2 paralogs, adding ~380 kbp of sequence to the human reference genome. Analyzing this high-quality sequence, I found that the promoter and first nine exons of SRGAP2 duplicated three times across chromosome 1, ~3.4–1 million years ago. All paralogs produce mRNA transcripts, but SRGAP2C is most highly expressed and has fixed in copy number in the human population, making it the most likely functional duplicate.

To screen large cohorts of autism and intellectual disability patients for mutations that disrupt SRGAP2C, I developed a method to genotype paralog-specific copy number and sequence variation using molecular inversion probes. I demonstrated that this method was broadly applicable to large-scale genotyping of previously inaccessible duplicated genes.
Using this method, I also discovered regions of interlocus gene conversion between duplicated sequences >80 Mbp apart on the same chromosome and refined unequal crossover breakpoints for copy number polymorphisms at the RH locus.

Finally, I employed my genotyping method and strategies used to characterize SRGAP2 duplications to study BOLA2, a gene duplicated specifically in Homo sapiens located at chromosome 16p11.2. Sequencing this region in orangutan and chimpanzee revealed drastic rearrangements between species, including six inversions affecting 47 genes. I determined that an ~95 kbp segment including BOLA2 duplicated ~282 thousand years ago, specifically predisposing humans to recurrent microdeletions and microduplications associated with autism. I demonstrate that despite its young age and its conferring susceptibility to rearrangements, the BOLA2 duplication has nearly fixed in the human lineage. I show that BOLA2 duplication resulted in a Homo sapiens-specific in-frame fusion transcript and that expression correlates with genomic copy number.

Collectively, my work provides new insights into the birth, evolution, and disease relevance of duplicate genes, pioneers new genotyping technology, and identifies specific gene innovations as novel candidates for the evolution of uniquely human traits.

Table of Contents

List of Figures
List of Tables
Acknowledgements

1. Introduction
  1.1 Duplicate Genes and Evolution
  1.2 Duplicate Genes and Disease
  1.3 The Paradox of Interspersed Duplications
  1.4 Human-Specific Duplicate Genes
  1.5 Research Goals
  1.6 Chapter Organization

2. Evolution of Human-Specific Neural SRGAP2 Genes by Incomplete Segmental Duplication
  2.1 Summary
  2.2 Introduction
  2.3 Results
    2.3.1 Genome Sequencing
    2.3.2 Evolutionary History of SRGAP2
    2.3.3 SRGAP2 mRNA Expression and Paralog Gene Structure
    2.3.4 SRGAP2 Copy Number Variation
  2.4 Discussion
  2.5 Experimental Procedures
    2.5.1 Fluorescent in situ Hybridization
    2.5.2 Cloning Using a Complete Hydatidiform Mole Library
    2.5.3 Sequencing and Assembly
    2.5.4 Phylogenetic Analysis
    2.5.5 SRGAP2 Transcript Analysis
    2.5.6 Paralog-Specific Copy Number Genotyping
  2.6 Notes

3. Rapid and Accurate Large-Scale Genotyping of Duplicated Genes and Discovery of Interlocus Gene Conversions
  3.1 Summary
  3.2 Introduction
  3.3 Results
    3.3.1 Genotyping Strategy
    3.3.2 Copy-Number and Sequence Genotyping
    3.3.3 Internal SRGAP2 Deletion and Duplication Discovery
    3.3.4 RH Gene Conversion, Copy Number and Breakpoint Resolution
    3.3.5 Discovery of Interlocus Gene Conversions in SRGAP2
  3.4 Discussion
  3.5 Notes

4. Emergence of a Homo sapiens-Specific Gene Family and the Evolution of Autism Risk at Chromosome 16p11.2
  4.1 Summary
  4.2 Introduction
  4.3 Results
    4.3.1 Evolution and Structural Diversity of Chromosome 16p11.2
    4.3.2 Human Copy Number Variation and the Rapid Non-Neutral Expansion of BOLA2
    4.3.3 BOLA2