The Driver of Extreme Human-Specific Olduvai Repeat Expansion Remains Highly Active in the Human
Total Page:16
File Type:pdf, Size:1020Kb
Genetics: Early Online, published on November 21, 2019 as 10.1534/genetics.119.302782 1 The Driver of Extreme Human-Specific Olduvai Repeat Expansion Remains Highly Active in the Human 2 Genome 3 Ilea E. Heft,1,* Yulia Mostovoy,2,* Michal Levy-Sakin,2 Walfred Ma,2 Aaron J. Stevens,3 Steven 4 Pastor,4 Jennifer McCaffrey,4 Dario Boffelli,5 David I. Martin,5 Ming Xiao,4 Martin A. Kennedy,3 5 Pui-Yan Kwok,2,6,7 and James M. Sikela1 6 7 1Department of Biochemistry and Molecular Genetics, and Human Medical Genetics and 8 Genomics Program, University of Colorado School of Medicine, Aurora CO 80045 9 2Cardiovascular Research Institute, 6Department of Dermatology, and 7Institute for Human 10 Genetics, University of California, San Francisco, San Francisco, CA, USA 11 3Department of Pathology, University of Otago, Christchurch, New Zealand 8140 12 4School of Biomedical Engineering, Drexel University, Philadelphia, PA 19104 13 5Children's Hospital Oakland Research Institute, Oakland, CA, 94609 14 15 Corresponding author: James M. Sikela: [email protected] 16 *Ilea E. Heft and Yulia Mostovoy contributed equally to this article. 17 1 Copyright 2019. 18 Abstract 19 Sequences encoding Olduvai protein domains (formerly DUF1220) show the greatest 20 human lineage-specific increase in copy number of any coding region in the genome and have 21 been associated, in a dosage-dependent manner, with brain size, cognitive aptitude, autism, 22 and schizophrenia. Tandem intragenic duplications of a three-domain block, termed the Olduvai 23 triplet, in four NBPF genes in the chromosomal 1q21.1-.2 region are primarily responsible for 24 the striking human-specific copy number increase. Interestingly, most of the Olduvai triplets are 25 adjacent to, and transcriptionally co-regulated with, three human-specific NOTCH2NL genes 26 that have been shown to promote cortical neurogenesis. Until now, the underlying genomic 27 events that drove the Olduvai hyper-amplification in humans have remained unexplained. Here, 28 we show that the presence or absence of an alternative first exon of the Olduvai triplet 29 perfectly discriminates between amplified (58/58) and unamplified (0/12) triplets. We provide 30 sequence and breakpoint analyses that suggest the alternative exon was produced by an NAHR- 31 based mechanism involving the duplicative transposition of an existing Olduvai exon found in 32 the CON3 domain that typically occurs at the carboxy end of NBPF genes. We also provide 33 suggestive in vitro evidence that the alternative exon may promote instability through a 34 putative G-quadraplex-based (pG4) mechanism. Lastly, we use single-molecule optical mapping 35 to characterize the intragenic structural variation observed in NBPF genes in 154 unrelated 36 individuals and 52 related individuals from 16 families and show that the presence of pG4- 37 containing Olduvai triplets is strongly correlated with high levels of Olduvai copy number 38 variation. These results suggest that the same driver of genomic instability that allowed the 2 39 evolutionarily recent, rapid, and extreme human-specific Olduvai expansion remains highly 40 active in the human genome. 41 Introduction 42 Sequences encoding Olduvai protein domains (formerly DUF1220) (Sikela and van Roy 43 2018) have undergone a human-specific hyper-amplification that represents the largest human- 44 specific increase in copy number of any coding region in the genome (Popesco et al. 2006; 45 O’Bleness et al. 2012). The current human reference genome [hg38] is reported to contain 302 46 haploid copies (Zimmer and Montgomery 2015), approximately 165 of which have been added 47 to the human genome since the Homo/Pan split (O’Bleness et al. 2012). Olduvai sequences are 48 found almost entirely within the NBPF gene family (Vandepoele et al. 2005) and have 49 undergone exceptional amplification exclusively within the primate order, with copy numbers 50 generally decreasing with increasing phylogenetic distance from humans: humans have ~300 51 copies, great apes 97-138, monkeys 48-75, and non-primate mammals 1-8 (Popesco et al. 2006; 52 O’Bleness et al. 2012; Zimmer and Montgomery 2015). 53 Olduvai copy number increase has been implicated in the evolutionary expansion of the 54 human brain, showing a linear association with brain size, neuron number, and several other 55 brain-size-related phenotypes among primate species (Dumas and Sikela 2009; Dumas et al. 56 2012; Keeney et al. 2014, 2015; Zimmer and Montgomery 2015). Among humans, Olduvai copy 57 number (total or subtype specific) has been linked, in a dosage-dependent manner, to brain 58 size, gray matter volume, and cognitive aptitude in healthy populations (Dumas et al. 2012; 59 Davis et al. 2015b), as well as with brain size pathologies (microcephaly/macrocephaly) (Dumas 3 60 et al. 2012). Increasing Olduvai copy number has also been linearly associated with increasing 61 severity of autism symptoms (Davis et al. 2014, 2015a, 2019), as well as increasing severity of 62 negative schizophrenia symptoms, which are phenotypically similar to the social deficits 63 associated with autism (Searles Quick et al. 2015). Conversely, decreasing copy number shows a 64 linear relationship with the severity of positive symptoms of schizophrenia (e.g., hallucinations, 65 delusions). Lending further support for a role in these disorders, 1q21.1-associated duplications 66 and deletions that encompass many Olduvai copies are among the most common structural 67 variations (SVs) associated with autism and schizophrenia, respectively (Brunetti-Pierri et al. 68 2008; Mefford et al. 2008; International Schizophrenia Consortium et al. 2009). Taken together, 69 these findings suggest that variation in Olduvai copy number can produce both beneficial (e.g., 70 increased brain size and cognitive function) and deleterious brain-related phenotypic outcomes 71 (Dumas and Sikela 2009). Such a duality in function fits well with models that propose that the 72 key genes that drove human brain evolution are also significant contributors to autism and 73 schizophrenia (Crow 1995; Burns 2007; Dumas and Sikela 2009; Sikela and Searles Quick 2018). 74 Finally, recent studies have shown that most human-specific Olduvai copies are encoded by 75 four NBPF genes on 1q21.1-2, three of which are adjacent to and co-regulated with three 76 human-specific NOTCH2NL genes (Fiddes et al. 2019). The three NOTCH2NL genes have been 77 shown to promote cortical neurogenesis (Fiddes et al. 2018; Suzuki et al. 2018), suggesting that 78 the human-specific Olduvai expansions and their adjacent NOTCH2NL partners may work in a 79 coordinated, complementary manner to drive human brain expansion in a dosage-related 80 fashion (Fiddes et al. 2019). 81 Olduvai domains are on average 65 amino acids in length (ranging from 61 to 74) (Finn 4 82 et al. 2014) and are encoded within a small exon and large exon doublet (Popesco et al. 2006). 83 The domains have been subdivided into 6 primary subtypes based on sequence similarity: 84 Conserved 1-3 (CON1-3) and Human Lineage Specific 1-3 (HLS1-3) (O’Bleness et al. 2012). 85 Sequences encoding the various subtypes are almost always found in the same order within 86 NBPF genes: one or more CON1 domains; a single CON2 domain; one or more instances of a 87 triplet composed of HLS1, HLS2, and HLS3 domains; and a single CON3 domain at the carboxy 88 end. A significant majority (114 copies) of the human-specific copies have been generated by 89 intragenic domain amplification (i.e., increases in the number of tandemly arranged copies) 90 primarily involving 4 of the 24 human NBPF genes (O’Bleness et al. 2012). These tandem 91 expansions are predominantly organized as contiguous three domain blocks, named Olduvai 92 triplets, each of which is approximately 4.7kb in length and composed of an HLS1, HLS2, and 93 HLS3 subtype (O’Bleness et al. 2012). While Olduvai sequences show a human-specific range in 94 copy number, the domains are highly copy number polymorphic among humans, exhibiting a 95 broad normal distribution (Davis et al. 2014). Despite the rapid and extreme nature of the 96 Olduvai triplet expansion in the human species, the genomic factors that drove the process 97 have remained unexplained. Similarly, the genomic mechanism that underlies the extensive 98 Olduvai copy number variation observed within the human population remains unknown. 99 G-quadruplexes (G4s) are non-B form DNA structures characterized by two or more 100 stacked planar guanine tetrads. The characteristic sequence motif of these structures is four or 101 more closely spaced runs of three or more guanines (Bochman et al. 2012). The guanines are 102 stabilized by non-canonical H-bonding in a coplanar arrangement to form G-quartets and a 103 stacked array of G-quartets make up a G4 structure. G4 sequence motifs are over-represented 5 104 in meiotic and mitotic double-stranded breaks and are thought to promote genomic instability 105 (Bochman et al. 2012; Maizels and Gray 2013). Here, we provide evidence supporting the view 106 that a potential G4 motif specific to Olduvai sequences may be behind both their evolutionary 107 hyperamplification and the high degree of variation associated with Olduvai triplets in existing 108 humans. 109 110 Materials and Methods 111 Sequence analysis of predicted breakpoint regions 112 The coordinates of the relevant introns were obtained as described in Astling et al. 113 (Astling et al. 2017). Reference sequence data for the relevant introns were extracted from 114 hg38 utilizing bedtools v2.26.0 (Quinlan and Hall 2010). The sequences were aligned with 115 Clustal Omega (Rice et al. 2000), and the alignments were visualized with Geneious v10.2.3. 116 Analysis of the pG4 sequence in non-human primates 117 Olduvai sequences in non-human primate genomes were located as described in 118 Zimmer and Montgomery (Zimmer and Montgomery 2015).