Download Human Unmapped Download Nonhuman Primate MI Collection Bam Files from 1KGP Assembly Pairs from UCSC
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.07.27.218867; this version posted July 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 The landscape of micro-inversions provides clues 2 for population genetic analysis of humans 3 Li Qu1,2, Luotong Wang1,3, Feifei He1,3, Yilun Han1,3, Longshu Yang1,3, May D. Wang2, 4 Huaiqiu Zhu1,2,3* 5 1 State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical 6 Engineering, College of Engineering, Peking University, Beijing, China 7 2 Wallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory, Atlanta, 8 Georgia, United States of America 9 3 Center for Quantitative Biology, Peking University, Beijing, China 10 *Corresponding author, Email: [email protected] (HZ) 11 Abstract 12 Background: Variations in the human genome have been studied extensively. However, 13 little is known about the role of micro-inversions (MIs), generally defined as small 14 (<100 bp) inversions, in human evolution, diversity, and health. Depicting the pattern 15 of MIs among diverse populations is critical for interpreting human evolutionary 16 history and obtaining insight into genetic diseases. 17 Results: In this paper, we explored the distribution of MIs in genomes from 26 human 18 populations and 7 nonhuman primate genomes and analyzed the phylogenetic structure 19 of the 26 human populations based on the MIs. We further investigated the functions of 20 the MIs located within genes associated with human health. With hg19 as the reference 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.27.218867; this version posted July 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 21 genome, we detected 6,968 MIs among the 1,937 human samples and 24,476 MIs 22 among the 7 nonhuman primate genomes. The analyses of MIs in human genomes 23 showed that the MIs were rarely located in exonic regions. Nonhuman primates and 24 human populations shared only 82 inverted alleles, and Africans had the most inverted 25 alleles in common with nonhuman primates, which was consistent with the “Out of 26 Africa” hypothesis. The clustering of MIs among the human populations also coincided 27 with human migration history and ancestral lineages. 28 Conclusions: We propose that MIs are potential evolutionary markers for investigating 29 population dynamics. Our results revealed the diversity of MIs in human populations 30 and showed that they are essential to constructing human population relationships and 31 have a potential effect on human health. 32 33 Keywords: Micro-inversions; Structural variations; Genome; High-throughput 34 sequencing; Evolution 35 36 1 Background 37 As a kind of structural variation (SV) in genomes, inversions are defined as 38 chromosome rearrangements in which a segment of a chromosome is reversed end to 39 end [1]. Usually, an inversion occurs when a single chromosome undergoes breakage 40 and rearrangement within itself. Although it has long been known that inversions are 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.27.218867; this version posted July 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 41 associated with primate evolution [2 ], they were only recently found to play an 42 important role in human evolution and diseases [3,4]. A number of studies have focused 43 on inversions in the human genome [ 5 , 6 ], and for decades, many detectable 44 macroinversion polymorphisms in humans have been verified by experiments and 45 proved to have been significant in human evolutionary history [7 , 8 ]. oor example, 46 inversions located in 8p23.1 have been reported to be associated with autoimmune and 47 cardiovascular diseases [3]. In addition, inversions have been designated as 48 evolutionary markers of human phenotypic diversity [9,10]. oor example, a common 49 900 kb inversion polymorphism at 17q21.31 associated with Parkinson’s disease 50 suggested multiple distributions of inversions among ethnic populations [7]. Indeed, 51 with the development of inversion detection methods, inversions are being increasingly 52 recognized as one of the most important mechanisms underlying genetic diversity. 53 However, the studies mentioned above were mainly limited to the detection of 54 large-scale inversions, usually >100 kb in length. Recently, small-scale inversions (with 55 lengths much shorter than 10 kb) have been used in several studies for purposes such 56 as phylogenetic inference. These studies found an influence of small inversions on the 57 formation of unusual flanking sequences in human and chimpanzee genomes [11] and 58 developed tools to detect in-place inversions [12]. Nevertheless, the results of these 59 studies were highly variable because of differences in the size used to define small 60 inversions [13 ]. Moreover, current studies focus mainly on the differences in small 61 inversions between human (Homo sapiens) and chimpanzee (Pan troglodytes) genomes 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.27.218867; this version posted July 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 62 [11], excluding the genomes of more distantly related primates such as gorillas 63 (Gorilla), orangutans (Pongo pygmaeus), gibbons (Hylobates sp.), baboons (Papio 64 anubis), and rhesus macaques (Macaca mulatta). 65 Among all the inversions of small size, micro-inversions (MIs) [13], which are a 66 type of extremely small inversion (generally <100 bp) found remarkably in the human 67 genome, have an uncertain function in the research community. However, statistical 68 analysis has suggested that MIs, which are usually similar to other rare genomic 69 changes, may serve as phylogenetic markers because of their rare occurrence and low 70 homoplasy [14]. ourthermore, many studies have examined the functions of inversions 71 among multiple nonprimate species, such as yeast, sticklebacks, grasshoppers, 72 Drosophila, ducks, chicken, and mice [15-21]. Regarding primates, which have larger 73 genomes, comparative studies of the influence of chromosomal inversions occurring 74 within chromosomes in the human and chimpanzee genomes can be traced back to the 75 last century [2]. Nevertheless, such studies on human inversions are rare because of the 76 limitations of detection techniques [22]. Great efforts have been focused on developing 77 experimental techniques to discover large-scale inversions, including methods based on 78 single-cell sequencing [23 , 24 ], read assembly [25 ], and probe hybridization and 79 amplification [26]. However, these experimental methods are usually expensive, time- 80 consuming, and not aimed at detecting small inversions. With the advent of next- 81 generation sequencing (NGS), the 1000 Genomes Project (1KGP) has provided a large 82 amount of whole-genome sequencing data for individuals from a variety of populations 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.27.218867; this version posted July 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 83 across the world [27,28]. However, the short reads generated by NGS in the 1KGP are 84 generally very short (<100 bp), making both the detection and analysis of MIs difficult. 85 In contrast to well-studied large inversions, which cause more abnormal short reads and 86 thus are discovered more frequently based on NGS short reads, MIs have not yet been 87 studied because it is difficult to detect MIs shorter than the read length, with most 88 identified as unmapped reads and thus discarded before further analysis. However, a 89 large proportion of the sequenced short reads in the 1KGP are unmapped short reads 90 that may contain SVs, many of which could be MIs. oor example, in NA18525, a low- 91 coverage sequencing sample in the 1KGP, 608,982,899 short reads are mapped reads, 92 while 360,144,413 (37.2% of the total) are unmapped reads. The Micro-inversion 93 Detection (MID) method developed in our previous study [13] uses these unmapped 94 reads to detect MIs with good performance. The algorithm of MID was designed based 95 on a dynamic programming path-finding approach, which can efficiently and reliably 96 identify MIs from unmapped short NGS reads. This process subsequently facilitates the 97 analysis of MIs across many populations based on high-throughput sequencing data. 98 Therefore, an increased understanding of the MI landscape across various human 99 populations will facilitate comparative analyses among individuals by taking individual 100 differences into account. 101 Although efforts have been made to analyze small inversions that are still >100 bp 102 in nonhuman organisms [11,12], comprehensive studies of the effects of MIs, which are 103 < 100 bp in length, on human diversity, evolution, and diseases in a large number of 5 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.27.218867; this version posted July 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 104 human genomes in the 1KGP are lacking. In this study, we set out to detect MIs and 105 further investigate the roles of MIs in the diversity and evolution of 26 human 106 populations and 7 nonhuman primate populations. Overall, we explored the distribution 107 of MIs in all 26 populations from the 1KGP [27] and detected 6,968 MIs within all 108 1,937 human samples and 24,476 MIs in 7 nonhuman primate genomes.