Characterization of X-Linked SNP Genotypic Variation in Globally Distributed Human Populations Genome Biology 2010, 11:R10
Total Page:16
File Type:pdf, Size:1020Kb
Casto et al. Genome Biology 2010, 11:R10 http://genomebiology.com/2010/11/1/R10 RESEARCH Open Access CharacterizationResearch of X-Linked SNP genotypic variation in globally distributed human populations Amanda M Casto*1, Jun Z Li2, Devin Absher3, Richard Myers3, Sohini Ramachandran4 and Marcus W Feldman5 HumanAnhumanulation analysis structurepopulationsX-linked of X-linked variationand provides de geneticmographic insights variation patterns. into in pop- Abstract Background: The transmission pattern of the human X chromosome reduces its population size relative to the autosomes, subjects it to disproportionate influence by female demography, and leaves X-linked mutations exposed to selection in males. As a result, the analysis of X-linked genomic variation can provide insights into the influence of demography and selection on the human genome. Here we characterize the genomic variation represented by 16,297 X-linked SNPs genotyped in the CEPH human genome diversity project samples. Results: We found that X chromosomes tend to be more differentiated between human populations than autosomes, with several notable exceptions. Comparisons between genetically distant populations also showed an excess of X- linked SNPs with large allele frequency differences. Combining information about these SNPs with results from tests designed to detect selective sweeps, we identified two regions that were clear outliers from the rest of the X chromosome for haplotype structure and allele frequency distribution. We were also able to more precisely define the geographical extent of some previously described X-linked selective sweeps. Conclusions: The relationship between male and female demographic histories is likely to be complex as evidence supporting different conclusions can be found in the same dataset. Although demography may have contributed to the excess of SNPs with large allele frequency differences observed on the X chromosome, we believe that selection is at least partially responsible. Finally, our results reveal the geographical complexities of selective sweeps on the X chromosome and argue for the use of diverse populations in studies of selection. Background somes are generally more differentiated among human pop- In humans, females typically carry two X chromosomes ulations than are autosomes. On a worldwide scale, drift has while males are haploid for almost all X-linked loci, com- been invoked to explain why approximately 15% of the plementing their one X chromosome with the smaller Y genetic variation observed at X chromosomal single nucle- chromosome. This relatively small alteration to the stan- otide polymorphisms (SNPs) is between populations for the dard of simple diploidy followed by all 22 autosomes has 51 Centre D'etude du Polymorphism Humaine-Human profound consequences for X-linked markers relative to Genome Diversity Project (CEPH-HGDP) populations their autosomal counterparts. Even under conditions of gen- while that figure is only 10% for the autosomes [3]. This der equality with respect to migration and population size, observation raises the possibility that X-linked markers the smaller effective population size of the X chromosome may be superior to autosomal ones for distinguishing means that drift may have a more profound influence upon closely related populations. In addition, each X chromo- it compared to the autosomes. Some repercussions of this some spends two-thirds of its time carried by a female. This are suggested by the results of Rosenberg et al. [1] and means that X-linked markers are disproportionately influ- Ramachandran et al. [2], who observed that X chromo- enced by female demography, making them useful for detecting differences in the demographies of the two gen- * Correspondence: [email protected] ders. Indeed, many recent studies have found evidence on 1 Department of Genetics, Stanford University, Mail Stop 5120, Stanford, the X chromosome for skewed female to male population California 94305, USA size and migration rate ratios [4-6], suggesting that such © 2010 Casto et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Casto et al. Genome Biology 2010, 11:R10 Page 2 of 19 http://genomebiology.com/2010/11/1/R10 differences may be the norm rather than the exception in the non-pseudoautosomal region of the X chromosome. As human history. the CEPH-HGDP sample set includes 383 females and 615 Just as the interaction between demographic factors and males, this dataset contains information from 1,261 X chro- genetic variation is special for the X chromosome, so too is mosomes. The non-pseudoautosomal region of the X chro- the interaction between selective forces and X-linked mosome consists of approximately 148 Mb of genome genetic variation. For the autosomes, a recessive mutation sequence, yielding a marker density of about 22 SNPs per must become sufficiently common to be present in homozy- 200 kb. This is about half the marker density of the auto- gotes before selection can act upon it; this is not the case for somal SNPs in this dataset (reported by Pickrell et al. [10] the X chromosome, where recessive mutations are always to be 40/200 kb), which is expected given that a tag SNP exposed to selection in males. Consequently, given other- strategy was used to select markers for the Illumnia 650K wise equal conditions, recessive beneficial mutations aris- chip and that the average recombination rate on the X chro- ing on the X chromosome are more likely to go to fixation mosome is about 60% of the average autosomal rate [11]. than those arising on the autosomes, while recessive, dele- The genotypes were phased using the program fastPHASE terious mutations are more likely to be lost [7]. The X chro- [12]; for the X chromosome, known haplotypes from male mosome's haploid state in males and its smaller overall chromosomes were also used in phasing the female chro- effective population size also mean that selection-driven mosomes. fixation or loss of non-neutral X-linked alleles proceeds more rapidly than comparable processes on the autosomes, Population structure regardless of the initial frequency of the selected allele [8]. Given the X chromosome's disproportionate sensitivity to While it remains unclear how important recessive, non-neu- female demography, it is possible that X-linked genomic tral mutations are to human adaptation and to evolution in variation has a different underlying population structure general, there is some evidence that positive selection act- than autosomal variation. To investigate this, we analyzed ing on recessive, beneficial mutations has been important in the X chromosome data with frappe [13], a maximum like- shaping patterns of X-linked genetic variation in humans lihood based method that establishes K ancestry groups [9]. based on allele frequency patterns and then assigns each Given the special features of the X chromosome and of its individual K percentages that correspond to his or her pro- interactions with the forces that influence human genetic portional membership in each group. As about two-thirds of variation, the analysis of patterns of X-linked genetic varia- the individuals in our sample are haploid for the X chromo- tion both independently and in comparison to autosomal some while one-third are diploid, we ran frappe on individ- patterns has the potential to reveal features of large ual X chromosomes rather than on individuals. The results genome-wide genotypic datasets that cannot be detected of this analysis with K = 7 are shown in Figure 1a. The X using autosomal markers alone. Here we use a number of chromosomes are partitioned into seven clusters that corre- methods to characterize the data represented by the approx- spond to the seven major continental cohorts - Africa, the imately 16,000 X-linked SNPs typed as part of a genome- Middle East, Europe, Central Asia, East Asia, Oceania, and wide panel in the 51 globally distributed CEPH-HGDP America - represented in the CEPH-HGDP sample set. (In populations. We begin by examining the population struc- contrast, the data contained in 20 X-linked microsatellites ture underlying variation on the X chromosome. We then was only able to resolve the CEPH-HGDP samples into 5 use Fst values and pairwise allele frequency differences to distinct groups [2]). These are the same 7 groups that were examine population differentiation and explore what the observed when frappe was run on 640,698 autosomal mark- results of these analyses indicate about past demographic ers [3]. The major difference between these previous auto- patterns. Finally, we scan the X chromosome for haplotype somal results and Figure 1a is the failure of the Eurasian X structure consistent with the influence of selection. We fin- chromosomes to cleanly separate into Middle Eastern, ish by discussing two regions we identified as being clear European, and Central Asian groups. While most Middle outliers from the rest of the chromosome with respect to Eastern, European, and Central Asian X chromosomes have SNP allele frequency distribution and linkage disequilib- their largest contributions from their respective continents rium patterns. of origin, most also have sizable contributions from the other two Eurasian continents. This suggests a lack of clear Results and discussion genetic