Recent Human Adaptation: Genomic Approaches, Interpretation and Insights
Total Page:16
File Type:pdf, Size:1020Kb
REVIEWS Recent human adaptation: genomic approaches, interpretation and insights Laura B. Scheinfeldt1,2 and Sarah A. Tishkoff1,3 Abstract | The recent availability of genomic data has spurred many genome-wide studies of human adaptation in different populations worldwide. Such studies have provided insights into novel candidate genes and pathways that are putatively involved in adaptation to different environments, diets and disease prevalence. However, much work is needed to translate these results into candidate adaptive variants that are biologically interpretable. In this Review, we discuss methods that may help to identify true biological signals of selection and studies that incorporate complementary phenotypic and functional data. We conclude with recommendations for future studies that focus on opportunities to use integrative genomics methodologies in human adaptation studies. Reproductive fitness Evidence from the archaeological record as well as from polymorphism (SNP) and whole-genome sequencing A measure of how well an genetic and genomic studies demonstrates that anatomi- (WGS) data from many different populations, giving us organism survives and cally modern humans emerged in Africa ~200 thou- the ability to take a ‘top-down’ approach to the study of reproduces; it is generally sand years ago (kya) and rapidly migrated across the human adaptation in globally diverse populations. Thus, determined by the genetic contribution of an individual globe into new environments ~80–50 kya (reviewed studies of human adaptation have moved from focusing to the gene pool of the next in REFS 1,2). One of the implications of this history is on specific genes that are known to be involved in certain generation. that our Late Pleistocene (~125–12 kya) ancestors were traits of interest to looking across the genome for signa- subjected to a diverse range of novel selective pressures. tures of selection. There are several advantages to genome- Gene pathways Therefore, phenotypes such as thermoregulation in cold wide approaches, the most important of which is that there Sets of interacting gene products that are related to environments, tolerance to hypoxia at high altitude and is no need for a priori candidate gene hypotheses. Recent particular functions, including light skin pigmentation in regions with low amounts of work using these genomic approaches has identified signalling and metabolic sunlight are likely to have increased reproductive fitness novel candidate adaptive genes as well as gene pathways pathways. and thus to have been affected by adaptive pressures. that are potentially involved in human adaptation, In addition, during the Neolithic period (~12–4 kya), several examples of which are discussed below4–6. many of our ancestors began to adopt more sedentary However, the genomic strategies to identify candidate lifestyles, resulting in increased population densities, as adaptive genes have their limitations, and it is therefore well as distinct diets that were associated with pastoral crucial to use appropriate methodologies and to carry and agricultural technologies. These changes are also out in‑depth follow-up studies of putative functional likely to have resulted in distinct adaptive pressures. For variation to separate false positives from biological sig- example, increased population densities are correlated nals. False-positive signals (also known as type I errors) 1Department of Genetics, University of Pennsylvania, with increased infectious-disease loads, and phenotypes occur when negative results are not rejected. In the case Philadelphia, Pennsylvania related to the immune response are thought to have been of genome-wide scans for selection, false positives gen- 19104, USA. affected by this environmental shift3. Thus, the identi- erally refer to candidate adaptive loci which are actually 2Coriell Institute for Medical fication of genetic signatures of human adaptation to neutrally evolving or deleterious. The primary difficulty Research, Camden, New Jersey 08103, USA. such pressures is informative not only for understanding in identifying candidate adaptive loci is that the evolu- 3Department of Biology, the ways in which adaptation has shaped genetic varia- tionary processes generating genomic variation are vari- University of Pennsylvania, tion in contemporary populations, but also because the able across loci, resulting in a low signal-to‑noise ratio Philadelphia, Pennsylvania phenotypic consequences of adaptive genetic variants for tests of selection. Furthermore, most strong candi- 19104, USA. may have a role in the biological variation and health of date adaptive loci that have been identified in global e‑mails: [email protected]. upenn.edu; lscheinfeldt@ contemporary humans. human populations are located not in protein-coding coriell.org Recent advances in genomic technology have led regions but in intergenic regions instead, and these loci doi:10.1038/nrg3553 to the availability of genome-wide single nucleotide are more likely to be involved in complex mechanisms 692 | OCTOBER 2013 | VOLUME 14 www.nature.com/reviews/genetics © 2013 Macmillan Publishers Limited. All rights reserved REVIEWS such as gene regulation, which generate phenotypic vari- genes or genomic regions that are tagged or represented ation. Indeed, regulatory elements can be located hun- by common SNPs8–12. The variants on SNP arrays were dreds of thousands of base pairs away from the genes chosen according to various criteria23, most often based that they regulate7. These combined characteristics make on variant identification, allele frequencies and patterns it particularly challenging to identify genetic loci that are of LD in a limited number of populations, mostly in involved in recent human adaptation. Eurasians9,24; however, the ascertainment strategy is often In this Review, we describe the current status of unclear. This design was optimized for genome-wide genome-wide tests of neutrality, regions of the genome association studies (GWASs), but the same data have that contain the strongest adaptive signatures and inte- also been used for genome-wide scans for adaptation. grative methods to investigate the functional conse- Thus, the variants captured were mainly those common quences of variation at these candidate adaptive loci. to a particular population, and many common vari- In particular, we highlight examples of recently char- ants in other continental regions were probably missed. acterized candidate adaptive genes and pathways, some Another problem that has arisen from this strategy of which have been studied in geographically diverse involves allele frequency bias that can affect inferences populations. In addition, we discuss some of the recent of demographic history23, although the Human Origins opportunities to take advantage of phenotypic, regula- SNP array uses a transparent ascertainment scheme that tory and functional data in the interpretation of results is particularly designed for demographic inferences25,26. that have been generated with genome-wide scans for Thus, genome-wide tests of neutrality that incorporate Demographic history adaptation, as well as some of the challenges that this demographic parameters using genome-wide SNP data The study of population- field of research has faced. may be limited. level changes over time, More recently, WGS data from global populations including changes in Identifying signatures of selection (for example, 1000 Genomes and Complete Genomics population size, migration, and gene flow between Genomic technologies have made the collection and 69 Genomes) have been made available for interrogation. populations. analysis of large amounts of population-level data pos- These types of data have several advantages, including sible. In turn, these data have been used to study the the removal of the bias introduced by variant ascertain- International HapMap ways in which adaptation has shaped genomic variation ment strategies, as well as the ability to directly capture Project A publically available across human populations. There are several genome- rare and functional variants. genome-wide data set of wide strategies to identify signatures of selection; how- common single nucleotide ever, the general principle involves identifying regions of Outlier approaches. The most common approach to polymorphisms generated the genome with patterns of variation that are unlikely identify signatures of positive selection in genome-wide from global populations. to have been shaped by neutral processes (such as data is to apply a statistical test that is sensitive to genetic demographic history Linkage disequilibrium ) alone. Here, we consider the data signatures of adaptation and to identify the ‘outlier’ vari- The non-random association of that are currently available, the various methods that ants or the variants with values that are in the extreme two or more genetic variants. have been used to collect these data and the limitations tails of the empirical distribution27. The statistical tests of these methods. that have been developed for this strategy are primarily Positive selection (FIG. 1a) A type of natural selection in designed to identify classic selective sweeps , in which a trait that increases Genomic data sets. Over the past decade or so, several which a novel adaptive variant arises de novo in a popu- reproductive fitness becomes genome-wide sets of common SNPs in global population lation and rapidly increases in frequency to (or near) more common over time in a samples