<<

Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Adaptive selection of an incretin in Eurasian populations

Chia Lin Chang2, James J. Cai3,4, Chiening Lo5,

Jorge Amigo6, Jae-Il Park7, and Sheau Yu Teddy Hsu1

1 Reproductive Biology and Stem Research Program, Department of Obstetrics and

Gynecology, Stanford University School of Medicine, Stanford, CA 94305-5317, USA

2 Department of Obstetrics and Gynecology, Chang Gung Memorial Hospital Linkou Medical

Center, Chang Gung University, 5 Fu-Shin Street, Kweishan, Taoyuan, Taiwan

3 Department of Biology, Stanford University, Stanford, CA 94305, USA

4 Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX

77840, USA

5 Department of Clinical and Experimental Epilepsy, Institute of Neurology, University College

London, United Kingdom

6 Genomic Medicine Group, University of Santiago de Compostela, CIBERER, Santiago de

Compostela, Spain

7 Hormone Research Center and School of Biological Sciences and Technology, Chonnam National

University, Kwangju 500-712, Republic of Korea

Corresponding author: Sheau Yu Teddy Hsu, Stanford University School of Medicine, Department of Obstetrics and Gynecology, 300 Pasteur Dr., Boswell Building, Room A344D, Stanford, CA

94305-5317, USA. Tel: 650-799-3496; Fax: 650-725-7102; E-mail: [email protected]

Running Title: Adaptive evolution of the GIP gene

1 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Abstract

Diversities in human physiology have been partially shaped by adaptation to natural environments and changing cultures. Recent genomic analyses have revealed single nucleotide polymorphisms (SNPs) that are associated with adaptations in immune responses, obvious changes in human body forms, or adaptations to extreme climates in select human populations. Here, we report that the human GIP was differentially selected among human populations based on the analysis of a nonsynonymous SNP (rs2291725). Haplotype structure analysis suggests that, owing to positive selection, the derived at 2291725 arose to in East Asians ~8.1 thousand years ago. In addition, comparative and functional analyses showed that the human GIP gene encodes a cryptic glucose-dependent insulinotropic polypeptide (GIP) isoform (GIP55S or GIP55G) that encompasses the SNP and is resistant to serum degradation relative to the known mature GIP peptide. Importantly, we found that GIP55G, which is encoded by the derived allele, exhibits a higher bioactivity as compared to GIP55S, which is derived from the ancestral allele. The combined results suggested that rs2291725 represents a functional and may contribute to the population genetics observation. Given that GIP signaling plays a critical role in homeostasis regulation at both enteroinsular and enteroadipocyte axes, our study highlights the importance of understanding adaptations in energy-balance regulation in the face of the emerging diabetes and obesity epidemics.

2 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Introduction

Recent studies have revealed that genetic variation underlie a variety of diversities in human physiology and pathology (Sabeti et al. 2005; Voight et al. 2006; Sulem et al. 2007; Tishkoff et al.

2007; Leslie 2010; Simonson et al. 2010; Voight et al. 2010; Yi et al. 2010). Among the sundry forms of genetic variation, single nucleotide polymorphisms (SNPs) with high population differentiation are regarded as candidates of adaptation to recent changes in human environment and culture, and have been shown to play an important role in acquiring distinct physiological traits and susceptibility to different diseases among populations (Barreiro and Quintana-Murci 2010; Chen et al. 2010; Gibbons 2010; Ingelsson et al. 2010; Laland et al. 2010; Luca et al. 2010; Richerson et al.

2010). Thus, the identification of causal SNPs with signatures of positive selection and underlying functional changes is crucial to a better understanding of the relationship between genomic variation and human health as well as gene-environmental interactions (Nielsen et al. 2007; Laland et al.

2010; Nei et al. 2010). To systematically analyze the contributions of genetic variation in intercellular signaling molecules to physiological diversities in humans, we studied SNPs in the coding region of 839 human polypeptide hormones and their cognate receptors for evidence of selection using the data from the International HapMap project phases I and II (The International

HapMap Project 2003; International HapMap Consortium et al. 2007). We focused on these polypeptide ligands and receptors because they represent half of the targets of modern medicine

(Drews 2000) and because the associated with intercellular communication or responses to environmental factors (e.g., pathogens and food sources) have been implicated in the evolution of a variety of common traits and pathologies in humans and other vertebrates (Seminara et al. 2003;

Sabeti et al. 2005; Hoekstra et al. 2006; Lalueza-Fox et al. 2007; Sulem et al. 2007; Chambers et al.

2008; Prokopenko et al. 2008; Shiao et al. 2008; Shimomura et al. 2008; Anderson et al. 2009;

3 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Topaloglu et al. 2009). Here, based on genomic and biochemical analyses, we show that variants in an incretin hormone gene GIP were differentially selected in human populations, and a nonsynonymous SNP, rs2291725, represents a functional mutation. Because changes in food source represented one of the most important selection pressures during transitions of human culture and that incretin hormones play critical roles in homeostasis maintenance at the enteroinsular and enteroadipocyte axes, future studies of potential genotype–phenotype relationships for the selected

GIP variants could provide a better understanding of which and how these variants contribute to phenotypic variation in energy-balance regulation among individuals or human populations.

4 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Results

A nonsynonymous SNP (rs2291725) in the human glucose-dependent insulinotropic polypeptide gene (GIP) exhibits high population differentiation

To investigate whether variation in polypeptide intercellular signaling molecules contribute to physiological diversities in humans, we curated 457 human G -coupled receptors (GPCR) and their cognate ligand genes as well as 382 human nonGPCR and ligand genes (Ben-

Shlomo et al. 2003; Semyonov et al. 2008)(Supplementary Table 1). We started by using the population differentiation statistic FST to identify leads for functional characterization (Lewontin and Krakauer 1973; Akey et al. 2002; Li et al. 2008).

We computed the FST for coding SNPs of GPCR and ligand genes, and compared the result with those for coding SNPs of all other human genes. FST was computed between all possible pairs of HapMap II populations (YRI [African, Yoruba from Ibadan]), CEU [European, U.S. residents with northern and western European ancestry], and ASN [East Asian, pooled samples of Chinese from Beijing (CHB) and Japanese from Tokyo (JPT)]). Distributions of FST for either synonymous or nonsynonymous SNPs between any two HapMap II populations showed no difference (all p >

0.01, Kolmogorov–Smirnov Test, Supplementary Fig. 1). Likewise, studies of the FST of coding

SNPs in 382 nonGPCR receptor and ligand genes have shown similar cumulative distribution function (CDF) plots, suggesting that there is no difference in FST distribution between the GPCR and the nonGPCR groups (p > 0.01, Kolmogorov–Smirnov Test, Supplementary Fig. 2). These results suggest that, when analyzed as a whole set, coding SNPs of human polypeptide receptor and ligand genes do not have significantly elevated FST.

5 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Nevertheless, at the individual-gene level, dozens of coding SNPs in these receptor and ligand genes have a high FST (> 0.5) between select pairs of populations (Supplementary Tables 2 and 3). Among GPCRs and their cognate ligands, DRD5, DARC, CELSR1, CCL23, GIP, MC1R,

EMR1, GRM1, CALCR, CXCR6, GPR39, and DRD3 were found to have nonsynonymous SNPs with a high FST, which suggests that these SNPs are likely to be targets of selection. On the other hand, nonsynonymous SNPs in EDAR, which has been repeatedly shown to be under positive selection (Bryk et al. 2008; Fujimoto et al. 2008), and several immune-response-related genes (e.g.,

IL4R, TNFRSF10A, TRAF3, IL29, IL20RA, IL1RL1, TNFRSF6B, and PTPRA) in the nonGPCR group were found to have high FST scores in select pair(s) of populations.

Importantly, we found that the nonsynonymous SNP (rs2291725) in exon 4 of the glucose- dependent insulinotropic polypeptide gene (or gastric inhibitory peptide, GIP; referred to as the

GIP103T/C mutation in the following text) on 17 is highly linked with a cluster of neighboring SNPs within a 250-kb region (starting from rs8079874 to rs2291726), and has an FST value in the top 0.5% of all nonsynonymous SNPs in comparisons between ASN and YRI populations (Fig. 1A, Supplementary Table 2). In contrast, FST estimates in comparisons between

CEU and YRI, or between CEU and ASN were not different from the genome average (Fig. 1, B and C). Consistently, data from the HGDP-CEPH project (Centre d’Etude du Polymorphisme

Humain- Diversity Panel; 944 unrelated individuals from 52 populations)(Cann et al. 2002; Rosenberg 2006; Li et al. 2008) and the Human Genome Center at the University of

Tokyo (752 Japanese individuals, GIP103C/GIP103T = 0.732/0.268) showed that the derived GIP103C allele frequency at rs2291725 is much higher (>60%) in the majority of East Asian populations and varies widely among other populations: ranging from 0.0-9.5% in Sub-Saharan Africans and

6 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

increasing to >40.0% in European and Middle Eastern populations (Fig. 1D, Supplementary Table

5).

GIP encodes one of the two incretin hormones (glucagon-like peptide-1[GLP-1] and GIP) in humans, and plays a critical role in normal carbohydrate and lipid metabolism (Kim and Egan

2008). After ingestion of nutrients, GIP secreted from duodenal and jejunal K-cells acts on pancreatic β-cells to stimulate the release of insulin, which thereby ensures prompt uptake of glucose and lipids into the tissues. Abnormal regulation of GIP signaling leads to altered carbohydrate metabolism and lipid accumulation at the enteroinsular and enteroadipocyte axis, respectively (Miyawaki et al. 2002; Fulurija et al. 2008; Isken et al. 2008; Kim and Egan 2008). In mice, the deletion of the GIP receptor (Gipr) led to impaired first-phase glucose-stimulated insulin release (Miyawaki et al. 2002) whereas exogenous GIP was found to worsen postprandial hyperglycemia in type 2 diabetes patients (Chia et al. 2009). In addition to effects on glucose homeostasis, GIP has been shown to promote obesity in mice that were fed a high-fat diet (McClean et al. 2007; Gniuli et al. 2010). Because the incretin effect induced by the oral glucose intake leads to a higher insulin response as compared to that from a matched intravenous glucose stimulation, the regulation of glucose levels by GIP represents a critical endocrine circuit to monitor exogenous energy intake and regulate subsequent storage (Kim and Egan 2008). In addition, the critical role of

GIP signaling in energy balance regulation has been highlighted by recent studies that showed that variants at the GIPR locus are associated with glucose levels 2 hr after an oral glucose challenge test used in the diagnosis of type 2 diabetes (Ingelsson et al. 2010; Saxena et al. 2010). On the other hand, epidemiological studies have shown that the prevalence of different forms of diabetes and obesity as well as the regulation of glucose metabolism vary widely among ethnic groups

(Buchanan and Xiang 2005; Kim and Egan 2008). Thus, the observed high population

7 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

differentiation in GIP polymorphisms could be associated with adaptations of incretin physiology to recent changes in diets and cultures in select human populations. To test this hypothesis, we fine mapped the GIP locus and tested the functions of GIP variants in vitro and in vivo.

Variants at the GIP locus were partially selected in Eurasian populations

Analyses of linkage disequilibrium (LD) in the GIP region for the three HapMap II populations showed that extended LD blocks are present in CEU and ASN , and these blocks encompass GIP and neighboring UBE2Z, SNF8, and ATP5G1 genes (Supplementary Fig. 3,

A and B, square, D’=1, likelihood of odds [LOD] scores >2). In contrast, the majority of SNP pairs in the GIP region of YRI chromosomes exhibited a low LOD and a low r2 (Supplementary

Fig. 3C, blue square, D’=1, LOD scores < 2). Consistent with the LD analysis, plots depicting the haplotype map showed that most of the chromosomes with the derived GIP103C allele in the ASN population have haplotypes extending over 200 kb and are significantly longer as compared to chromosomes with the ancestral GIP103T allele (Fig. 2A, middle panel, p<0.001). By contrast, most chromosomes in YRI did not exhibit an extended haplotype surrounding the GIP103 allele (Fig. 2A, bottom panel). These results were reflected in the plots of extended haplotype homozygosity (EHH) decay curves (Fig. 2B). In ASN and CEU, the EHH curve for chromosomes carrying the derived

GIP103C allele extends much further than that for the ancestral allele. To test whether the area under the EHH curve is greater for a selected allele than for a neutral allele, we calculated the integrated haplotype score (iHS) for rs2291725 as a core marker (Voight et al. 2006). Using a coalescent model that generated 10,000 data sets from the same length of genome region (750 kb) with the same sample size (180 chromosomes), we found that the observed iHS for the derived allele in ASN

(iHS = -0.753)—but not CEU (iHS = -0.131)—deviated significantly from the neutral distribution

(p < 1x10-4; Supplementary Fig. 4A). Because a bottleneck in population growth or the presence of

8 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

recombination hotspots could contribute to the generation of putatively adaptive haplotype configurations, we also performed coalescent simulations that took into account the effects of demography and extreme recombination rates, respectively (Supplementary Fig. 4, A and B;

Supplementary Methods). Consistently, we found that no iHS for simulated haplotype sets is more negative than the observed iHS for rs2291725 under the bottleneck scenario (empirical p < 1x10-4), and that only a few are in the presence of a recombination hotspot (empirical p < 8x10-

4)(Supplementary Fig. 4A). These results indicated that the observed iHS has deviated significantly from the neutral distribution even after adjusting for bias that may have been introduced by a bottleneck in demography or heterogeneous recombination events. Thus, it is unlikely that neutral evolution alone explains the observed long haplotypes that carry the derived GIP103C allele in the

ASN population.

Further examination of extended haplotype blocks in ASN showed that rs2291725 is highly linked with another 43 SNPs within the adjacent 250-kb region, and, in a 70-kb core region

(rs1962412 to rs2291726; Genome build 36.3, Ch. 17, 44325258-44394253), the derived GIP103C allele is associated with a single haplotype (Fig. 2C, haplotype GIP-A; Supplementary Fig. 3D;

Supplementary Table 4). By contrast, the ancestral GIP103T allele found in the majority of YRI chromosomes is represented by 12 different haplotypes (Fig. 2C, haplotypes GIP-A, B, and D-M;

Supplementary Fig. 3D). An unrooted tree analysis of these haplotypes confirmed that the evolutionary trajectory of the GIP103C-associated haplotype is distinct from other haplotypes (Fig.

2C). Given the presence of several characteristic patterns, including highly differentiated , high-frequency derived alleles, and relatively long derived haplotypes, these data suggested that pre-existing polymorphisms at the GIP locus were partially selected in ASN and possibly in CEU at times postdating the separation of YRI and Eurasian populations (Smith and Haigh 1974).

9 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GIP encodes a GIP peptide containing the variable residue (Ser103 or Gly103) at rs2291725

Whereas the selection at the GIP locus could be attributed to a single variant or a combination of SNPs, the nonsynonymous rs2291725 provided a tangible target for functional analyses of the causal mutation. To explore whether rs2291725 represented a causal variant and provided a benefit to its carriers, we investigated the function of GIP peptides containing the variable residue (Ser103 for GIP103T or Gly103 for GIP103C). GIP was originally characterized as a

42-amino-acid peptide derived from proteolytic processing at monobasic cleavage sites at residues

51 and 94 of the GIP open reading frame (Moody et al. 1984; Takeda et al. 1987; Kim and Egan

2008). Although residue 103 is located outside the conventional mature GIP sequence (residues 52-

93; Fig. 3A, upper panel), we noticed that a conserved dibasic cleavage site (Arg-Lys, residues 106-

107) is located 13 amino acids downstream from the conventional cleavage site in primates, , cats, cows, pigs, and (Fig. 3A, upper panel). Thus, post-translational cleavage at this alternative processing site could generate an extended GIP isoform 13 amino acids longer (Fig. 3A, lower panel; the ancestral GIP55S and the derived GIP55G). To investigate this possibility, we analyzed whether the GIP55 peptide is expressed in gut cells of the proximal small intestine. In support of our hypothesis, immunohistochemical analysis of human duodenum sections with a rabbit GIP55-specific and a mouse anti-GIP antibody showed that immunoreactive GIP55 and GIP are colocalized in select duodenum cells, suggesting that GIP55 is present in gut cells that normally express GIP (Fig. 3B). To study whether GIP55 is secreted into general circulation, we then analyzed the presence of GIP55 in serums of individuals 30 min after a regular breakfast meal using a sandwich ELISA assay that detects the extended C-terminal sequences specific to GIP55.

Consistently, we found that GIP55 is present in human serum, and constitutes approximately 1-3% of the total GIP after a meal (GIP55, 2.91±0.41 pmole/L; total GIP, 95.1±9.41 pmole/L, N=6).

10 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Thus, depending on the genotype, humans could contain two (GIP103T/T: GIP+GIP55S; GIP103C/C:

GIP+GIP55G) or three GIP isoforms (GIP103T/C: GIP+GIP55S+GIP55G)(Fig. 3A).

Because the receptor-activation domain of GIP is located at the N-terminus of the peptide, we reasoned that alternative processing at the C-terminus of GIP55 peptides is unlikely to decimate their bioactivity. Indeed, functional testing of synthetic GIP isoforms (GIP, GIP55G, and GIP55S) in vivo showed that, similar to conventional GIP, the extended GIP55G and GIP55S suppress hyperglycemia to similar extents in a time-dependent manner in fasting rats (Fig. 3C).

The ancestral GIP55S and the derived GIP55G peptides exhibit distinct bioactivity profiles in vitro

To compare the bioactivity of GIP isoforms and variants, we measured their receptor- activation activities in vitro using HEK293T cells expressing a recombinant human GIP receptor.

As expected, treatments of GIP led to dose-dependent increases of cAMP production in transfected cells (Fig. 4A, top left panel). Unlike conventional GIP, which exhibits an EC50 of ~0.9±0.21 nM, the extended GIP isoforms have ~3-fold lower potencies (GIP55G, 3.2±0.21 nM; GIP55S, 2.6±0.23 nM; Supplementary Table 6). Importantly, we found that the derived GIP55G consistently increases cAMP production to significantly higher levels as compared to the ancestral GIP55S (Fig. 4A, top left panel). Because GIP is known to be susceptible to serum degradation in vivo, we also studied the stability of GIP isoforms in human serum in vitro. Surprisingly, we found that GIP55G and

GIP55S are more resistant to degradation by either pooled normal human serum (Fig. 4A) or pooled complement-preserved human serum (Supplementary Fig. 5). The ranking of potency on receptor activation shifted from GIP>GIP55G>GIP55S at 0 hr to GIP=GIP55G=GIP55S and

GIP55G>GIP55S≥GIP, respectively, after a 6-hr or a 12-hr pre-incubation with either normal serum

11 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

(Fig. 4A), or complement-preserved serum (Supplementary Fig. 5). Plots of EC50 data in relation to the length of incubation showed that the slope of changes in bioactivity for GIP is significantly steeper than that of GIP55G (Fig. 4B, p = 0.0023). On the other hand, coincubation with a recombinant dipeptidyl peptidase IV (DPP IV) led to similar extents of degradation of these peptides (Supplementary Table 7). Thus, the resistance to serum degradation by GIP55G cannot be attributed to a resistance of DPP IV, which represents the major processing enzyme that degrades

GIP and GLP-1 in vivo by cleaving these peptides at position 2 of the N-terminus (Kim and Egan

2008). These data suggested that the rise in the frequency of GIP103C in Eurasian populations could be associated with the quantitative increase in the overall potency of GIP55G. Although the nature of the selective advantage conferred by GIP55 peptides is not clear, we speculate that GIP103C could represent a risk allele in ancestors of YRI populations but provide a protective effect in Eurasians.

The derived GIP103C allele arose to dominance in East Asians ~8.1 thousand years ago

Because energy-balance regulation-related loci are subject to selection pressures that fluctuate over time in response to environmental and culture changes, genetic responses to changes in human diet are more likely associated with incomplete signatures of selection or even signatures of balancing selection (Charlesworth 2006; Pritchard et al. 2010). Based on this understanding, we speculated that the GIP103C and GIP103T alleles likely confer distinct advantages depending on the history of culture changes (Allison 1956; Turner et al. 1979) and that the selection of GIP103C could occur at a time when humans experienced major shifts in subsistence culture. To investigate this possibility, we estimated the age of the GIP103C-associated haplotype. Under the neutrality, the average age of a polymorphism with frequency p is estimated to be -4Ne[p(logp)/(1-p)] (Kimura and

Ota 1973; Slatkin and Rannala 2000). With the assumption of Ne = 5,000 for each population, this

12 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

yielded 77.5, 350, and 425 thousand years for the derived allele to arise to its current frequencies in

YRI, CEU, and ASN populations, respectively. These estimates are obviously incompatible with archaeological evidence showing that modern humans originated ~195 kya in Sub-Saharan Africa and that a first wave of migration to the Arabic peninsula occurred ~60-55 kya, which was followed by migration toward Northern Eurasia ~40 kya (McDougall et al. 2005; Klein 2009). Consequently, we estimated the age of the GIP103C-associated haplotype on the basis of the decay of haplotypes

(Reich 1998; Stephens et al. 1998). Based on a recombination rate derived from estimates of LD of the HapMap dataset (McVean et al. 2004), the analysis showed that ASN contains a dominant ancestral haplotype and that GIP103C-associated haplotypes arose to dominance approximately 8.1 kya in East Asians. Because this dating approach relies on the linkage map derived from estimates of LD, and that discrepancies between LD maps and the pedigree-based recombination maps are significant in many genomic regions (Clark et al. 2010), we also performed the analysis using alternative estimates of recombination rate, which range from 0.5 to 3.03 cM/Mb in three studies of pedigree-based recombination maps (i.e., deCODE, Marshfield, and Genethon)(Dib et al. 1996;

Broman et al. 1998; Kong et al. 2002). Using these recombination rates, we obtained alternative estimates of the age to be between 11,800 to 2,000 years. On the other hand, dating is unattainable for CEU because the ancestral GIP103C haplotype in this population cannot be identified unambiguously, perhaps because that mutation in the region is effectively "saturated"

(Supplementary Fig. 6). Given that GIP signalling plays a critical role in the regulation of glucose and lipid metabolism, it is plausible that the increased prevalence of GIP103C allele in Eurasians is a consequence of changes in foraging skills or in population dynamics associated with the emergence of agricultural societies in the Eurasian continent 12-7 kya (Balter 2007; Fuller et al. 2009; Jones and Liu 2009; Richards and Trinkaus 2009; Gibbons 2010).

13 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Discussion

Based on gene age estimation and biochemical analyses, our study revealed a functional mutation that is associated with the selection of the GIP locus in East Asian populations approximately 8.1 kya and the presence of a cryptic GIP isoform. Specifically, we showed that the inventory of human GIP peptides has recently diverged, and that individuals could express three different combinations of GIP isoforms (GIP, GIP55S, and GIP55G) with distinct bioactivity profiles. Future study of how this phenotypic variation affects glucose and lipid homeostasis in response to different diets and which physiological variations in humans can be attributed to prior gene–environmental interactions at the GIP locus is crucial to a better understanding of human adaptations in energy-balance regulation.

Taking advantage of the availability of genome information from diverse human populations, recent studies have revealed a number of loci and variants that describe phenotypic variation in appearance, physiological parameters, and pathological responses to diseases (Sabeti et al. 2005;

Voight et al. 2006; Sulem et al. 2007; Tishkoff et al. 2007; Gibbons 2010; Leslie 2010; Ng et al.

2010; Simonson et al. 2010; Voight et al. 2010). In addition, it has been shown that human genomes contain hundreds of loci that exhibit varying degrees of positive selection (Kelley et al. 2006;

Voight et al. 2006; Sabeti et al. 2007; Barreiro et al. 2008; Akey 2009; Cai et al. 2009; Pickrell et al.

2009). These recent in silico findings on gene selection, perhaps due to differential gene– environmental interactions among populations, have opened doors to human history and a better understanding of the prevalence of adaptations among human populations. However, few causal variants have been confirmed or linked to a phenotype (Akey 2009; Gibbons 2010; Nei et al. 2010).

Thus, our finding that the high-frequency GIP103C-associated haplotypes in Eurasian populations are functionally relevant has provided a rare opportunity to better understand the environmental impact

14 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

on human physiology at the enteroinsular and enteroadipocyte axes. It is generally accepted that food source represents a potent force that influences selection and adaptive radiation in nature

(Darwin 1859; Freeman and Herron 2003). For example, the external differences in beak morphology and speciation of Darwin's finches are believed to be results of adaptations that exploit particular types of seeds, insects, and cactus flowers on the Galápagos Islands (Schluter 2000).

Likewise, changes in food source represented one of the most important selection pressures during transitions of human culture and posed a wide spectrum of challenges to the digestive and endocrine systems of our ancestors (Piperno et al. 2004; Balter 2007; Fuller et al. 2009; Jones and Liu 2009;

Richards and Trinkaus 2009; Gibbons 2010). For example, the ability to digest lactose in milk

(lactase persistence) and the adoption of a pastoral culture has been linked to the selection of a variety of variants at the lactase locus in multiple ethnic groups at various time points during the last ten millenniums (Tishkoff et al. 2007; Itan et al. 2009). Similarly, individuals from populations with high-starch diets tend to have more copies of amylases than do those from populations with low- starch diets (Perry et al. 2007). It has long been hypothesized that our ancestors ate a low- carbohydrate, high-protein diet, and that the adaptive response was manifested as insulin resistance, perhaps for coping with low dietary glucose (Miller and Colagiuri 1994). Because normal GIP signaling is crucial to normal glucose and lipid metabolism, we speculate that the selection of

GIP103C haplotypes could be pertinent to shifts in long-term energy-balance regulation after the emergence of agriculture, which provided a stable supply of high-starch staples and a reduced need for metabolic efficiency as opposed to the traditional hunter-gatherer societies (Miller and Colagiuri

1994; Wang et al. 2006; Balter 2007; Fuller et al. 2009; Jones and Liu 2009; Richards and Trinkaus

2009; Gibbons 2010; Luca et al. 2010; Richerson et al. 2010).

15 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

It was hypothesized by Neel almost fifty years ago that mismatches between prior physiological adaptations and contemporary environments can lead to health risks because ancestral variants that have been selected for the organism’s fitness or reproductive success may not be optimal for the individual’s health in the new environment (Neel 1962). In support of this thrifty genotype hypothesis, a number of genes in humans and house mice have been implied to have co- evolved with the emergence of agricultural societies (Prentice et al. 2005; Vander Molen et al.

2005; Shiao et al. 2008), and that a rapid shift in diets is associated with detrimental effects on human survival in a number of human populations (LANCET 1989). Conceptually, the serum- resistant GIP55G carried by the GIP103C haplotype may have been beneficial for individuals who have unconstrained access to the food supply in many agricultural societies by preventing severe hyperglycemia. As selection pressure changed in these societies, the ancient GIP103T haplotype could have become a liability and conferred a loss of fitness in the new environment. Importantly, regardless of what the selection pressure might have been, our study indicated that the GIP locus was susceptible to recent changes in human environment, and that physiological variation stemming from this selection could bear important implications for understanding phenotypic variation of metabolic syndromes such as diabetes and obesity. Nonetheless, it is important to note that the selection of the derived GIP103C haplotypes could be a consequence of coordinated actions of multiple SNPs at the GIP locus. Future analysis of the functionality of rs2291725 and linked SNPs is needed to elucidate the exact nature of the selection of GIP103C haplotypes in the last ten millenniums.

Taken together, our study has highlighted an unexpected complexity in the regulation of sugar and lipid metabolism among human populations; it has also illuminated the importance of understanding adaptations in genes associated with energy-balance regulation in the face of ongoing

16 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

changes in our diets toward refined and high-energy convenience food as well as the emerging pandemics of diabetes and obesity in modern society.

17 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Methods

Genotype analysis

FST estimation and LD plot

The selected loci that have rapidly increased in frequency due to local positive selection are likely to show high levels of genetic differentiation, which can be quantified with the FST statistic

(Lewontin and Krakauer 1973; Weir 1996). The principle of a number of estimators of FST is FST =

(HT - HS)/HT, where HT is the heterozygosity of the total population and HS is the average heterozygosity across subpopulations. The empirical distribution of FST or the average of FST over multilocus windows for genome-wide polymorphism data has been used to detect the genomic regions under positive selection (Akey et al. 2002). We computed the FST for all nonsynonymous

SNPs in the HapMap project phases I and II, and the function snp_fst.m in PGEToolbox (Cai 2008) was used to calculate and present an unbiased estimate of FST.

LD plots for SNPs within the CALCOCO2, ATP5G1, UBE2Z, SNF8, GIP, and IGF2BP1 gene regions from rs1422645 to rs8069452, spanning 250 kb, were generated using HaploView 4.1

(Barrett et al. 2005). The LD blocks were determined using an r2 threshold of 0.8.

Analysis of extended haplotype homozygosity (EHH) and integrated haplotype score (iHS)

The EHH plots were generated as described (Voight et al. 2006). We also computed the extended haplotype homozygosity (EHH) statistic for the core SNP (rs2291725) as previously described (Sabeti et al. 2002). The EHH curve measures the decay of identity of haplotypes that carry the core SNP as a function of distance. When an allele rises rapidly in frequency due to strong selection, it tends to exhibit high levels of haplotype homozygosity extending further than expected

18 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

under a neutral model. For the analysis of EHH decay, haplotypes in the genomic region that encompasses a 750-kb region centered on the GIP gene and contains 388 phased SNPs were downloaded from the HapMap project website.

The integrated haplotype score (iHS) statistic detects whether the area under the EHH curve is greater for a selected allele than for a neutral allele (Voight et al. 2006). Large negative values indicate unusually long haplotypes carrying the derived allele. To test whether observed iHS for rs2291725 are significantly deviated from expected values derived from neutral evolution, we used a coalescent model and generated 10,000 replicated haplotype sets using the coalescent simulator ms (Hudson 2002). We adopted parameters that are compatible with the sample size (180 chromosomes) of the ASN population and the length of genomic region (750 kb) that was analyzed.

-7 The simulation was conditioned based on a recombination rate = 9.7x10 cM/bp, a N0 = 10,000, and the number of segregating sites = 388. We also required that each replicate to contain at least one segregating site in which the derived allele frequency reaches ~ 0.75 and that the site is located in the center of the haplotype (that is, between a 370- and 380-kb area of the 750-kb region in total).

With these conditions, we found no iHS value for a simulated data set to be more negative than the observed iHS (Supplementary Fig. 4A)(empirical p < 1x10-4).

Estimation of the age of GIP103C haplotype

Under positive selection, a derived, or novel, allele can quickly rise to a high frequency. As the allele ages, recombination causes the breakdown of LD within existing linked alleles, and mutation leads to the accumulation of new linked variations. The allele’s age can be estimated by the decay of the haplotype carrying the allele (Stephens et al. 1998). The length of the haplotype retained between different copies of the allele shortens over time due to recombination and

19 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

mutation, which is modelled using a Poisson process by Reich and Goldstein (1999). The probability that a haplotype remains ancestral (that is, that the haplotype that carries the derived allele retains the status of high frequency right after being selected) is p = e-G(r+u), where G is generations, r is the recombination rate, and u is the mutation rate. For a given allele in the derived haplotype, the haplotype-decay approach estimates the number of generations G in terms of P (the probability that a given haplotype does not change from its ancestor)(Stephens et al. 1998; Reich and Goldstein 1999).

In ASN, the GIP LD block is 91 kb long and contains eight distinct haplotypes (1-8 from top to bottom in Supplementary Fig. 6). Haplotypes 1-4 are GIP103C-associated, and are derived. Among these derived haplotypes, haplotype 1 is apparently the ancestral haplotype from which haplotypes

2-4 were derived. The frequency of haplotype 1 in GIP103C-associated haplotypes is P’ = 0.803. We obtained the recombination rate derived from estimates of LD (McVean et al. 2004) from the website of the HapMap project (http://hapmap.ncbi.nlm.nih.gov/). Regression analysis with data points of different physical lengths versus genetic map lengths gave regression function: y = 0.93x –

0.018, where x is the physical distance (Mb) and y is the genetic distance (cM). The genetic distance of the 91-kb region was estimated to be 0.0666 cM (that is, 0.0666% chance of crossing over in a single generation), which gives a rate of recombination r = 6.66x10-4 per generation. We took the mutation rate of the region u = 1.0x10-5, based on an estimation of the haploid mutation rate of

~1.1x10-8 per base per generation (Roach et al. 2010). We assumed that the most common haplotype is the ancestral haplotype and used P’ obtained above as an approximation of P. Using G

= -ln(P’)/(r+u), we obtained G = 325 (or 8,100 years).

20 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Phenotype tests

Reagents

Synthetic GIP peptides were obtained from the Genescript Corp. and the American Peptide

Company Inc. The extended GIP55G and GIP55S isoforms were synthesized by the Stanford

University PAN facility and GL Biochem, Ltd. Pooled normal human serum, pooled complement- preserved human serum, and serum from individuals were obtained from Innovative Research, Inc. and ProteoGenex, Inc. Stocks of different hormones were prepared in phosphate-buffered saline and diluted in a serum-free culture medium. In addition to routine chemistry and mass-spectrometry assessments, we verified the quantity of different GIP isoforms using a human GIP enzyme-linked immunosorbent assay (ELISA)(Linco Research, Inc.).

Immunohistochemical analyses and ELISA

Total GIP levels in human serums were measured using a sandwich human GIP ELISA

(Linco Research, Inc.), and assays were performed with a programmable ELISA plate washer. The minimum detection limit of the assay was 8.1pg/ml. For the measurement of GIP55, human serums were partially purified with C18 chromatography, and the secondary antibody of the human GIP

ELISA kit was substituted with a rabbit polyclonal antibody specific for the last 13 amino acids of

GIP55 (REARALELASQAN). The GIP55-specific were generated using a KLH- conjugated CREARALELASQAN peptide (Covance Research Products and Genescript Corp.).

For immunohistochemical analyses, human duodenum sections (BioChain Institute, Inc.) were dewaxed with xylene and maintained at 95-99C for 10 minutes in 10 mM sodium citrate buffer (pH 6.0), followed by cooling on the bench top. Sections were immunostained with a mouse monoclonal anti-GIP antibody (Abbiotec, LLC) and a rabbit GIP55-specific antibody overnight at

21 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

4C, followed by incubation with fluorochrome-conjugated secondary antibodies (Invitrogen Inc.) for 1 hr at room temperature in the dark. Signals for GIP and GIP55 were detected using an Alexa

488 donkey anti-mouse antibody (488 nm, 1:2500 dilution) and a Texas Red-conjugated goat anti- rabbit antibody (594 nm, 1:500 dilution), respectively, with a Leica SP2 single- and multi-photon confocal microscope.

The in vivo glucose suppression assay

Eight-week-old Sprague-Dawley rats were fasted overnight for 20 hr. To measure the GIP isoforms’ ability to reduce blood glucose levels in vivo, fasting rats were injected with a select GIP peptide (100 nmoles/kg) dissolved in 0.8 ml of PBS together with glucose (3.8 g/kg body weight).

Blood samples were obtained via the tail vein at select time points after intraperitoneal injection of glucose and the peptide. Glucose levels in the blood were measured with a OneTouch Ultra Blood

Glucose Monitoring System and OneTouch Ultra Test Strips (Johnson and Johnson).

Receptor-activation assay

The expression vector for the human GIP receptor (Origene Corp.) was transfected into

HEK293T cells using Lipofectamine 2000 (Invitrogen Inc.). Receptor-activation activities were assayed based on cAMP production in transfected cells, and were performed as described (Park et al. 2008). To quantify resistance to serum degradation by GIP isoforms, aliquots of GIP peptides were incubated in microfuges at 37C with human serum in a final concentration of 10 µM for indicated time spans. To analyze the mechanism underlying the serum-degradation-resistant property of GIP55, peptides (10 µM) were pre-incubated with a recombinant human dipeptidyl peptidase IV (1 mU/ml reaction in PBS; Enzo Life Science, SE-434) for 3, 6, or 12 hr and were

22 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

frozen at -80C before being tested for receptor-activation activity. The receptor-activation results were analyzed using the GraphPad Prism 5 package (GraphPad Software).

23 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Acknowledgments

We thank Yi Wei and Augustin Sanchez (Stanford University Pan Facility) for technical assistance.

We thank Drs. Aaron J.W. Hsueh and Renee A. Reijo Pera (Dept. of OB/GYN, Stanford

University) for critical review of the manuscript, and C.L.C. further expresses gratitude to Dr.

Yung-Kuei Soong (Department of OB/GYN, Chang Gung Memorial Hospital, Taiwan). J.J.C. thanks Dr. Dmitri Petrov (Dept. of Biology, Stanford University) for his invaluable advice and long- lasting support. We acknowledge the support of NIH (DK70652, SYTH), Avon Foundation (02-

2009-054, SYTH), and Chang Gung Memorial Hospital (CMRPG34002, CLC).

Author Contributions

S.Y.T.H. conceived and supervised the study. C.L.C., J.P., and S.Y.T.H. were involved in functional testing and data analyses. J.J.C., C.L., and J.A. processed and performed genome data analyses. The paper was written primarily by C.L.C., J.J.C., and S.Y.T.H.

24 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

References

Akey JM. 2009. Constructing genomic maps of positive selection in humans: where do we go from

here? Genome Res 19(5): 711-722.

Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. 2002. Interrogating a high-density SNP map for

signatures of . Genome Res 12(12): 1805-1814.

Allison AC. 1956. The sickle-cell and haemoglobin C genes in some African populations. Ann Hum

Genet 21(1): 67-89.

Amigo J, Salas A, Phillips C, Carracedo A. 2008. SPSmart: adapting population based SNP

genotype databases for fast and comprehensive web access. BMC Bioinformatics 9: 428.

Anderson TM, vonHoldt BM, Candille SI, Musiani M, Greco C, Stahler DR, Smith DW,

Padhukasahasram B, Randi E, Leonard JA et al. 2009. Molecular and evolutionary history of

in North American gray wolves. Science 323(5919): 1339-1343.

Balter M. 2007. Plant science. Seeking agriculture's ancient roots. Science 316(5833): 1830-1835.

Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. 2008. Natural selection has driven

population differentiation in modern humans. Nat Genet 40(3): 340-345.

Barreiro LB, Quintana-Murci L. 2010. From evolutionary genetics to human immunology: how

selection shapes host defence genes. Nat Rev Genet 11(1): 17-30.

Barrett JC, Fry B, Maller J, Daly MJ. 2005. Haploview: analysis and visualization of LD and

haplotype maps. Bioinformatics 21(2): 263-265.

Ben-Shlomo I, Yu Hsu S, Rauch R, Kowalski HW, Hsueh AJ. 2003. Signaling receptome: a

genomic and evolutionary perspective of plasma membrane receptors involved in signal

transduction. Sci STKE 2003(187): RE9.

25 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Broman KW, Murray JC, Sheffield VC, White RL, Weber JL. 1998. Comprehensive human genetic

maps: individual and sex-specific variation in recombination. Am J Hum Genet 63(3): 861-

869.

Bryk J, Hardouin E, Pugach I, Hughes D, Strotmann R, Stoneking M, Myles S. 2008. Positive

selection in East Asians for an EDAR allele that enhances NF-kappaB activation. PLoS One

3(5): e2209.

Buchanan TA, Xiang AH. 2005. Gestational diabetes mellitus. J Clin Invest 115(3): 485-491.

Cai JJ. 2008. PGEToolbox: A Matlab toolbox for population genetics and evolution. J Hered 99(4):

438-440.

Cai JJ, Macpherson JM, Sella G, Petrov DA. 2009. Pervasive hitchhiking at coding and regulatory

sites in humans. PLoS Genet 5(1): e1000336.

Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-

Tamir B, Cambon-Thomsen A et al. 2002. A human genome diversity cell line panel.

Science 296(5566): 261-262.

Chambers JC, Elliott P, Zabaneh D, Zhang W, Li Y, Froguel P, Balding D, Scott J, Kooner JS. 2008.

Common genetic variation near MC4R is associated with waist circumference and insulin

resistance. Nat Genet 40(6): 716-718.

Charlesworth D. 2006. Balancing selection and its effects on sequences in nearby genome regions.

PLoS Genet 2(4): e64.

Chen H, Patterson N, Reich D. 2010. Population differentiation as a test for selective sweeps.

Genome Res 20(3): 393-402.

26 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Chia CW, Carlson OD, Kim W, Shin YK, Charles CP, Kim HS, Melvin DL, Egan JM. 2009.

Exogenous glucose-dependent insulinotropic polypeptide worsens post prandial

hyperglycemia in type 2 diabetes. Diabetes 58(6): 1342-1349.

Clark AG, Wang X, Matise T. 2010. Contrasting methods of quantifying fine structure of human

recombination. Annu Rev Genomics Hum Genet 11: 45-64.

Darwin C. 1859. On the Origin of by Means of Natural Selection, or the Preservation of

Favoured Races in the Struggle for Life John Murray, London.

Dib C, Faure S, Fizames C, Samson D, Drouot N, Vignal A, Millasseau P, Marc S, Hazan J, Seboun

E et al. 1996. A comprehensive genetic map of the human genome based on 5,264

microsatellites. Nature 380(6570): 152-154.

Drews J. 2000. Drug discovery: a historical perspective. Science 287(5460): 1960-1964.

Freeman S, Herron JC. 2003. Evolutionary Analysis, 3rd edition. Prentice-Hall Inc., Upper Saddle

River, NJ.

Fujimoto A, Kimura R, Ohashi J, Omi K, Yuliwulandari R, Batubara L, Mustofa MS, Samakkarn U,

Settheetham-Ishida W, Ishida T et al. 2008. A scan for genetic determinants of human hair

morphology: EDAR is associated with Asian hair thickness. Hum Mol Genet 17(6): 835-843.

Fuller DQ, Qin L, Zheng Y, Zhao Z, Chen X, Hosoya LA, Sun GP. 2009. The domestication

process and domestication rate in rice: spikelet bases from the Lower Yangtze. Science

323(5921): 1607-1610.

Fulurija A, Lutz TA, Sladko K, Osto M, Wielinga PY, Bachmann MF, Saudan P. 2008. Vaccination

against GIP for the treatment of obesity. PLoS ONE 3(9): e3163.

Gibbons A. 2010. Human Evolution. Tracing evolution's recent fingerprints. Science 329(5993):

740-742.

27 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Gniuli D, Calcagno A, Dalla Libera L, Calvani R, Leccesi L, Caristo ME, Vettor R, Castagneto M,

Ghirlanda G, Mingrone G. 2010. High-fat feeding stimulates endocrine, glucose-dependent

insulinotropic polypeptide (GIP)-expressing cell hyperplasia in the duodenum of Wistar rats.

Diabetologia.

Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. 2009. Inferring the joint

demographic history of multiple populations from multidimensional SNP frequency data.

PLoS Genet 5(10): e1000695.

Hellenthal G, Stephens M. 2007. msHOT: modifying Hudson's ms simulator to incorporate

crossover and gene conversion hotspots. Bioinformatics 23(4): 520-521.

Hoekstra HE, Hirschmann RJ, Bundey RA, Insel PA, Crossland JP. 2006. A single

mutation contributes to adaptive beach mouse color pattern. Science 313(5783): 101-104.

Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation.

Bioinformatics 18(2): 337-338.

Ingelsson E, Langenberg C, Hivert MF, Prokopenko I, Lyssenko V, Dupuis J, Magi R, Sharp S,

Jackson AU, Assimes TL et al. 2010. Detailed physiologic characterization reveals diverse

mechanisms for novel genetic Loci regulating glucose and insulin metabolism in humans.

Diabetes 59(5): 1266-1275.

International HapMap Consortium Frazer KA Ballinger DG Cox DR Hinds DA Stuve LL Gibbs RA

Belmont JW Boudreau A Hardenbol P et al. 2007. A second generation human haplotype

map of over 3.1 million SNPs. Nature 449(7164): 851-861.

Isken F, Pfeiffer AF, Nogueiras R, Osterhoff MA, Ristow M, Thorens B, Tschop MH, Weickert

MO. 2008. Deficiency of glucose-dependent insulinotropic polypeptide receptor prevents

ovariectomy-induced obesity in mice. Am J Physiol Endocrinol Metab 295(2): E350-355.

28 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Itan Y, Powell A, Beaumont MA, Burger J, Thomas MG. 2009. The origins of lactase persistence in

Europe. PLoS Comput Biol 5(8): e1000491.

Jones MK, Liu X. 2009. Archaeology. Origins of agriculture in East Asia. Science 324(5928): 730-

731.

Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive

selection in humans and the limits of outlier approaches. Genome Res 16(8): 980-989.

Kim W, Egan JM. 2008. The role of incretins in glucose homeostasis and diabetes treatment.

Pharmacol Rev 60(4): 470-512.

Kimura M, Ota T. 1973. The age of a neutral mutant persisting in a finite population. Genetics

75(1): 199-212.

Klein RG. 2009. Darwin and the recent African origin of modern humans. Proc Natl Acad Sci U S A

106(38): 16007-16009.

Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir

S, Barnard J, Hallbeck B, Masson G et al. 2002. A high-resolution recombination map of the

human genome. Nat Genet 31(3): 241-247.

Laland KN, Odling-Smee J, Myles S. 2010. How culture shaped the human genome: bringing

genetics and the human sciences together. Nat Rev Genet 11(2): 137-148.

Lalueza-Fox C, Rompler H, Caramelli D, Staubert C, Catalano G, Hughes D, Rohland N, Pilli E,

Longo L, Condemi S et al. 2007. A 1 receptor allele suggests varying

pigmentation among Neanderthals. Science 318(5855): 1453-1455.

LANCET T. 1989. Thrifty genotype rendered detrimental by progress? Lancet 2(8667): 839-840.

Leslie M. 2010. Genetics. Kidney disease is parasite-slaying protein's downside. Science 329(5989):

263.

29 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Lewontin RC, Krakauer J. 1973. Distribution of gene frequency as a test of the theory of the

selective neutrality of polymorphisms. Genetics 74(1): 175-195.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS,

Feldman M, Cavalli-Sforza LL et al. 2008. Worldwide human relationships inferred from

genome-wide patterns of variation. Science 319(5866): 1100-1104.

Luca F, H. PG, di Rienzo A. 2010. Evolutionary Adaptatons to Dietary Changes. Annu Rev Nutr 30:

291-314.

McClean PL, Irwin N, Cassidy RS, Holst JJ, Gault VA, Flatt PR. 2007. GIP receptor antagonism

reverses obesity, insulin resistance, and associated metabolic disturbances induced in mice

by prolonged consumption of high-fat diet. Am J Physiol Endocrinol Metab 293(6): E1746-

1755.

McDougall I, Brown FH, Fleagle JG. 2005. Stratigraphic placement and age of modern humans

from Kibish, Ethiopia. Nature 433(7027): 733-736.

McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. 2004. The fine-scale

structure of recombination rate variation in the human genome. Science 304(5670): 581-584.

Miller JC, Colagiuri S. 1994. The carnivore connection: dietary carbohydrate in the evolution of

NIDDM. Diabetologia 37(12): 1280-1286.

Miyawaki K, Yamada Y, Ban N, Ihara Y, Tsukiyama K, Zhou H, Fujimoto S, Oku A, Tsuda K,

Toyokuni S et al. 2002. Inhibition of gastric inhibitory polypeptide signaling prevents

obesity. Nat Med 8(7): 738-742.

Moody AJ, Thim L, Valverde I. 1984. The isolation and sequencing of human gastric inhibitory

peptide (GIP). FEBS Lett 172(2): 142-148.

30 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Neel JV. 1962. Diabetes mellitus: a "thrifty" genotype rendered detrimental by "progress"? Am J

Hum Genet 14: 353-362.

Nei M, Suzuki Y, Nozawa M. 2010. The neutral theory of molecular evolution in the genomic era.

Annu Rev Genomics Hum Genet 11: 265-289.

Ng SB, Bigham AW, Buckingham KJ, Hannibal MC, McMillin MJ, Gildersleeve HI, Beck AE,

Tabor HK, Cooper GM, Mefford HC et al. 2010. Exome sequencing identifies MLL2

as a cause of Kabuki syndrome. Nat Genet.

Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG. 2007. Recent and ongoing selection in

the human genome. Nat Rev Genet 8(11): 857-868.

Park JI, Semyonov J, Chang CL, Yi W, Warren W, Hsu SY. 2008. Origin of INSL3-mediated

testicular descent in therian . Genome Res 18(6): 974-985.

Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain

JL, Misra R et al. 2007. Diet and the evolution of human amylase gene copy number

variation. Nat Genet 39(10): 1256-1260.

Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers

RM, Feldman MW et al. 2009. Signals of recent positive selection in a worldwide sample of

human populations. Genome Res 19(5): 826-837.

Piperno DR, Weiss E, Holst I, Nadel D. 2004. Processing of wild cereal grains in the Upper

Palaeolithic revealed by starch grain analysis. Nature 430(7000): 670-673.

Prentice AM, Rayco-Solon P, Moore SE. 2005. Insights from the developing world: thrifty

genotypes and thrifty phenotypes. Proc Nutr Soc 64(2): 153-161.

Pritchard JK, Pickrell JK, Coop G. 2010. The genetics of human adaptation: hard sweeps, soft

sweeps, and polygenic adaptation. Curr Biol 20(4): R208-215.

31 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Prokopenko I, McCarthy MI, Lindgren CM. 2008. Type 2 diabetes: new genes, new understanding.

Trends Genet 24(12): 613-621.

Reich DE. 1998. Estimating the age of mutations using variation at linked markers. . In

Microsatellites: evolution and applications, pp. 129-138. Oxford University Press, Oxford.

Reich DE, Goldstein DB. 1999. Estimating the age of mutations using variation at linked markers.

In Microsatellites: Evolution and Applications, (ed. DB Goldstein, C Schlotterer), pp. 129-

138. Oxford University Press, Oxford.

Richards MP, Trinkaus E. 2009. Out of Africa: modern human origins special feature: isotopic

evidence for the diets of European Neanderthals and early modern humans. Proc Natl Acad

Sci U S A 106(38): 16034-16039.

Richerson PJ, Boyd R, Henrich J. 2010. Colloquium paper: gene-culture coevolution in the age of

genomics. Proc Natl Acad Sci U S A 107 Suppl 2: 8985-8992.

Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N,

Bamshad M et al. 2010. Analysis of genetic inheritance in a family quartet by whole-

genome sequencing. Science 328(5978): 636-639.

Rosenberg NA. 2006. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell

Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann

Hum Genet 70(Pt 6): 841-847.

Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV,

Patterson NJ, McDonald GJ et al. 2002. Detecting recent positive selection in the human

genome from haplotype structure. Nature 419(6909): 832-837.

32 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Sabeti PC Varilly P Fry B Lohmueller J Hostetter E Cotsapas C Xie X Byrne EH McCarroll SA

Gaudet R et al. 2007. Genome-wide detection and characterization of positive selection in

human populations. Nature 449(7164): 913-918.

Sabeti PC, Walsh E, Schaffner SF, Varilly P, Fry B, Hutcheson HB, Cullen M, Mikkelsen TS, Roy

J, Patterson N et al. 2005. The case for selection at CCR5-Delta32. PLoS Biol 3(11): e378.

Saxena R Hivert MF Langenberg C Tanaka T Pankow JS Vollenweider P Lyssenko V Bouatia-Naji

N Dupuis J Jackson AU et al. 2010. Genetic variation in GIPR influences the glucose and

insulin responses to an oral glucose challenge. Nat Genet 42(2): 142-148.

Schluter D. 2000. The Ecology of Adaptive Radiation Oxford Univ. Press, Oxford.

Seminara SB, Messager S, Chatzidaki EE, Thresher RR, Acierno JS, Jr., Shagoury JK, Bo-Abbas Y,

Kuohung W, Schwinof KM, Hendrick AG et al. 2003. The GPR54 gene as a regulator of

puberty. N Engl J Med 349(17): 1614-1627.

Semyonov J, Park JI, Chang CL, Hsu SY. 2008. GPCR genes are preferentially retained after whole

genome duplication. PLoS ONE 3(4): e1903.

Shiao MS, Liao BY, Long M, Yu HT. 2008. Adaptive evolution of the insulin two-gene system in

mouse. Genetics 178(3): 1683-1691.

Shimomura Y, Wajid M, Ishii Y, Shapiro L, Petukhova L, Gordon D, Christiano AM. 2008.

Disruption of P2RY5, an orphan -coupled receptor, underlies autosomal recessive

woolly hair. Nat Genet 40(3): 335-339.

Simonson TS, Yang Y, Huff CD, Yun H, Qin G, Witherspoon DJ, Bai Z, Lorenzo FR, Xing J, Jorde

LB et al. 2010. Genetic Evidence for High-Altitude Adaptation in Tibet. Science May 13.

Slatkin M, Rannala B. 2000. Estimating allele age. Annu Rev Genomics Hum Genet 1: 225-249.

Smith JM, Haigh J. 1974. The hitch-hiking effect of a favourable gene. Genet Res 23(1): 23-35.

33 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Stephens JC, Reich DE, Goldstein DB, Shin HD, Smith MW, Carrington M, Winkler C, Huttley

GA, Allikmets R, Schriml L et al. 1998. Dating the origin of the CCR5-Delta32 AIDS-

resistance allele by the coalescence of haplotypes. Am J Hum Genet 62(6): 1507-1515.

Sulem P, Gudbjartsson DF, Stacey SN, Helgason A, Rafnar T, Magnusson KP, Manolescu A,

Karason A, Palsson A, Thorleifsson G et al. 2007. Genetic determinants of hair, eye and

skin pigmentation in Europeans. Nat Genet 39(12): 1443-1452.

Takeda J, Seino Y, Tanaka K, Fukumoto H, Kayano T, Takahashi H, Mitani T, Kurono M, Suzuki

T, Tobe T et al. 1987. Sequence of an intestinal cDNA encoding human gastric inhibitory

polypeptide precursor. Proc Natl Acad Sci U S A 84(20): 7005-7008.

The International HapMap Project. 2003. The International HapMap Project. Nature 426(6968):

789-796.

Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS, Powell K, Mortensen

HM, Hirbo JB, Osman M et al. 2007. Convergent adaptation of human lactase persistence in

Africa and Europe. Nat Genet 39(1): 31-40.

Topaloglu AK, Reimann F, Guclu M, Yalin AS, Kotan LD, Porter KM, Serin A, Mungan NO, Cook

JR, Ozbek MN et al. 2009. TAC3 and TACR3 mutations in familial hypogonadotropic

hypogonadism reveal a key role for Neurokinin B in the central control of reproduction. Nat

Genet 41(3): 354-358.

Turner JR, Johnson MS, Eanes WF. 1979. Contrasted modes of evolution in the same genome:

allozymes and adaptive change in Heliconius. Proc Natl Acad Sci U S A 76(4): 1924-1928.

Vander Molen J, Frisse LM, Fullerton SM, Qian Y, Del Bosque-Plata L, Hudson RR, Di Rienzo A.

2005. Population genetics of CAPN10 and GPR35: implications for the evolution of type 2

diabetes variants. Am J Hum Genet 76(4): 548-560.

34 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the

human genome. PLoS Biol 4(3): e72.

Voight BF Scott LJ Steinthorsdottir V Morris AP Dina C Welch RP Zeggini E Huth C Aulchenko

YS Thorleifsson G et al. 2010. Twelve type 2 diabetes susceptibility loci identified through

large-scale association analysis. Nat Genet 42(7): 579-589.

Wang ET, Kodama G, Baldi P, Moyzis RK. 2006. Global landscape of recent inferred Darwinian

selection for Homo sapiens. Proc Natl Acad Sci U S A 103(1): 135-140.

Weir BS. 1996. Genetic data analysis II: methods for discrete population genetic data. Sinauer

Associates, Sunderland.

Yi X, Liang Y, Huerta-Sanchez E, Jin X, Cuo ZX, Pool JE, Xu X, Jiang H, Vinckenbosch N,

Korneliussen TS et al. 2010. Sequencing of 50 human exomes reveals adaptation to high

altitude. Science 329(5987): 75-78.

35 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Table 1. Human GIP SNP rs2291725 exhibits a high population differentiation characteristic in the International HapMap II Project dataset. Genotypes were analyzed as described using the

SPSmart v3 and Haplotter (Voight et al. 2006; Amigo et al. 2008). The average frequency of two alleles—derived allele GIP103C and ancestral chimpanzee allele GIP103T—at rs2291725 is approximately even in the overall HapMap population: only 5.0% of YRI chromosomes contain

GIP103C compared to 48.3% and 75.5% of CEU and ASN chromosomes that carry the derived allele, respectively.

Population Ch. rs2291725 rs2291725 FST

No. Allele Frequency Genotypes

GIP103T GIP103C T/T T/C C/C vs.

YRI CEU ASN

YRI 120 0.950 0.050 0.900 0.100 0.000 - 0.24 0.48*

CEU 120 0.517 0.483 0.267 0.500 0.233 0.24 - 0.08

ASN 178 0.245 0.755 0.090 0.314 0.595 0.48* 0.08 -

All 418 0.524 0.476

* p <0.05.

36 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Figure Legends

Fig. 1. Differential distribution of alleles at rs2291725 in human populations. A-C) The distribution of the FST for nonsynonymous SNPs across the human genome and the FST for rs2291725. FST estimations between ASN and YRI (A), YRI and CEU (B), and ASN and CEU (C) are shown in the x-axis. The red vertical bar indicates the corresponding values of FST for rs2291725. The y-axis represents the frequency of SNPs with a given FST estimate. Statistical significance for comparisons in (A), (B), and (C) is indicated with empirical p = 0.0039, 0.058, and

0.047, respectively. The number of coding SNPs that were analyzed in (A), (B), and (C) histograms were 48526, 48674, and 48075, respectively. Among these SNPs, the number of nonsynonymous

SNPs in (A), (B), and (C) histograms were 29033, 29026, and 28774, respectively. D) Distribution of rs2291725 in HGDP-CEPH (944 unrelated samples) populations. Pie charts represent the proportion of each genotype by geographic region. The ancestral GIP103T allele (black pie) occurs at a higher frequency in African and American populations whereas the majority of Eurasian populations have a higher frequency of the derived GIP103C allele (white pie).

Fig. 2. Evolution of haplotypes encompassing the GIP locus. A) Plots of the extent of haplotype homozygosity in the 1.0 Mb region surrounding rs2291725 in CEU, ASN, and YRI as assessed by

Haplotter (Voight et al. 2006). These plots are divided into two parts. The upper portion shows haplotypes with the ancestral GIP103T allele in blue, and the lower portion shows haplotypes with the derived GIP103C allele in red. Adjacent haplotypes with the same color carry identical genotypes spanning the region between a select SNP and rs2291725. B) Plots of the breakdown of EHH over the distance between rs2291725 and neighboring SNPs at increasing distances. EHH decays much slower at the derived allele (red) as compared to the ancestral allele (blue) in ASN and CEU. The position of rs2291725 at the center of the plots is indicated by a vertical dotted line. C) Evolution of

37 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

haplotypes within a 70-kb core haploblock in the three HapMap II populations. A total of 13 haplotypes were found in the core region from rs196241 to rs2291726 (37 SNPs at Ch. 17,

44325258-44394253). The frequencies of these haplotypes (GIP-A~M) in YRI, CEU, and ASN are shown in the lower left panel. There are 12 haplotypes in YRI whereas CEU and ASN are represented by 5 and 4 haplotypes, respectively. GIP-A is the only haplotype containing the derived

GIP103C allele and is indicated by a dotted box. An unrooted tree analysis of these haplotypes showed that the GIP-A haplotype diverged from others early in human evolution (GeneBee). The frequency of each haplotype in a select population is indicated by the size of the pie at the tip of each branch.

Fig. 3. An extended GIP peptide is expressed in human gut cells. A) Alignment of human GIP

(residues Y52 to K107) with corresponding residues from 17 other vertebrates showed that the GIP open reading frame contains alternative basic cleavage sites for the generation of multiple GIP isoforms in primates, dogs, cats, cows, pigs, and horses (upper panel). The mature peptide region is indicated by a dark horizontal bar above the alignment. Residues that are conserved from teleosts to humans are indicated by a gray background. Putative basic cleavage sites are indicated by a red background. The position of the variable residue 103 is indicated by an arrow. Alternative post- translational processing of proGIP could lead to the generation of a 42-amino-acid mature GIP and an extended 55-amino-acid isoform (GIP55G or GIP55S) that differ at position 52 (lower panel). B)

GIP and GIP55 peptides are co-localized in duodenum cells. Immunoreactive GIP and GIP55 were detected in select duodenum cells by immunofluorescent staining. The right panels showed the dark field and phased contrast images of merged immunofluorescent signals (x 800). C) GIP, GIP55G, and GIP55S suppressed exogenous glucose in fasting rats in vivo. Each of the three GIP peptides reduced glucose contents in the blood to basal levels at 1 hr after injection of the peptide and

38 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

glucose. Each data point represents the mean ± SEM of triplicate samples. Similar results were observed in five separate experiments.

Fig. 4. The variation at rs2291725 affects the bioactivity of translated products. A) GIP55G peptide is resistant to serum degradation. Treatments of GIP receptor-expressing HEK293T cells with GIP, GIP55G, or GIP55S led to dose-dependent increases of cAMP production (top left panel).

Receptor-activation activities of peptides were also analyzed following incubation with pooled normal human serum for 3, 6, or 12 hr. Cells were treated with synthetic peptides for 12 hr; the signaling is reported as total cAMP contents in cell lysates. Error bars represent SEM of triplicate samples. Significant differences in cAMP production between GIP and GIP55G treatments at a given peptide concentration are indicated by asterisks (p <0.01). In the control group, cells were treated with an aliquot of human serum without a synthetic peptide. Similar results were observed in three separate experiments. B) Comparison of the slopes of EC50 trend lines for GIP, GIP55G, and

GIP55S after treatments with pooled human serum for the indicated time-spans. The slope of the

GIP55G group is significantly different from that of the GIP group (*, p = 0.0023).

39 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Information

Supplementary Table 1. List of polypeptide and their cognate ligand genes.

Supplementary Table 2. List of coding SNPs with high FST ( > 0.5) in human GPCR and their cognate ligand genes.

Supplementary Table 3. List of coding SNPs with high FST ( > 0.5) in human nonGPCR receptor and ligand genes.

Supplementary Table 4. List of genotyped SNPs from 44246961 to 44542055 on chromosome

17. The HapMap II dataset was analyzed using HaploView. The 101 SNPs included in LD plots of

Supplementary Fig. 3 (A-C) are highlighted by a grey background. The 37 SNPs used in the haplotype analysis of Fig. 2C are indicated by red letters. SNPs that are linked with rs2291725 are indicated by bold red letters.

Supplementary Table 5. Allele frequency of rs2291725 in the HGDP-CEPH populations.

Frequencies of GIP103T and GIP103C alleles in each of the 52 populations from the seven geographical regions and the number of chromosomes analyzed for each population.

40 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Table 6. EC50 values for GIP receptor activation at four different time points after incubation with pooled human serum or pooled complement-preserved human serum

(N=4).

Supplementary Table 7. EC50 values for GIP receptor activation at three different time points after incubation with a recombinant DPP IV enzyme (N=4).

Supplementary Fig. 1. Cumulative distribution function (CDF) plots for the FST of coding

SNPs in human GPCRs and their cognate ligand genes (blue) and all other human genes

(magenta). The FST was computed between three HapMap II populations (CEU, YRI, and ASN), and coding SNPs have been split into synonymous and nonsynonymous groups. The CDF for synonymous and nonsynonymous SNPs were plotted separately because the CDF of the FST for nonsynonymous SNPs has been shown to shift toward the right as compared to that of synonymous

SNPs (Barreiro et al. 2008), perhaps due to a global increase of population differentiation at nonsynonymous sites.

Supplementary Fig. 2. CDF plots for the FST of coding SNPs in human GPCR and ligand genes (blue) and nonGPCR receptor and ligand genes (green). FST was computed between three

HapMap II populations (CEU, YRI, and ASN) and coding SNPs have been split into synonymous and nonsynonymous SNPs.

Supplementary Fig. 3. Plots of linkage disequilibrium (LD) between each pair of genotyped

SNPs in a 200-kb region surrounding rs2291725 (Ch. 17, 44325258-44495998) in CEU (A),

ASN (B), and YRI (C). The International HapMap II dataset was analyzed using HaploView

(Barrett et al. 2005). Putative LD block structures (defined by D) are indicated by black triangles. In

41 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

CEU and ASN, rs2291725 is positioned within a ~90-kb block with high LD whereas the link between rs2291725 and neighboring SNPs is negligible in YRI. Red squares represent regions with a high degree of LD and a high likelihood of odds (LOD; D' =1, LOD>2). Blue squares represent regions of high LD but low LOD (D' =1, LOD<2). Blue horizontal bars above LD plots represent gene regions of CALCOCO2, ATP5G1, UBE2Z, SNF8, GIP, and IGF2BP1. D) Alignments of 13 haploblock sequences (A-M) found in a 70-kb genomic region neighboring rs2291725 (Ch. 17,

44325258-44394253). Among the 37 SNPs in this region, 22 are linked with rs2291725 and are indicated by asterisks above the alignment. Sequences with an identical nucleotide are highlighted by a select color. The positions of individual SNPs are shown on the top, and SNPs that flank the region (rs1962412 and rs2291726) are indicated by arrows. The position of rs2291725 is indicated by a red box. The putative ancestral haplotype from chimpanzees is also shown at the bottom of the alignment.

Supplementary Fig. 4. Observed iHS for rs2291725 and the distribution of iHS under neutral evolution. A) The distribution of simulated iHS under three scenarios: hotspot, neutral, and bottleneck. The distribution of iHS is constructed using coalescent simulations. Observed iHS for rs2291725 in the HapMap ASN population is indicated with a red vertical line. The means of simulated iHS under the hotspot, neutral, and bottleneck scenarios are 0.1977, 0.2746, and 0.3694, respectively. Under all three scenarios, the result rejected the hypothesis that the observed iHS of rs2291725 is larger than the simulated result. The p-values for the hotspot, neutral, and bottleneck models are 8.0x10-4, 0, and 0, respectively. B) The hypothetical demographic history used to condition the effect of bottleneck on population expansion. We assumed that ASN population is one panmictic population that underwent a size reduction, followed by a period of constant size

42 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

(bottleneck), and then proceeded with population expansion (Gutenkunst et al. 2009). The population had an ancestral population size of N1 which at time T2 instantaneously shrank to size

N2. It remained constant at size N2 until time T1, at which point it began expanding exponentially until the present time. The population size at present time is N3.

Supplementary Fig. 5. GIP55G is resistant to degradation by pooled complement-preserved human serum. Receptor-activation activities of GIP, GIP55G, or GIP55S following incubation with pooled complement-preserved human serum for 3, 6, and 12 hr were analyzed in HEK293T cells expressing a recombinant human GIP receptor. Error bars represent SEM of triplicate samples.

Significant differences in cAMP production between GIP and GIP55G treatments at a given peptide concentration are indicated by asterisks (p <0.01).

Supplementary Fig. 6. Haplotypes within the LD block at the GIP locus in ASN and CEU. We defined LD blocks in a 250-kb region around the GIP locus using HaploView with default parameters. LD blocks in CEU and ASN are approx 91 and 97 kb, respectively. The triangle indicates the position of rs2291752. Haplotype frequencies are given on the right. The block with

GIP103C allele is in blue for ASN and in red for CEU. The ASN block contains eight haplotypes.

Four out of the eight haplotypes are linked with the derived GIP103C allele, and these haplotypes have been referred to as “GIP103C haplotypes.”

43 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Methods

Coalescent simulations of the integrated haplotype score (iHS)

To determine the distribution of iHS under neutral evolution, we adopted coalescent simulation parameters that were estimated based on the genomic region of GIP in the ASN

-7 population: sample size = 180, theta = 388, recombination rate = 9.7x10 cM/bp, N0 = 10,000, and the length of the region = 750 kb. Using the ms package (Hudson 2002), we generated 10,000 samples, in which every sample has the same number of segregating sites regardless of the size of the genealogy generated, using the -s option on the command line: ms 180 10000 -s 388 -r 2920

750000. In this case, r = 2920 because the cross-over probability between the two ends of the locus is (750000-1)* 9.7x10-7 = 0.73, and that r = 4*10000*0.73 = 29200. For the purpose of computational efficiency, we scaled r by 0.1 during simulations. This setting is conservative because a lower r value results in a larger variance of iHS (data not shown).

To take into account the potential effects of demography, we conducted the coalescent simulations under a bottleneck model as shown in Supplementary Fig. 4B. We assumed that ASN is a panmictic population that underwent a size reduction, followed by a period of constant size

(bottleneck), and proceeded with an exponential expansion. We assumed that the population sizes are N0 = N1 = 10,000; N2 = 5,000; and N3 = 20,000, and we used associated parameters that have been previously inferred (Gutenkunst et al. 2009). We also assumed that that T1 is 480 generations, or 480/(4*10,000) = 0.012 in units of 4N0 generations, and T2 is 920 generations or 0.023 in units of 4N0 generations. To specify the exponential growth on the command line, we calculated the growth parameter α by solving the equation, 5,000 = 20,000*exp(-α0.012), which is α = -(1/0.012)* log(5,000/20,000) = 115.52. Thus the command line to generate 10,000 samples with a size of 180 is: ms 180 10000 -s 388 -r 2920 75000 -G 115.52 -eG 0.012 0.0 -eN 0.023 1.

44 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

The run times for 10,000 replicates were 72 hr and 80 hr for the neutral and the bottleneck scenario, respectively (using a single core of a 2.2 GHz quad core AMD Opteron processor). We also attempted to take into account the effect of the presence of recombination hotspots on the distribution of iHS. We used msHOT (Hellenthal and Stephens 2007), a modified ms simulator that has incorporated crossover and gene conversion hotspots, to perform the simulation. In this simulation, we introduced a hotspot with intensity of 10 between the 350- and 400-kb region, and the command line is: msHOT 180 10000 -s 388 -r 2920 75000 –v 1 35000 40000 10.

45 Fig. 1

A ASN vs. YRI Downloaded from

D genome.cshlp.org Frequency onSeptember30,2021-Publishedby B YRI vs. CEU Frequency Cold SpringHarborLaboratoryPress

America Central-South Asia Africa C Oceania Middle East ASN vs. CEU East Europe Asia Frequency

A Mb 0.5 0.5 B Fig. 2 GIP103T Downloaded from

CEU EHH CEU GIP103C

GIP103T genome.cshlp.org

ASN EHH ASN GIP103C onSeptember30,2021-Publishedby

GIP103T YRI YRI EHH GIP103C rs2291725 rs2291725

C Cold SpringHarborLaboratoryPress YRI CEU ASN GIP-K GIP-G GIP-F GIP-A 0.042 0.475 0.739 GIP55G GIP D GIP-E GIP-B 0.242 0.325 0.128 GIP-J GIP-C 0.017 GIP-B GIP-D 0.046 0.100 0.094 GIP-I GIP-M GIP-E 0.019 GIP-L GIP-C GIP-H GIP-F 0.012 GIP55S GIP-G 0.005 GIP-H 0.420 0.042 0.011 GIP55S GIP-A GIP-I 0.063 YRI CEU GIP-J 0.042 ASN GIP-K 0.058 GIP-L 0.018 GIP-M 0.025 GIP55G Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

A Fig. 3

Y52 Q93 S103 K107 GIP

Human YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNDWKHNITQREARALELASQANRK--- 57 Chimpanzee YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNDWKHNITQREARALELASQANRK--- 57 Orangutan YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNEWKHNITQREARALELASQANRK--- 57 Gorilla YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNEWKHNITQREAQALELASQANRK--- 57 Rhesus monkey YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNDWKHNITQREARALELASQANRK--- 57 YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKNDWKHNITQREAGALELAHQSNRK--- 57 Cow YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKSDWIHNITQREAGALELAHQSNRK--- 57 Cat YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKNEWKHNITQREAGALELAHQSNRK--- 57 Pig YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKSEWKHNITQREARALELAHQSNRK--- 57 YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKNDWKHNITQREARALELTHQSNRK--- 57 Rat YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKNDWKHNLTQREARALELAGQSQRN--- 57 Mouse YAEGTFISDYSIAMDKIRQQDFVNWLLAQRGKKSDWKHNITQREARALVLAGQSQGK--- 57 Guinea pig YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKGDWSHNLTQRDTHNPELSRPSQER--- 57 Elephant YAEGTFISDYSIAMDKIRQQDFVNWLLAQKGKKNEWKHNITQREARALELAVQSNRN--- 57 Opossum YAEGTFISDYSITMDKIMQQDFVNWLLSQKGKKNSWRHNITERKADGTGLTNHSNWNLK- 59 YSEATLASDYSRTMDNMLKKNFVEWLLARREKKSDNVIEPYKREAEPQLSAVSDQSLDPR 60 Clawed Frog YSEAILASDYSRSVDNMLKKNFVDWLLARREKKSENTSEATKREADFQLPDVNMKEK--- 57 YAESTIASDISKIVDSMVQKNFVNFLLNQREKKSEPALTEDPESHIFNDLLKK------53

GIP55G and GIP55S

1 55 YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNDWKHNITQREARALELASQAN G

Posttranslational processing 55-amino-acid RXXR RK

S N-propeptide GIP C-propeptide RXXR RXXR 42-amino-acid

Posttranslational processing

1 42 YAEGTFISDYSIAMDKIHQQDFVNWLLAQKGKKNDWKHNITQ

GIP

Fig. 3

B C Downloaded from Rabbit Mouse anti- GIP55- GIP specific

antibody antibody Merged Phased genome.cshlp.org 500 GIP 400 onSeptember30,2021-Publishedby GIP55G 300 GIP55S

200

Glucose (mg/dl) 100 Cold SpringHarborLaboratoryPress 0 0 15 30 45 60 Time (minutes)

Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

A Fig. 4 175 175 150 GIP * 150 GIP GIP55G * GIP55G GIP55S GIP55S 100 100 * * 50 (nmol/well) 50

cAMP production 0 hr 3 hr 0 0

175 175 GIP GIP 150 150 GIP55G GIP55G GIP55S * * GIP55S 100 100

50 (nmol/well) 50 *

cAMP production 6 hr 12 hr 0 0 0 -10 -9 -8 -7 0 -10 -9 -8 -7 Log[ligand, M] Log[ligand, M]

10 B Slope GIP -0.121±0.015 GIP55G -0.051±0.012* , M] 9

50 GIP55S -0.121±0.013

8 Log[EC

7 0 3 6 9 12 Hr Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Fig. 1. Cumulative distribution function (CDF) plots for FST of coding SNPs in human GPCRs and their cognate ligand genes (blue) and the rest of human genes (magenta). FST is computed between three HapMap II populations (CEU, YRI, and ASN) and coding SNPs are split into synonymous and nonsynonymous SNPs.

Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Fig. 2. Cumulative distribution function (CDF) plots for FST of coding SNPs in human GPCR and ligand genes (blue) and nonGPCR receptor and ligand genes (green). FST is computed between three HapMap II populations (CEU, YRI, and ASN) and coding SNPs are split into synonymous and nonsynonymous SNPs. Downloaded from A rs2291725 CALCOCO2 ATP5G1 UBE2Z SNF8 GIP IGF2BP1 genome.cshlp.org CEU onSeptember30,2021-Publishedby Cold SpringHarborLaboratoryPress 200 kb

Sup. Fig. 3 Downloaded from B rs2291725 CALCOCO2 ATP5G1 UBE2Z SNF8 GIP IGF2BP1 genome.cshlp.org ASN onSeptember30,2021-Publishedby Cold SpringHarborLaboratoryPress

200 kb

Sup. Fig. 3 C rs2291725 Downloaded from CALCOCO2 ATP5G1 UBE2Z SNF8 GIP IGF2BP1 genome.cshlp.org YRI onSeptember30,2021-Publishedby

200 kb Cold SpringHarborLaboratoryPress

Sup. Fig. 3 D 1 5 10 15 20 25 30 35 ** * * ** * ** * ************ * Downloaded from GIP A CGCGCAGGTAACTGGGGGGCGCGCTCGCCTACTCTCT GIP B TATATGAACGGGCAAATGGTATCTCTAATCGGCACTC GIP C TGTATGAACGGGCAAATGGTATCTCTAATCGGCACTT genome.cshlp.org GIP D CGTACGAGCGGGCAGATAGCACCTCTAATCGGCATTC

GIP E TGTATGAGCGGGCAGATAGCACCTCTAATCGGCATTC onSeptember30,2021-Publishedby GIP F CGTACGAGCGGGCAGATAGCATCTCTAATCGGCATTT GIP G TGTATGAGCGGGCAGATAGCATCTCTAATCGGCATTT GIP H CGTACGGGCAGGTAGATGACACCTCTAATCGGCATTC GIP I TGTACGGGCAGGTAGATGACACCTCTAATCGGCATTC GIP J CGTACGGGCAGGCAGATGGCACCTCTAATCGGCATTC Cold SpringHarborLaboratoryPress GIP K CGTACGGGCAGGCAGATGGCACCTCTAATCGGCATTT GIP L TGTATGGGCAGGCAGATGGCACCTCTAATCGGCATTC GIP M TGTATGGGCAGGCAGATGGCACCTCTAATCGGCATTT

Ancestral CGCGCGGGCAAGTA-GGGGCACCCCCGACCGGCATTC rs1962412 rs2291726 Sup. Fig. 3 A B Downloaded from genome.cshlp.org onSeptember30,2021-Publishedby Frequency Cold SpringHarborLaboratoryPress

iHS

Sup. Fig. 4 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

250 GIP 200 GIP55G * 150 GIP55S 100 *

(nmol/well) * 50 cAMP production 0 0 hr 250 GIP 200 GIP55G 150 GIP55S 100 (nmol/well) 50 cAMP production 3 hr 0

250 GIP 200 GIP55G 150 GIP55S * 100 *

(nmol/well) * 50 cAMP production 6 hr 0 0 -10 -9 -8 -7 Log[ligand, M] Sup. Fig. 5 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

250 GIP 200 GIP55G 150 GIP55S * 100 * (nmol/well) 50 cAMP production 12 hr 0 0 -10 -9 -8 -7 Log[ligand, M]

Sup. Fig. 5 ASN LD block (91 kb) Downloaded from genome.cshlp.org onSeptember30,2021-Publishedby

CEU LD block (97 kb) Cold SpringHarborLaboratoryPress

Sup. Fig. 6 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Table 1. List of polypeptide cell surface receptors and their cognate ligand genes that were analyzed.

Gene category Gene Name Gene category Gene Name GPCR ADCYAP1R1 nonGPCR ACVR1 GPCR ADORA1 nonGPCR ACVR1B GPCR ADORA2A nonGPCR ACVR1C GPCR ADORA2B nonGPCR ACVR2A GPCR ADORA3 nonGPCR ACVR2B GPCR ADRA1A nonGPCR ACVRL1 GPCR ADRA1B nonGPCR ALK GPCR ADRA1D nonGPCR AMH GPCR ADRA2A nonGPCR AMHR2 GPCR ADRA2B nonGPCR ANGPT1 GPCR ADRA2C nonGPCR ANGPT2 GPCR ADRB1 nonGPCR ANGPTL4 GPCR ADRB2 nonGPCR AREG GPCR ADRB3 nonGPCR ARTN GPCR AGTR1 nonGPCR AXL GPCR AGTR2 nonGPCR BAMBI GPCR APLNR nonGPCR BMP10 GPCR AVPR1A nonGPCR BMP15 GPCR AVPR1B nonGPCR BMP2 GPCR AVPR2 nonGPCR BMP4 GPCR BAI1 nonGPCR BMP6 GPCR BAI2 nonGPCR BMP7 GPCR BAI3 nonGPCR BMPR1A GPCR BDKRB1 nonGPCR BMPR1B GPCR BDKRB2 nonGPCR BMPR2 GPCR BRS3 nonGPCR BOC GPCR C3AR1 nonGPCR BTC GPCR C5AR1 nonGPCR CD40 GPCR CALCR nonGPCR CD40LG GPCR CALCRL nonGPCR CDON GPCR CASR nonGPCR CER1 GPCR CCBP2 nonGPCR CNTFR GPCR CCKAR nonGPCR CRLF1 GPCR CCKBR nonGPCR CSF1R GPCR CCR1 nonGPCR CSF2RA GPCR CCR10 nonGPCR CSF2RB GPCR CCR2 nonGPCR CSF3R GPCR CCR3 nonGPCR DCC GPCR CCR4 nonGPCR DDR2 GPCR CCR5 nonGPCR DHH GPCR CCR6 nonGPCR DLL1 GPCR CCR7 nonGPCR DLL3 GPCR CCR8 nonGPCR DLL4 GPCR CCR9 nonGPCR EDA GPCR CCRL1 nonGPCR EDA2R GPCR CCRL2 nonGPCR EDAR GPCR CD97 nonGPCR EFNA1 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR CELSR1 nonGPCR EFNA2 GPCR CELSR2 nonGPCR EFNA3 GPCR CELSR3 nonGPCR EFNA4 GPCR CHRM1 nonGPCR EFNA5 GPCR CHRM2 nonGPCR EFNB1 GPCR CHRM3 nonGPCR EFNB2 GPCR CHRM5 nonGPCR EFNB3 GPCR CMKLR1 nonGPCR EGF GPCR CNR1 nonGPCR EGFR GPCR CNR2 nonGPCR EPHA1 GPCR CRHR1 nonGPCR EPHA2 GPCR CRHR2 nonGPCR EPHA3 GPCR CX3CR1 nonGPCR EPHA4 GPCR CXCR1 nonGPCR EPHA5 GPCR CXCR2 nonGPCR EPHA6 GPCR CXCR3 nonGPCR EPHA7 GPCR CXCR4 nonGPCR EPHA8 GPCR CXCR5 nonGPCR EPHB1 GPCR CXCR6 nonGPCR EPHB2 GPCR CXCR7 nonGPCR EPHB3 GPCR CYSLTR1 nonGPCR EPHB4 GPCR CYSLTR2 nonGPCR EPHB6 GPCR DARC nonGPCR EPOR GPCR DRD1 nonGPCR ERBB2 GPCR DRD2 nonGPCR ERBB3 GPCR DRD3 nonGPCR ERBB4 GPCR DRD4 nonGPCR EREG GPCR DRD5 nonGPCR FAS GPCR EDNRA nonGPCR FASLG GPCR EDNRB nonGPCR FGF1 GPCR ELTD1 nonGPCR FGF10 GPCR EMR1 nonGPCR FGF11 GPCR EMR2 nonGPCR FGF12 GPCR EMR3 nonGPCR FGF13 GPCR EMR4P nonGPCR FGF14 GPCR F2R nonGPCR FGF16 GPCR F2RL1 nonGPCR FGF17 GPCR F2RL2 nonGPCR FGF18 GPCR F2RL3 nonGPCR FGF19 GPCR FFAR1 nonGPCR FGF2 GPCR FFAR2 nonGPCR FGF20 GPCR FFAR3 nonGPCR FGF21 GPCR FPR1 nonGPCR FGF22 GPCR FPR2 nonGPCR FGF23 GPCR FPR3 nonGPCR FGF3 GPCR FSHR nonGPCR FGF4 GPCR FZD1 nonGPCR FGF5 GPCR FZD10 nonGPCR FGF6 GPCR FZD2 nonGPCR FGF7 GPCR FZD3 nonGPCR FGF8 GPCR FZD4 nonGPCR FGF9 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR FZD5 nonGPCR FGFR1 GPCR FZD6 nonGPCR FGFR2 GPCR FZD7 nonGPCR FGFR3 GPCR FZD8 nonGPCR FGFR4 GPCR FZD9 nonGPCR FGFRL1 GPCR GABBR1 nonGPCR FIGF GPCR GALR1 nonGPCR FLT1 GPCR GALR2 nonGPCR FLT3 GPCR GALR3 nonGPCR FLT4 GPCR GHRHR nonGPCR GDF1 GPCR GHSR nonGPCR GDF10 GPCR GIPR nonGPCR GDF11 GPCR GLP1R nonGPCR GDF15 GPCR GLP2R nonGPCR GDF2 GPCR GNRHR nonGPCR GDF3 GPCR GNRHR2 nonGPCR GDF5 GPCR GPER nonGPCR GDF6 GPCR GPR1 nonGPCR GDF7 GPCR GPR101 nonGPCR GDF9 GPCR GPR109A nonGPCR GDNF GPCR GPR109B nonGPCR GHR GPCR GPR110 nonGPCR GREM1 GPCR GPR111 nonGPCR GREM2 GPCR GPR112 nonGPCR GUCA2A GPCR GPR113 nonGPCR GUCA2B GPCR GPR114 nonGPCR GUCY2C GPCR GPR115 nonGPCR GUCY2D GPCR GPR116 nonGPCR GUCY2F GPCR GPR119 nonGPCR HBEGF GPCR GPR12 nonGPCR IFNA1 GPCR GPR120 nonGPCR IFNAR1 GPCR GPR123 nonGPCR IFNAR2 GPCR GPR124 nonGPCR IFNB1 GPCR GPR125 nonGPCR IFNG GPCR GPR126 nonGPCR IFNGR1 GPCR GPR128 nonGPCR IFNGR2 GPCR GPR132 nonGPCR IFNW1 GPCR GPR133 nonGPCR IGF1 GPCR GPR135 nonGPCR IGF1R GPCR GPR139 nonGPCR IGF2 GPCR GPR141 nonGPCR IHH GPCR GPR142 nonGPCR IL10 GPCR GPR144 nonGPCR IL10RA GPCR GPR146 nonGPCR IL10RB GPCR GPR15 nonGPCR IL11 GPCR GPR150 nonGPCR IL11RA GPCR GPR151 nonGPCR IL12A GPCR GPR152 nonGPCR IL12B GPCR GPR160 nonGPCR IL12RB1 GPCR GPR161 nonGPCR IL12RB2 GPCR GPR17 nonGPCR IL13 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR GPR171 nonGPCR IL13RA1 GPCR GPR173 nonGPCR IL13RA2 GPCR GPR174 nonGPCR IL15 GPCR GPR176 nonGPCR IL15RA GPCR GPR18 nonGPCR IL17A GPCR GPR182 nonGPCR IL17B GPCR GPR183 nonGPCR IL17RA GPCR GPR19 nonGPCR IL17RB GPCR GPR20 nonGPCR IL17RC GPCR GPR21 nonGPCR IL18 GPCR GPR22 nonGPCR IL18R1 GPCR GPR25 nonGPCR IL18RAP GPCR GPR26 nonGPCR IL19 GPCR GPR27 nonGPCR IL1A GPCR GPR3 nonGPCR IL1B GPCR GPR31 nonGPCR IL1R1 GPCR GPR32 nonGPCR IL1R2 GPCR GPR34 nonGPCR IL1RAP GPCR GPR35 nonGPCR IL1RAPL1 GPCR GPR37 nonGPCR IL1RL1 GPCR GPR37L1 nonGPCR IL2 GPCR GPR39 nonGPCR IL20 GPCR GPR4 nonGPCR IL20RA GPCR GPR42 nonGPCR IL20RB GPCR GPR44 nonGPCR IL21 GPCR GPR45 nonGPCR IL21R GPCR GPR50 nonGPCR IL22 GPCR GPR51 nonGPCR IL22RA1 GPCR GPR52 nonGPCR IL22RA2 GPCR GPR55 nonGPCR IL23A GPCR GPR56 nonGPCR IL23R GPCR GPR6 nonGPCR IL24 GPCR GPR61 nonGPCR IL25 GPCR GPR62 nonGPCR IL28A GPCR GPR63 nonGPCR IL28B GPCR GPR64 nonGPCR IL28RA GPCR GPR65 nonGPCR IL29 GPCR GPR68 nonGPCR IL2RA GPCR GPR73 nonGPCR IL2RB GPCR GPR75 nonGPCR IL2RG GPCR GPR77 nonGPCR IL3 GPCR GPR78 nonGPCR IL31 GPCR GPR81 nonGPCR IL31RA GPCR GPR82 nonGPCR IL3RA GPCR GPR83 nonGPCR IL4 GPCR GPR84 nonGPCR IL4R GPCR GPR85 nonGPCR IL5 GPCR GPR87 nonGPCR IL5RA GPCR GPR88 nonGPCR IL6 GPCR GPR97 nonGPCR IL6R GPCR GPR98 nonGPCR IL6ST Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR GPRC5A nonGPCR IL7 GPCR GPRC5B nonGPCR IL7R GPCR GPRC5C nonGPCR IL9 GPCR GPRC5D nonGPCR IL9R GPCR GPRC6A nonGPCR INHA GPCR GRM1 nonGPCR INHBA GPCR GRM2 nonGPCR INHBB GPCR GRM3 nonGPCR INHBC GPCR GRM4 nonGPCR INS GPCR GRM5 nonGPCR INSR GPCR GRM6 nonGPCR INSRR GPCR GRM7 nonGPCR JAG1 GPCR GRM8 nonGPCR JAG2 GPCR GRPR nonGPCR KDR GPCR HCRTR1 nonGPCR KIT GPCR HCRTR2 nonGPCR KITLG GPCR HRH1 nonGPCR LDLR GPCR HRH2 nonGPCR LEPR GPCR HRH3 nonGPCR LIFR GPCR HRH4 nonGPCR LRP1 GPCR HTR1A nonGPCR LRP10 GPCR HTR1B nonGPCR LRP11 GPCR HTR1D nonGPCR LRP12 GPCR HTR1E nonGPCR LRP2 GPCR HTR1F nonGPCR LRP3 GPCR HTR2A nonGPCR LRP5 GPCR HTR2B nonGPCR LRP6 GPCR HTR2C nonGPCR LRP8 GPCR HTR4 nonGPCR LTBR GPCR HTR5A nonGPCR LTK GPCR HTR6 nonGPCR MERTK GPCR HTR7 nonGPCR MET GPCR KISS1R nonGPCR MPL GPCR LGR4 nonGPCR MST1R GPCR LGR5 nonGPCR MUSK GPCR LGR6 nonGPCR NBL1 GPCR LHCGR nonGPCR NDP GPCR LPAR1 nonGPCR NEO1 GPCR LPAR2 nonGPCR NGFR GPCR LPAR3 nonGPCR NLGN1 GPCR LPAR4 nonGPCR NLGN2 GPCR LPAR5 nonGPCR NLGN3 GPCR LPAR6 nonGPCR NLGN4X GPCR LPHN1 nonGPCR NODAL GPCR LPHN2 nonGPCR NOTCH1 GPCR LPHN3 nonGPCR NOTCH2 GPCR LTB4R nonGPCR NOTCH3 GPCR LTB4R2 nonGPCR NPPA GPCR MAS1 nonGPCR NPPB GPCR MC1R nonGPCR NPPC GPCR MC2R nonGPCR NPR1 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR MC3R nonGPCR NPR2 GPCR MC4R nonGPCR NPR3 GPCR MC5R nonGPCR NRG1 GPCR MCHR1 nonGPCR NRG2 GPCR MCHR2 nonGPCR NRG3 GPCR MLNR nonGPCR NRTN GPCR MRGPRD nonGPCR NTN1 GPCR MRGPRE nonGPCR NTN4 GPCR MRGPRF nonGPCR NTNG1 GPCR MRGPRG nonGPCR NTNG2 GPCR MRGPRX1 nonGPCR NTRK1 GPCR MRGPRX2 nonGPCR NTRK2 GPCR MRGPRX3 nonGPCR NTRK3 GPCR MRGPRX4 nonGPCR NXPH1 GPCR MTNR1A nonGPCR NXPH2 GPCR MTNR1B nonGPCR NXPH3 GPCR NMBR nonGPCR NXPH4 GPCR NMUR1 nonGPCR OSMR GPCR NMUR2 nonGPCR PDGFA GPCR NPBWR1 nonGPCR PDGFB GPCR NPBWR2 nonGPCR PDGFC GPCR NPFFR1 nonGPCR PDGFD GPCR NPFFR2 nonGPCR PDGFRA GPCR NPSR1 nonGPCR PDGFRB GPCR NPY1R nonGPCR PGF GPCR NPY2R nonGPCR PRLR GPCR NPY5R nonGPCR PSPN GPCR NTSR1 nonGPCR PTCH2 GPCR NTSR2 nonGPCR PTK7 GPCR OPN1LW nonGPCR PTPRA GPCR OPN1MW nonGPCR PTPRD GPCR OPN1SW nonGPCR PTPRE GPCR OPN3 nonGPCR PTPRF GPCR OPN4 nonGPCR PTPRG GPCR OPN5 nonGPCR PTPRH GPCR OPRD1 nonGPCR PTPRJ GPCR OPRK1 nonGPCR PTPRK GPCR OPRL1 nonGPCR PTPRM GPCR OPRM1 nonGPCR PTPRN GPCR OXER1 nonGPCR PTPRN2 GPCR OXGR1 nonGPCR PTPRO GPCR OXTR nonGPCR PTPRS GPCR P2RY1 nonGPCR PTPRT GPCR P2RY10 nonGPCR PTPRU GPCR P2RY12 nonGPCR PTPRZ1 GPCR P2RY13 nonGPCR RASSF8 GPCR P2RY14 nonGPCR RET GPCR P2RY2 nonGPCR ROBO1 GPCR P2RY4 nonGPCR ROBO2 GPCR P2RY6 nonGPCR ROBO3 GPCR P2RY8 nonGPCR ROBO4 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR PPYR1 nonGPCR ROR1 GPCR PRLHR nonGPCR ROR2 GPCR PROKR2 nonGPCR ROS1 GPCR PTAFR nonGPCR RYK GPCR PTGDR nonGPCR SHH GPCR PTGER1 nonGPCR SLIT1 GPCR PTGER2 nonGPCR SLIT2 GPCR PTGER3 nonGPCR SLIT3 GPCR PTGER4 nonGPCR SORL1 GPCR PTGFR nonGPCR SOST GPCR PTGIR nonGPCR SOSTDC1 GPCR PTH1R nonGPCR TDGF1 GPCR PTH2R nonGPCR TEK GPCR QRFPR nonGPCR TFG GPCR RGR nonGPCR TGFA GPCR RHO nonGPCR TGFB1 GPCR RRH nonGPCR TGFB2 GPCR RXFP1 nonGPCR TGFB3 GPCR RXFP2 nonGPCR TGFBR1 GPCR RXFP3 nonGPCR TGFBR2 GPCR RXFP4 nonGPCR TIE1 GPCR S1PR1 nonGPCR TLR1 GPCR S1PR2 nonGPCR TLR10 GPCR S1PR3 nonGPCR TLR2 GPCR S1PR4 nonGPCR TLR3 GPCR S1PR5 nonGPCR TLR4 GPCR SCTR nonGPCR TLR5 GPCR SMO nonGPCR TLR6 GPCR SSTR1 nonGPCR TLR7 GPCR SSTR2 nonGPCR TLR8 GPCR SSTR3 nonGPCR TLR9 GPCR SSTR4 nonGPCR TNFRSF10A GPCR SSTR5 nonGPCR TNFRSF10B GPCR SUCNR1 nonGPCR TNFRSF10C GPCR TAAR1 nonGPCR TNFRSF10D GPCR TAAR2 nonGPCR TNFRSF11A GPCR TAAR5 nonGPCR TNFRSF11B GPCR TAAR6 nonGPCR TNFRSF12A GPCR TAAR8 nonGPCR TNFRSF13B GPCR TACR1 nonGPCR TNFRSF13C GPCR TACR2 nonGPCR TNFRSF14 GPCR TACR3 nonGPCR TNFRSF17 GPCR TBXA2R nonGPCR TNFRSF18 GPCR TRHR nonGPCR TNFRSF19 GPCR TSHR nonGPCR TNFRSF1A GPCR UTS2R nonGPCR TNFRSF1B GPCR VIPR1 nonGPCR TNFRSF21 GPCR VIPR2 nonGPCR TNFRSF25 GPCR XCR1 nonGPCR TNFRSF4 GPCR ligand ADCYAP1 nonGPCR TNFRSF6B GPCR ligand ADM nonGPCR TNFRSF8 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR ligand ADM2 nonGPCR TNFRSF9 GPCR ligand AGT nonGPCR TNFSF10 GPCR ligand APLN nonGPCR TNFSF11 GPCR ligand AVP nonGPCR TNFSF12 GPCR ligand CALCA nonGPCR TNFSF13 GPCR ligand CCK nonGPCR TNFSF13B GPCR ligand CCL1 nonGPCR TNFSF14 GPCR ligand CCL11 nonGPCR TNFSF15 GPCR ligand CCL13 nonGPCR TNFSF18 GPCR ligand CCL14 nonGPCR TNFSF4 GPCR ligand CCL15 nonGPCR TNFSF8 GPCR ligand CCL16 nonGPCR TNFSF9 GPCR ligand CCL17 nonGPCR TRAF2 GPCR ligand CCL18 nonGPCR TRAF3 GPCR ligand CCL19 nonGPCR TRAF5 GPCR ligand CCL2 nonGPCR TYRO3 GPCR ligand CCL20 nonGPCR UNC5A GPCR ligand CCL21 nonGPCR UNC5B GPCR ligand CCL22 nonGPCR UNC5C GPCR ligand CCL23 nonGPCR UNC5D GPCR ligand CCL24 nonGPCR VEGFC GPCR ligand CCL26 nonGPCR VLDLR GPCR ligand CCL27 nonGPCR WNT1 GPCR ligand CCL3 nonGPCR WNT11 GPCR ligand CCL4 nonGPCR WNT3 GPCR ligand CCL5 nonGPCR WNT4 GPCR ligand CCL7 nonGPCR WNT5A GPCR ligand CCL8 nonGPCR WNT7A GPCR ligand CGA nonGPCR WNT7B GPCR ligand CRH GPCR ligand CXCL1 GPCR ligand CXCL12 GPCR ligand CXCL13 GPCR ligand CXCL14 GPCR ligand CXCL16 GPCR ligand CXCL2 GPCR ligand CXCL3 GPCR ligand CXCL5 GPCR ligand CXCL6 GPCR ligand CXCL9 GPCR ligand EDN1 GPCR ligand EDN2 GPCR ligand EDN3 GPCR ligand FSHB GPCR ligand GAL GPCR ligand GALP GPCR ligand GAST GPCR ligand GCG GPCR ligand GHRH GPCR ligand GHRL GPCR ligand GIP Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR ligand GNRH1 GPCR ligand GNRH2 GPCR ligand GPHB5 GPCR ligand GRP GPCR ligand HCRT GPCR ligand IAPP GPCR ligand IL8 GPCR ligand INSL5 GPCR ligand KISS1 GPCR ligand KNG1 GPCR ligand LHB GPCR ligand MLN GPCR ligand NMB GPCR ligand NMS GPCR ligand NMU GPCR ligand NPB GPCR ligand NPFF GPCR ligand NPS GPCR ligand NPVF GPCR ligand NPW GPCR ligand NPY GPCR ligand NTS GPCR ligand OXT GPCR ligand PDYN GPCR ligand PENK GPCR ligand PF4V1 GPCR ligand PMCH GPCR ligand PNOC GPCR ligand POMC GPCR ligand PPBP GPCR ligand PRLH GPCR ligand PROK1 GPCR ligand PROK2 GPCR ligand PTH GPCR ligand PTH2 GPCR ligand PTHLH GPCR ligand PYY GPCR ligand QRFP GPCR ligand RLN2 GPCR ligand RLN3 GPCR ligand SCT GPCR ligand SST GPCR ligand TAC1 GPCR ligand TAC3 GPCR ligand TAC4 GPCR ligand TRH GPCR ligand TSHB GPCR ligand UCN GPCR ligand UCN2 GPCR ligand UCN3 GPCR ligand UTS2 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

GPCR ligand UTS2D GPCR ligand VIP Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Table 2. List of coding SNPs with high FST ( > 0.5) in human GPCRs and their cognate ligand genes.

Fst chr position rsid gene group 0.992 4 9393463 rs2227852 DRD5 ceu_asn_nsy 0.992 4 9393463 rs2227852 DRD5 ceu_yri_nsy 0.908 1 157441978 rs12075 DARC asn_yri_nsy 0.834 22 45140161 rs12165943 CELSR1 asn_yri_nsy 0.776 12 12952561 rs2075288 GPRC5A asn_yri_syn 0.759 17 31364397 rs1003645 CCL23 ceu_yri_nsy 0.758 22 45166358 rs6008794 CELSR1 asn_yri_nsy 0.749 22 45164979 rs4044210 CELSR1 asn_yri_nsy 0.735 2 237154643 rs1045879 CXCR7 asn_yri_syn 0.731 7 124174610 rs724356 GPR37 asn_yri_syn 0.720 17 44394131 rs2291725 GIP asn_yri_nsy 0.704 16 88513655 rs885479 MC1R asn_yri_nsy 0.697 22 45142335 rs7285515 CELSR1 asn_yri_syn 0.689 10 86002693 rs1042454 RGR asn_yri_syn 0.676 22 45159185 rs6007897 CELSR1 asn_yri_nsy 0.649 9 132758990 rs3847193 QRFP asn_yri_syn 0.645 10 85994853 rs2279227 RGR ceu_yri_syn 0.631 19 6877378 rs4807916 EMR1 asn_yri_nsy 0.628 9 100380122 rs2808536 GPR51 asn_yri_syn 0.628 10 85994853 rs2279227 RGR asn_yri_syn 0.613 8 54304710 rs702764 OPRK1 asn_yri_syn 0.612 6 146534159 rs9485052 GRM1 ceu_yri_nsy 0.603 2 48774879 rs2293275 LHCGR asn_yri_nsy 0.600 19 6848464 rs330880 EMR1 ceu_yri_nsy 0.596 10 86002693 rs1042454 RGR ceu_yri_syn 0.593 2 48769375 rs11125179 LHCGR asn_yri_syn 0.592 19 3546794 rs4523 TBXA2R asn_yri_syn 0.589 16 88513655 rs885479 MC1R ceu_asn_nsy 0.586 7 92893689 rs1801197 CALCR ceu_asn_nsy 0.576 12 107210138 rs1057401 CMKLR1 asn_yri_syn 0.570 6 70127894 rs913543 BAI3 asn_yri_syn 0.562 22 45140161 rs12165943 CELSR1 ceu_yri_nsy 0.557 12 130188688 rs7302341 GPR133 asn_yri_syn 0.555 3 45962984 rs2234355 CXCR6 asn_yri_nsy 0.549 2 132891234 rs2241764 GPR39 ceu_yri_nsy 0.548 8 54304710 rs702764 OPRK1 ceu_yri_syn 0.543 2 231483541 rs1992186 GPR55 asn_yri_syn Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

0.543 6 146534159 rs9485052 GRM1 asn_yri_nsy 0.539 6 46940772 rs614826 GPR116 ceu_yri_syn 0.539 3 45962984 rs2234355 CXCR6 ceu_yri_nsy 0.537 3 115373505 rs6280 DRD3 asn_yri_nsy 0.535 17 31364397 rs1003645 CCL23 asn_yri_nsy 0.532 11 66975485 rs949252 GPR152 asn_yri_syn 0.529 22 45142335 rs7285515 CELSR1 ceu_yri_syn 0.528 11 66975485 rs949252 GPR152 ceu_yri_syn 0.520 11 18115569 rs12291017 MRGPRX3 asn_yri_nsy 0.518 11 66976751 rs1638560 GPR152 ceu_yri_syn 0.513 7 158543995 rs2098349 VIPR2 asn_yri_syn 0.513 2 100463392 rs1519654 NMS ceu_asn_syn

Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Table 3. List of coding SNPs with high FST ( > 0.5) in human nonGPCR receptor and ligand genes.

Fst chr position rsid gene group 0.874 2 108880033 rs3827760 EDAR ceu_asn_nsy 0.874 2 108880033 rs3827760 EDAR asn_yri_nsy 0.841 9 14712477 rs3747532 CER1 asn_yri_nsy 0.769 16 27281901 rs1801275 IL4R asn_yri_nsy 0.769 19 6326831 rs2304198 PSPN ceu_yri_syn 0.749 8 32572900 rs3924999 NRG1 asn_yri_nsy 0.748 19 60571684 rs1126757 IL11 asn_yri_syn 0.738 4 96475639 rs4699423 UNC5C asn_yri_syn 0.731 8 23115269 rs4871857 TNFRSF10A asn_yri_nsy 0.723 8 23116201 rs6557634 TNFRSF10A asn_yri_nsy 0.706 17 35137563 rs1058808 ERBB2 ceu_yri_nsy 0.692 13 27781061 rs7993418 FLT1 asn_yri_syn 0.686 14 102411802 rs1131877 TRAF3 ceu_yri_nsy 0.683 16 27281473 rs2234900 IL4R asn_yri_syn 0.676 1 155115619 rs6337 NTRK1 ceu_asn_syn 0.661 19 44480955 rs30461 IL29 asn_yri_nsy 0.661 12 14685329 rs10772800 GUCY2C asn_yri_syn 0.659 12 14685329 rs10772800 GUCY2C ceu_yri_syn 0.657 1 65809029 rs1137100 LEPR asn_yri_nsy 0.630 16 27281901 rs1801275 IL4R ceu_yri_nsy 0.625 2 29794033 rs2246745 ALK asn_yri_syn 0.618 16 27281473 rs2234900 IL4R ceu_yri_syn 0.606 2 169711678 rs4667591 LRP2 ceu_yri_nsy 0.606 3 53858762 rs1025689 IL17RB ceu_yri_syn 0.605 1 53565239 rs3820198 LRP8 ceu_yri_nsy 0.576 19 44480955 rs30461 IL29 ceu_yri_nsy 0.574 3 38499746 rs1046048 ACVR2B ceu_yri_syn 0.571 6 137367540 rs1555498 IL20RA asn_yri_nsy 0.569 2 102334507 rs4988957 IL1RL1 asn_yri_syn 0.569 2 102334717 rs4988958 IL1RL1 asn_yri_syn 0.569 2 102334788 rs10192157 IL1RL1 asn_yri_nsy 0.569 2 102334794 rs10206753 IL1RL1 asn_yri_nsy 0.569 2 102334439 rs4988956 IL1RL1 asn_yri_nsy 0.559 4 96325345 rs2289043 UNC5C ceu_yri_nsy 0.558 8 23115269 rs4871857 TNFRSF10A ceu_asn_nsy 0.558 8 23116201 rs6557634 TNFRSF10A ceu_asn_nsy 0.557 2 102334644 rs10204137 IL1RL1 asn_yri_nsy Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

0.555 1 53565239 rs3820198 LRP8 asn_yri_nsy 0.555 16 27281373 rs1805011 IL4R asn_yri_nsy 0.554 9 138517528 rs10521 NOTCH1 ceu_asn_syn 0.547 1 155115619 rs6337 NTRK1 ceu_yri_syn 0.541 20 10568386 rs1051419 JAG1 asn_yri_syn 0.541 6 137367540 rs1555498 IL20RA ceu_yri_nsy 0.539 8 35703234 rs6983275 UNC5D ceu_yri_syn 0.537 20 61796554 rs3208008 TNFRSF6B asn_yri_nsy 0.537 3 136153214 rs7644369 EPHB1 asn_yri_syn 0.535 8 35703357 rs6996645 UNC5D ceu_yri_syn 0.530 11 48122843 rs4752904 PTPRJ ceu_yri_nsy 0.529 1 29503042 rs2295061 PTPRU ceu_asn_syn 0.527 16 27281416 rs2234898 IL4R asn_yri_syn 0.523 7 116223258 rs2023748 MET asn_yri_syn 0.523 7 116223004 rs41736 MET asn_yri_syn 0.523 7 116223333 rs41737 MET asn_yri_syn 0.523 2 108880086 rs12623957 EDAR asn_yri_syn 0.520 3 13871285 rs12639607 WNT7A asn_yri_syn 0.514 2 29794033 rs2246745 ALK ceu_yri_syn 0.514 17 7259120 rs2241233 NLGN2 asn_yri_syn 0.508 14 102411802 rs1131877 TRAF3 asn_yri_nsy 0.507 20 2893759 rs1178027 PTPRA ceu_yri_nsy

Supplementary Table 4. List of genotyped single nucleotide polymorphisms (SNPs) from 44246961 to 44542055 on

chromosome 17. Downloaded from # Name Position ObsHET PredHET HWpval %Geno FamTrio MendErr MAF Alleles 1 rs674310 44246961 0.483 0.5 0.9497 100.0 0 0 0.492 C:G 2 rs674755 44247046 0.483 0.5 0.9497 100.0 0 0 0.492 A:T 3 rs7208074 44249072 0.467 0.473 1.0 100.0 0 0 0.383 G:A 4 rs532154 44251363 0.467 0.499 0.752 100.0 0 0 0.483 A:G

5 rs639679 44254253 0.267 0.278 1.0 100.0 0 0 0.167 A:G genome.cshlp.org 6 rs606911 44263227 0.267 0.278 1.0 100.0 0 0 0.167 G:A 7 rs8076178 44263279 0.1 0.095 1.0 100.0 0 0 0.05 G:C 8 rs12450440 44266352 0.1 0.095 1.0 100.0 0 0 0.05 A:T 9 rs8080446 44267435 0.1 0.095 1.0 100.0 0 0 0.05 A:G 10 rs8078984 44267550 0.1 0.095 1.0 100.0 0 0 0.05 G:A 11 rs8078992 44267572 0.1 0.095 1.0 100.0 0 0 0.05 G:T onSeptember30,2021-Publishedby 12 rs7207087 44272952 0.35 0.45 0.1338 100.0 0 0 0.342 C:G 13 rs612720 44273203 0.45 0.5 0.561 100.0 0 0 0.492 G:A 14 rs497270 44273773 0.267 0.278 1.0 100.0 0 0 0.167 A:G 15 rs12945070 44277734 0.367 0.455 0.1974 100.0 0 0 0.35 G:C 16 rs533842 44278383 0.267 0.278 1.0 100.0 0 0 0.167 G:A 17 rs12948015 44279979 0.483 0.493 1.0 100.0 0 0 0.442 G:T 18 rs3744604 44280858 0.1 0.095 1.0 100.0 0 0 0.05 G:A 19 rs2303015 44284907 0.1 0.095 1.0 100.0 0 0 0.05 T:C 20 rs8074034 44287297 0.483 0.493 1.0 100.0 0 0 0.442 A:G 21 rs1422645 44294657 0.35 0.45 0.1338 100.0 0 0 0.342 C:G 22 rs11434 44296244 0.1 0.095 1.0 100.0 0 0 0.05 G:A

23 rs7222365 44296816 0.1 0.095 1.0 100.0 0 0 0.05 C:T Cold SpringHarborLaboratoryPress 24 rs11654595 44297999 0.1 0.095 1.0 100.0 0 0 0.05 A:G 25 rs12450692 44300502 0.1 0.095 1.0 100.0 0 0 0.05 C:T 26 rs12450563 44300565 0.1 0.095 1.0 100.0 0 0 0.05 G:C 27 rs12453671 44300832 0.1 0.095 1.0 100.0 0 0 0.05 A:T 28 rs882150 44301645 0.483 0.493 1.0 100.0 0 0 0.442 A:G 29 rs12449856 44301996 0.133 0.124 1.0 100.0 0 0 0.067 C:T 30 rs8078826 44302434 0.483 0.493 1.0 100.0 0 0 0.442 A:G 31 rs2411377 44303217 0.1 0.095 1.0 100.0 0 0 0.05 G:A 32 rs318102 44305495 0.45 0.5 0.561 100.0 0 0 0.492 G:C 33 rs178467 44305868 0.267 0.278 1.0 100.0 0 0 0.167 A:G 34 rs8079874 44305974 0.1 0.095 1.0 100.0 0 0 0.05 C:T 35 rs1990936 44307486 0.1 0.095 1.0 100.0 0 0 0.05 G:A 36 rs9904761 44311829 0.367 0.455 0.1974 100.0 0 0 0.35 C:G 37 rs4793990 44315759 0.367 0.455 0.1974 100.0 0 0 0.35 A:G # Name Position ObsHET PredHET HWpval %Geno FamTrio MendErr MAF Alleles 38 rs1962412 44325258 0.4 0.455 0.4682 100.0 0 0 0.35 C:T 39 rs1800632 44325508 0.4 0.444 0.5758 100.0 0 0 0.333 G:A 40 rs318095 44329733 0.5 0.499 1.0 100.0 0 0 0.483 T:C Downloaded from 41 rs962272 44333282 0.5 0.499 1.0 100.0 0 0 0.483 A:G 42 rs962273 44333352 0.4 0.455 0.4682 100.0 0 0 0.35 C:T 43 rs4294857 44338390 0.5 0.499 1.0 100.0 0 0 0.483 G:A 44 rs999474 44342664 0.467 0.498 0.771 100.0 0 0 0.467 G:A 45 rs999475 44343043 0.4 0.455 0.4682 100.0 0 0 0.35 G:A 46 rs46522 44343596 0.5 0.499 1.0 100.0 0 0 0.483 C:T genome.cshlp.org 47 rs9894220 44344153 0.467 0.498 0.771 100.0 0 0 0.467 A:G 48 rs318090 44346751 0.5 0.499 1.0 100.0 0 0 0.483 G:A 49 rs12453374 44349845 0.5 0.499 1.0 100.0 0 0 0.483 G:C 50 rs1058018 44355250 0.467 0.498 0.771 100.0 0 0 0.467 T:C

51 rs4793991 44356800 0.5 0.499 1.0 100.0 0 0 0.483 A:G onSeptember30,2021-Publishedby 52 rs11079842 44357761 0.4 0.455 0.4682 100.0 0 0 0.35 G:A 53 rs15563 44360192 0.5 0.499 1.0 100.0 0 0 0.483 A:G 54 rs1057897 44360508 0.5 0.499 1.0 100.0 0 0 0.483 T:G 55 rs12941604 44362012 0.217 0.193 0.9665 100.0 0 0 0.108 G:A 56 rs16944762 44362496 0.1 0.095 1.0 100.0 0 0 0.05 G:A 57 rs2270576 44362962 0.4 0.455 0.4682 100.0 0 0 0.35 C:T 58 rs4793992 44363206 0.5 0.499 1.0 100.0 0 0 0.483 A:G 59 rs4793993 44363545 0.4 0.455 0.4682 100.0 0 0 0.35 C:T 60 rs17708633 44366619 0.5 0.499 1.0 100.0 0 0 0.483 C:G 61 rs1994970 44369126 0.5 0.499 1.0 100.0 0 0 0.483 T:C 62 rs9747646 44371881 0.5 0.499 1.0 100.0 0 0 0.483 C:T 63 rs12601858 44372476 0.5 0.499 1.0 100.0 0 0 0.483 T:C 64 rs8182364 44373024 0.5 0.499 1.0 100.0 0 0 0.483 A:G Cold SpringHarborLaboratoryPress 65 rs4793995 44374398 0.5 0.499 1.0 100.0 0 0 0.483 A:C 66 rs4793996 44374596 0.5 0.499 1.0 100.0 0 0 0.483 T:C 67 rs4793998 44374898 0.5 0.499 1.0 100.0 0 0 0.483 C:T 68 rs11079844 44383333 0.5 0.499 1.0 100.0 0 0 0.483 G:A 69 rs12602933 44384279 0.5 0.499 1.0 100.0 0 0 0.483 G:C 70 rs4794003 44384903 0.5 0.499 1.0 100.0 0 0 0.483 C:T 71 rs12602746 44386740 0.5 0.499 1.0 100.0 0 0 0.483 A:C 72 rs9904288 44386972 0.4 0.455 0.4682 100.0 0 0 0.35 T:C 73 rs2291725 44394131 0.5 0.499 1.0 100.0 0 0 0.483 T:C 74 rs2291726 44394253 0.5 0.5 1.0 100.0 0 0 0.5 C:C---

75 rs8078510 44400861 0.383 0.439 0.448 100.0 0 0 0.325 G:A 76 rs937301 44401275 0.483 0.5 0.9497 100.0 0 0 0.492 A:G 77 rs3848460 44402113 0.483 0.5 0.9497 100.0 0 0 0.492 A:G 78 rs3809770 44402595 0.45 0.493 0.6283 100.0 0 0 0.442 A:G # Name Position ObsHET PredHET HWpval %Geno FamTrio MendErr MAF Alleles 79 rs3895874 44402867 0.5 0.499 1.0 100.0 0 0 0.483 A:G 80 rs9899404 44408827 0.367 0.473 0.1246 100.0 0 0 0.383 C:T 81 rs9894411 44417476 0.1 0.095 1.0 100.0 0 0 0.05 T:C Downloaded from 82 rs4794012 44420090 0.367 0.464 0.1551 100.0 0 0 0.367 T:G 83 rs4794015 44422825 0.517 0.5 1.0 100.0 0 0 0.492 A:G 84 rs2411759 44426073 0.533 0.499 0.8395 100.0 0 0 0.483 T:C 85 rs1390154 44426482 0.383 0.469 0.2278 100.0 0 0 0.375 G:A 86 rs11650936 44427260 0.55 0.5 0.6492 100.0 0 0 0.492 G:C 87 rs7223256 44430989 0.017 0.017 1.0 100.0 0 0 0.008 C:T genome.cshlp.org 88 rs1495274 44432647 0.117 0.139 0.5523 100.0 0 0 0.075 G:A 89 rs1994969 44435430 0.4 0.491 0.2132 100.0 0 0 0.433 T:G 90 rs9674544 44439710 0.417 0.493 0.3127 100.0 0 0 0.442 A:G 91 rs12939375 44444716 0.4 0.491 0.2132 100.0 0 0 0.433 G:C

92 rs6504592 44445297 0.133 0.124 1.0 100.0 0 0 0.067 C:G onSeptember30,2021-Publishedby 93 rs11079849 44445784 0.483 0.489 1.0 100.0 0 0 0.425 C:T 94 rs9906710 44446282 0.483 0.493 1.0 100.0 0 0 0.442 C:A 95 rs9906944 44446419 0.45 0.483 0.7385 100.0 0 0 0.408 C:T 96 rs10853104 44447075 0.433 0.498 0.417 100.0 0 0 0.467 T:C 97 rs11079851 44447309 0.4 0.491 0.2132 100.0 0 0 0.433 C:G 98 rs4794018 44448397 0.483 0.493 1.0 100.0 0 0 0.442 T:C 99 rs8076012 44456988 0.417 0.5 0.271 100.0 0 0 0.492 A:G 100 rs17708997 44457037 0.15 0.139 1.0 100.0 0 0 0.075 A:G 101 rs9912906 44459431 0.417 0.477 0.44 100.0 0 0 0.392 A:T 102 rs9899580 44461877 0.417 0.477 0.44 100.0 0 0 0.392 T:G 103 rs4794019 44462289 0.417 0.477 0.44 100.0 0 0 0.392 G:T 104 rs8079385 44464426 0.25 0.219 0.7369 100.0 0 0 0.125 C:G 105 rs4794020 44466398 0.417 0.477 0.44 100.0 0 0 0.392 C:T Cold SpringHarborLaboratoryPress 106 rs11870560 44469032 0.433 0.498 0.417 100.0 0 0 0.467 C:T 107 rs12942084 44469391 0.417 0.477 0.44 100.0 0 0 0.392 A:G 108 rs16945851 44469481 0.067 0.064 1.0 100.0 0 0 0.033 C:T 109 rs8073244 44470037 0.233 0.206 0.8491 100.0 0 0 0.117 C:T 110 rs1463761 44473087 0.233 0.206 0.8491 100.0 0 0 0.117 A:G 111 rs11872073 44474134 0.017 0.017 1.0 100.0 0 0 0.008 G:A 112 rs3816272 44475466 0.417 0.477 0.44 100.0 0 0 0.392 A:G 113 rs8068981 44477524 0.417 0.477 0.44 100.0 0 0 0.392 T:C 114 rs4643373 44478422 0.417 0.469 0.5116 100.0 0 0 0.375 T:C 115 rs9896443 44480981 0.383 0.5 0.1046 100.0 0 0 0.492 T:C

116 rs2969 44483246 0.317 0.349 0.6576 100.0 0 0 0.225 C:T 117 rs11655950 44484120 0.417 0.477 0.44 100.0 0 0 0.392 G:A 118 rs2241932 44485229 0.067 0.064 1.0 100.0 0 0 0.033 G:C 119 rs3744085 44486900 0.4 0.499 0.1751 100.0 0 0 0.483 G:A # Name Position ObsHET PredHET HWpval %Geno FamTrio MendErr MAF Alleles 120 rs6504593 44487818 0.4 0.499 0.1751 100.0 0 0 0.483 C:T 121 rs8069452 44495998 0.433 0.5 0.3994 100.0 0 0 0.5 A:A 122 rs1523136 44499531 0.433 0.48 0.5797 100.0 0 0 0.4 T:C Downloaded from 123 rs12939237 44499563 0.183 0.167 1.0 100.0 0 0 0.092 G:A 124 rs940088 44500847 0.433 0.48 0.5797 100.0 0 0 0.4 T:C 125 rs11656072 44502360 0.167 0.18 0.9127 100.0 0 0 0.1 T:C 126 rs11656250 44512427 0.3 0.278 0.9897 100.0 0 0 0.167 G:A 127 rs11657661 44520235 0.283 0.289 1.0 100.0 0 0 0.175 C:T 128 rs4319818 44520997 0.517 0.489 0.9112 100.0 0 0 0.425 C:G genome.cshlp.org 129 rs9972911 44532746 0.483 0.477 1.0 100.0 0 0 0.392 C:T 130 rs9303547 44532912 0.267 0.231 0.6317 100.0 0 0 0.133 A:T 131 rs9910632 44537708 0.433 0.464 0.7541 100.0 0 0 0.367 C:T 132 rs1523135 44538319 0.433 0.464 0.7541 100.0 0 0 0.367 G:C

133 rs6504595 44542055 0.3 0.339 0.5365 100.0 0 0 0.217 G:T onSeptember30,2021-Publishedby Cold SpringHarborLaboratoryPress

Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Supplementary Table 5. Allele frequency of rs2291725 in the HGDP-CEPH populations.

Allele Frequency (%) GIP103T GIP103C Chromosome No. AFRICA (Sub-Sahara) C. African Republic-Biaka Pygmy 100.0 0.0 44 D. R. of Congo-Mbuti Pygmy 92.2 3.8 26 Kenya-Bantu 100.0 0.0 22 Namidia-San 100.0 0.0 10 Nigeria-Yoruba 90.5 9.5 42 Senegal-Mandenka 93.2 6.8 44 South Africa-Bantu 93.7 6.3 16 subtotal 102 AMERICA Brazil-Karitiana 75.0 25.0 28 Brazil-Surui 75.0 25.0 16 Colombia-Piapoco and Curripaco 78.6 21.4 14 Mexico-Maya 71.4 28.6 42 Mexico-Pima 57.1 42.9 28 subtotal 64 EUROPE France-Basque 58.3 41.7 48 France-French 48.2 51.8 56 Italy-Sardinian 39.3 60.7 56 Italy-Tuscan 50.0 50.0 16 Italy-Bergamo 53.8 46.2 26 Orkney Islands-Orcadian 43.3 56.7 30 Russia Caucasus-Adygei 33.0 67.0 34 Russia-Russian 52.0 48.0 50 subtotal 158 MIDDLE EAST Algeria (Mzab)-Mozabite 79.3 20.7 58 Israel (Carmel)-Druze 45.2 54.8 84 Israel (central)-Palestinian 56.5 43.5 92 Israel (Negev)-Bedouin 45.7 54.3 92 subtotal 163 CENTRAL-SOUTH ASIA China-Uygur 55.0 45.0 20 Pakistan-Balochi 47.9 52.1 48 Pakistan-Brahui 44.0 56.0 50 Pakistan-Burusho 52.0 48.0 50 Pakistan-Hazara 38.6 61.4 44 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Pakistan-Kalash 56.5 43.5 46 Pakistan-Makrani 48.0 52.0 50 Pakistan-Pathan 45.5 54.5 44 Pakistan-Sindhi 64.6 35.4 48 subtotal 200 OCEANIA Bougainville-NAN Melanesian 27.3 72.7 22 New Guinea-Papuan 58.8 41.2 34 subtotal 28 EAST ASIA Cambodia-Cambodian 40.0 60.0 20 China-Dai 45.0 55.0 20 China-Daur 38.9 61.1 18 China-Han 28.4 71.6 88 China-Hezhen 27.8 72.2 18 China-Lahu 31.2 68.8 16 China-Miaozu 40.0 60.0 20 China-Mongola 30.0 70.0 20 China-Naxi 62.5 37.5 16 China-Oroquen 22.2 77.8 18 China-She 30.0 70.0 20 China-Tu 35.0 65.0 20 China-Tujia 35.0 65.0 20 China-Xibo 5.6 94.4 18 China-Yizu 35.0 65.0 20 Japan-Japanese 21.4 78.6 56 Siberia-Yakut 24.0 76.0 50 subtotal 229 Total 944 Supplementary Table 6.

EC50 values (-Log EC50, mean ± SEM) for GIP, GIP55G, and GIP55S. Downloaded from

With pooled human serum

Hours of incubation GIP GIP55G GIP55S genome.cshlp.org 0 9.02±0.15 8.47±0.14 8.56±0.19 3 8.21±0.13 8.32±0.19 8.34±0.14 6 8.21±0.15 8.02±0.10 7.74±0.14 onSeptember30,2021-Publishedby 12 7.52±0.19 7.93±0.09 7.38±0.16

With pooled complement preserved human serum Hours of incubation GIP GIP55G GIP55S 0 8.91±0.15 8.51±0.14 8.43±0.18 3 8.53±0.16 8.22±0.15 8.49±0.13 6 8.18±0.12 7.88±0.09 7.79±0.13 12 7.53±0.17 7.86±0.14 7.12±0.22 Cold SpringHarborLaboratoryPress

Supplementary Table 7.

EC50 values (-Log EC50, mean ± SEM) for GIP, GIP55G, and GIP55S Downloaded from after pre-incubation with a recombinant dipeptidyl peptidase IV enzyme (1mIU/ml)

Hours of incubation GIP GIP55G GIP55S genome.cshlp.org 0 8.91±0.15 8.51±0.14 8.43±0.18 3 7.92±0.14 7.70±0.09 7.79±0.14 6 7.32±0.12 7.25±0.09 7.10±0.08 onSeptember30,2021-Publishedby Cold SpringHarborLaboratoryPress

Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

Adaptive selection of an incretin gene in Eurasian populations

Chia Lin Chang, James J Cai, Chiening Lo, et al.

Genome Res. published online October 26, 2010 Access the most recent version at doi:10.1101/gr.110593.110

Supplemental http://genome.cshlp.org/content/suppl/2010/10/27/gr.110593.110.DC1 Material

P

Accepted Peer-reviewed and accepted for publication but not copyedited or typeset; accepted Manuscript manuscript is likely to differ from the final, published version.

License

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance online articles are citable and establish publication priority; they are indexed by PubMed from initial publication. Citations to Advance online articles must include the digital object identifier (DOIs) and date of initial publication.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Copyright © 2010, Cold Spring Harbor Laboratory Press