Genes and Immunity (2009) 10, 678–686 & 2009 Macmillan Publishers Limited All rights reserved 1466-4879/09 $32.00 www.nature.com/gene

ORIGINAL ARTICLE A population genetics study of the Familial Mediterranean Fever gene: evidence of balancing selection under an overdominance regime

M Fumagalli1,2, R Cagliani1, U Pozzoli1, S Riva1, GP Comi3, G Menozzi1, N Bresolin1,3 and M Sironi1 1Scientific Institute IRCCS E. Medea, Bioinformatic Laboratory, Bosisio Parini (LC), Italy; 2Department of Bioengineering, Politecnico di Milano, Milan, Italy and 3Department of Neurological Sciences, Dino Ferrari Centre, University of Milan, IRCCS Ospedale Maggiore Policlinico, Mangiagalli and Regina Elena Foundation, Milan, Italy

Familial Mediterranean Fever (FMF) is a recessively inherited systemic autoinflammatory disease caused by mutations in the MEFV gene. The frequency of different disease alleles is extremely high in multiple populations from the Mediterranean region, suggesting heterozygote advantage. Here, we characterize the sequence variation and haplotype structure of the MEFV 3 0 gene region (from exon 5 to the 3 0 UTR) in seven human populations. In non-African populations, we observed high levels of variation, an excess of intermediate-frequency alleles, reduced population differentiation and a genealogy with two common haplotypes separated by deep branches. These features are suggestive of balancing selection having acted on this region to maintain one or more selected alleles. In line with this finding, an excess of heterozygotes was observed in Europeans and Asians, suggesting an overdominance regime. Our data, together with the earlier demonstration that the MEFV exon 10 has been subjected to episodic positive selection over primate evolution, provide evidence for an adaptive role of nucleotide variation in this gene region. Our data suggest that further studies aimed at clarifying the role of MEFV variants might benefit from the integration of molecular evolutionary and functional analyses. Genes and Immunity (2009) 10, 678–686; doi:10.1038/gene.2009.59; published online 13 August 2009

Keywords: Familial Mediterranean Fever; balancing selection; heterozygote advantage

Introduction caspase-1, the net result being an attenuation of IL1-b activation. Interestingly, pyrin molecules carrying muta- Familial Mediterranean Fever (FMF, MIM #249100) is tions in the C-terminal B30.2 domain were shown8 to be prototype of a group of disorders, termed systemic less efficient in binding caspase-1 and, therefore, in autoinflammatory diseases, characterized by recurrent suppressing activation. This observation is episodes of fever and sterile peritonitis, arthritis and relevant to our understanding of MEFV function and pleuritis, and gradual development of nephropathic evolution as most FMF mutations cluster within the amyloidosis.1 The disease is recessively inherited and C-terminal protein domain.9 caused by mutations in the MEFV (Mediterranean fever) FMF shows a high prevalence in the Mediterranean gene.2,3 Pyrin (or marenostrin), the protein product of region, and carrier frequencies ranging from 1:3 to 1:5 MEFV, is a 781-amino acid protein expressed in most have been described9 in some populations, such as Jews, types of white blood cells4 and involved in pivotal Arabs, Turks, Armenians and Italians (ITs). Up to now cellular processes such as apoptosis, cytoskeletal signal- 450 disease-associated mutations have been described ing and secretion.5 Although its precise role in (INFEVERS, The registry of FMF and Hereditary Auto- inflammatory processes is still unknown, pyrin has been inflammatory Disorders Mutations; http://fmf.igh.cnrs. shown to interact with an adaptor protein denoted fr/ISSAID/infevers/), yet five of them, E148Q, M680I, apoptosis-associated speck-like protein with a CARD M694 V, M694I and V726A account for about 70–80% of through its PYRIN N-terminal domain6,7 and to regulate cases in the Mediterranean areas.9 Except for E148Q, all caspase-1 activation and IL1-b production.7 More re- other founder mutations are located in exon 10, coding cently, the C-terminal B30.2 domain of pyrin was for the B30.2 domain. Moreover, 4 out of 5 additional demonstrated8 to directly interact with and inhibit mutations showing high frequency in specific popula- tions are also located in exon 10.9 The high frequency of carriers in multiple populations Correspondence: Dr M Sironi, Scientific Institute IRCCS E. Medea, and the presence of distinct disease alleles suggest that Bioinformatic Lab, Via don Luigi Monza 20, 23842 Bosisio Parini heterozygote advantage rather than genetic drift is the (LC), Italy. E-mail: [email protected] cause of FMF prevalence in the Mediterranean area. In Received 12 February 2009; revised 25 June 2009; accepted 8 July particular, the above-described evidences led to the 2009; published online 13 August 2009 speculation8 that C-terminal MEFV mutations, resulting Balancing selection at the MEFV locus M Fumagalli et al 679 in increased IL1-b activation, might confer stronger site heterozygosity, and p,13 the average number of innate immune responses against pathogens. pairwise sequence nucleotide differences. As shown in

Heterozygote advantage, also referred to as over- Table 1, in all populations higher p compared with yW dominance, results in balancing selection, a situation values are observed, indicating an excess of intermedi- whereby greater than neutral levels of polymorphisms ate-frequency variants. are maintained in a population. Tajima’s D,14 Fu and Li’s D* and F*15 are commonly Here, we present evidence that balancing selection has used statistics that evaluate departures of the allelic shaped nucleotide diversity and maintained two diver- frequency distributions from the expected patterns of gent haplotype lineages in the 30 region of the MEVF neutral variation by comparing different estimates of gene, possibly providing evidence to the long-standing nucleotide variation. It should be noted that population hypothesis of selective advantage for FMF mutations. history, in addition to selective processes, is known14,16 to affect frequency spectra and all related statistics such as Tajima’s D and Fu and Li’s D* and F*; in particular, Results positive values of the statistics are expected under a scenario of population contraction.14,16 One possibility to Nucleotide diversity and neutrality tests circumvent this problem is to exploit the fact that As most FMF recurrent mutations are located in the selection acts on a single locus whereas demography C-terminal protein region and given the earlier descrip- affects the whole genome. We therefore calculated the tion of exon 10 having been subjected to episodic positive 95th percentile values for p, Tajima’s D and Fu and Li’s F* selection in primates,10 we wished to verify whether and D* in the distribution of 5 kb windows deriving from evolutionary neutrality could be rejected for the 30 MEFV 232 genes resequenced in AA, EA, AS and YRI by the region. To this aim, a 5.3-kb region (MEFVex5ÀUTR) NIEHS SNPs Program (http://egp.gs.washington.edu; ranging from exon 5 to the gene end was sequenced in NIEHS panel 2); these data are not available for ABO five human populations, namely Asians (AS), Yorubans and SAI. (YRI), Australian Aborigines (ABO), South American As shown in Table 1, for all populations p values

Indians (SAI) and ITs. Data concerning two additional calculated for the MEFVex5ÀUTR region rank high in the populations of European (EA) and African American distribution of NIEHS genes; similarly, most statistics in (AA) descent were derived from the Innate Immunity in EA and AS display higher values than the 95th Heart, Lung and Blood Disease Programs for Genomic percentile, indicating that selection, rather than demo- Applications web site (IIPGA, http://innateimmunity.net). graphy, might be involved. The same holds true for YRI

Analysis of SNPs in the MEFVex5ÀUTR region for the 7 but not for AA, the latter displaying values below the human populations revealed the presence of a total 57 95th percentile. variants. The mild I591T mutation11 was identified in A second possibility to disentangle the effect of two subjects, one AA and one EA. Two additional amino demographic history from selection is to apply popula- acid changes were identified in two IT subjects: tion genetics models, which take demography into the A744S has been shown earlier to be responsible for account. Different such models have been proposed but FMF (INFEVERS, The registry of FMF and Hereditary none of them specifically incorporates demographic Autoinflammatory Disorders Mutations; http://fmf.igh. parameters for SAI and ABO; these populations were cnrs.fr/ISSAID/infevers/), whereas an A701 V substitu- therefore applied the standard neutral model (Supple- tion has never been reported earlier and its relevance to mentary Table 1). Table 1 shows the results we obtained disease is unknown. Nucleotide diversity was assessed by using a calibrated model,17 which is based on the 12 using two indices: yW, an estimate of the expected per ability of generating realistic data rather than relying on

Table 1 Summary statistics for the MEFVex5ÀUTR region

a b c d Pop N S yW p Tajima’s D Fu and Li’s D* Fu and Li’s F*

Ranke Pf Ranke Pf Ranke Pf Ranke

YRI 42 36 15.79 19.39 0.97 0.79 0.041 0.95 0.77 0.053 0.92 0.92 0.028 0.96 AA 48 41 20.36 22.83 0.99 0.42 0.093 0.91 0.29 0.17 0.75 0.40 0.11 0.84 EA 46 24 12.03 21.00 0.98 2.46 0.0036 0.99 0.81 0.14 0.80 1.63 0.022 0.96 IT 46 26 11.16 17.70 NA 2.37 0.0059 NA 0.49 0.26 NA 1.35 0.053 NA AS 50 24 10.11 18.43 0.98 2.68 0.0038 40.99 1.43 0.021 0.98 2.22 0.0030 4.99 SAI 48 19 8.08 12.69 NA 1.82 NA NA 1.66 NA NA 2.04 NA NA ABO 24 20 10.11 14.75 NA 1.68 NA NA 1.32 NA NA 1.67 NA NA

Abbreviations: AA, African American; ABO, Australian Aborigines; AS, Asians; EA, European; IT, Italians; SAI, South American Indians; YRI, Yorubans. aSample size. bNumber of segregating sites. c À4 Per site yW estimate ( Â 10 ). dPer site p ( Â 10À4). ePercentile rank from a distribution of 5 kb windows from 232 NIEHS genes. fP-values obtained with a calibrated population genetics model.

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 680 Table 2 MLHKA test results for MEFVex5-UTR for four populations by deep branches. To examine the genealogy of MEFV haplotypes, we built a median-joining network. The Population MLHKA topology of this network (Figure 1) displays two major clades (haplogroups 1 and 2) separated by long-branch Ka P-value lengths, each containing common haplotypes. One additional minor clade constituted of six African YRI 2.37 0.012 is also observed (haplogroup 3). A table AA 2.68 0.0082 reporting all haplotypes and their frequency in the seven EA 2.28 0.028 populations is available as supplemental information AS 2.42 0.026 (Supplementary Table 2). We next wished to estimate the TMRCA (Time to the Abbreviations: AA, African American; AS, Asians; EA, European; Most Recent Common Ancestor) of the two major YRI, Yorubans. haplotype clades (haplogroups 1 and 2). To this aim, aSelection parameter. we applied a phylogeny-based method28 and estimated a TMRCA of 1.76 MY (s.d.: 461 KY). The same calculation for the whole genealogy estimated a TMRCA of 2.28 MY inference about population histories. In particular, we (s.d.: 567 KY). performed coalescent simulations using the cosi pack- Given the relatively low recombination rate in the age17 and its best-fit population parameters for YRI, AA, region, we wished to verify this result using GENETREE, EA and AS: this model allowed rejection of neutrality for which is based on a maximum-likelihood coalescent all populations excluding AA. Application of other analysis.29,30 The resulting gene tree, rooted using the demographic models18,19 yielded very similar results, chimpanzee sequence, is partitioned into two deep allowing neutrality rejection for AS, EA and IT, whereas branches that correspond to haplogroups 1 and 2 plus revealing borderline or nonsignificant values for African a minor branch accounted for by haplogroup 3 (Figure 2). populations (Supplementary Table 1). Using this method, the TMRCA of the two major MEFV Under neutral evolution, the amount of within-species haplotype lineages amounted to 1.61 MY, whereas a diversity is predicted to correlate with levels of between- slightly deeper TMRCA (1.77 MY) is obtained for the full species divergence, as both depend on the neutral genealogy (that is including haplogroup 3). mutation rate.20 The HKA test21 is commonly used to Deep haplotype genealogies might result from both verify whether this expectation is verified. We applied a balancing selection and ancient population structure maximum-likelihood-ratio HKA (MLHKA) test22 by (reviewed in Garrigan et al.31). Yet, balancing selection is comparing polymorphisms and divergence levels at the expected to elongate the entire neutral genealogy, 30 end of MEFV with 16 reference gene regions whereas the effects of ancient population structure are resequenced in AA, YRI, EA and AS. The MLHKA test reflected in an increase in the genealogical time occupied rejected the hypothesis of neural evolution at the 30 of by single lineages.32,33 A possibility to discriminate MEFV in all populations (Table 2), a result further between these two scenarios is to calculate the percentage supporting the idea that balancing selection has been of congruent mutations, meaning those that occur on the acting on this gene region. It is worth mentioning that the basal branches of a genealogy.33 When this approach is test relies on the assumption that the reference loci we applied to the two major clades (that is haplogroups 1 selected are neutrally evolving; we addressed this point and 2), a percentage of congruent mutations equal to 20% by calculating Tajima’s D and verifying that no sig- is obtained; this is much lower than earlier estimates nificant deviation from neutrality is observed;23 yet, under a model of ancient population structure that Tajima’s D (and similar tests based on the allele- ranged from 42 to 45%,34,35 suggesting that balancing frequency spectrum) might have low power to detect selection rather than population subdivision is respon- departure from neutrality under specific conditions.24 sible for the maintenance of the two clades. Conversely, Still, taking into account the fact that the majority of the presence of a third minor haplogroup restricted to human genes are neutrally evolving, we consider the Africa and with a deeper TMRCA, might be explained by data we obtained provide a reliable inference of selection ancient population structure in the African continent. acting on MEFV. Balancing selection might result from different causes, 25 Population genetic differentiation, quantified by FST, one of them being heterozygote advantage (also known as can also be used to detect the signature of balancing overdominance). To verify whether this is the case for the

selection. In particular, lower FST values are expected at MEFVex5ÀUTR region, for each population we calculated the loci under balancing selection compared with neurally ratio of observed heterozygosity to expected gene diver- 26,27 evolving ones. FST among AA, EA and AS was 0.021, sity. This same ratio calculation has recently been applied much lower than the genome average of 0.12327 and not to human HapMap SNPs and threshold values for significantly different from 0 (P ¼ 0.10); consistently, an inference of overdominance had been set to 1.160, 1.165 36 FST value of 0.021 corresponds to the 0.034 percentile in and 1.167 for YRI, EU (European) and AS, respectively.

the distribution of 232 FST values calculated for 5 kb To obtain a further estimate of the parameter distribution windows from NIEHS genes among the same three in the , ratios were also calculated for the populations. 5 kb windows deriving from NIEHS genes (Figure 3). As shown in Table 3, the observed heterozygosity to expected

Haplotype and genotype analysis gene diversity for the MEFVex5ÀUTR region falls above the One effect of balancing selection is to preserve two or 95th percentile values obtained from either NIEHS genes more major haplotype lineages, each containing a or HapMap SNPs in both EA and AS but not in the common haplotype, and resulting in clades separated remaining populations with available comparisons.

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 681

Figure 1 Genealogy of MEFVex5ÀUTR haplotypes reconstructed through a median-joining network. Each node represents a different haplotype, with the size of the circle proportional to the haplotype frequency. Also, circles are color coded according to population (light green: AA, white: EA, yellow: YRI, red: AS, magenta: SAI, dark green: ABO, blue: IT). The chimpanzee sequence and the root are shown. The position of synonymous (green) and nonsynonymous substitutions (red) is reported with arrows.

Comparison with other primates Finally, we sequenced the region orthologous to MEFV exon 10 in 3 chimpanzees and 4 macaques. Only one coding variant was identified in Pan troglodytes; it was not shared with humans, as expected, as the inferred divergence age of the two human haplotype clades largely postdates the time of human/chimpanzee se- paration. Interestingly, the variation, already described in dbSNP (rs25915108), results in the S747C substitution (residue numbering as in humans) and occurs at presumably high frequency (we found 3 S and 3 C alleles). Analysis of earlier sequenced10 pyrin genes in primates indicated that the S747 allele is shared with humans, Gorilla and Orangutan and, therefore, likely represents the ancestral chimpanzee allele; conversely, the C747 variants is present in some more evolutionary distant primates. As far as macaques are concerned, two coding polymorphic sites were identified, resulting in synonymous substitutions. Figure 2 Estimated tree for the MEFVex5ÀUTR gene region. Muta- tions are represented as black dots and named for their physical position along the 5.3-kb MEFVex5ÀUTR region. Mutation numbering does not correspond to the one shown in this figure as, as reported Discussion in the text, 6 SNPs and 12 chromosomes were removed as they violated the infinite site model. The absolute frequency of each Overall, the data we report here strongly suggest that haplotype is also reported. nucleotide diversity in the 30 region of MEFV has been

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 682 AS EA non-African populations all MEFV statistics rank above 0.6 0.6 the 95th percentile. In line with these findings, when coalescent simulations were performed under different 0.5 0.5 demographic scenarios, neutral evolution was rejected for non-African populations under most models, 0.4 0.4 whereas analysis of AA and YRI often yielded margin- ally significant or nonsignificant statistics. In this respect, 0.3 0.3 it is worth mentioning that in the case of ABO and SAI Observed Observed no empirical comparison could be performed due to the 0.2 0.2 lack of extensive resequencing data; also, little informa-

0.1 0.1 tion is available on these samples from a population genetics perspective so that we cannot formally rule out 0.0 0.0 the possibility that some level of population substructure in these samples partially affects our results. Still, the 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Expected Expected expected/observed heterozygosity ratio tends to be high (especially for ABO), suggesting that selection might be Figure 3 Scatter plot of expected gene diversity and observed acting in these populations. heterozygosity for 5 kb windows deriving from NIEHS genes. Each As far as African populations are concerned, the window is represented by an open circle, whereas the MEFVex5ÀUTR region is shown as a full circle. Data are shown for EA and AS. situation is complicated by the presence of an additional highly divergent clade displaying lower frequency compared with haplogroups 1 and 2. This observation, together with the deeper TMRCA of the MEFV minor Table 3 Ratio of observed heterozygosity to expected gene clade, suggests that haplogroup 3 (Figures 1 and 2) is diversity in the seven populations accounted for by ancient population structure in the African continent, in line with the idea that African Population Ratioa Rankb 95th valuec populations have been more subdivided compared with non-African ones.31,37–42 YRI 0.97 0.33 1.16 We used two different methods to infer the TMRCA of AA 1.099 0.92 NA the two major MEFV haplotype clades: estimates vary EA 1.30 40.99 1.165 slightly depending on the method used, yielding a rough IT 1.044 0.70 NA B AS 1.22 0.99 1.167 estimation of 1.70 MY with wide confidence intervals. SAI 1.036 NA NA Estimates of TMRCAs for neutrally evolving autosomal ABO 1.26 NA NA human loci range between 0.8 and 1.5 MY;42 therefore, the TMRCA we estimated for MEFV, although not Abbreviations: AA, African American; ABO, Australian Aborigines; exceptional, is deeper than for most neutrally evolving AS, Asians; EA, European; IT, Italians; SAI, South American loci. In line with these considerations, it should be noted Indians; YRI, Yorubans. that the TMRCA does not correspond to the age at which aRatio of observed heterozygosity to expected gene diversity. selection has been acting, as neutral polymorphisms can bPercentile rank in the distribution of the same parameters persist in populations for time periods that depend on calculated on 5 kb windows from NIEHS genes. effective population size and generation time. The c95th percentile value in the distribution of HapMap SNPs.36 TMRCA is not therefore in contrast with the weak selection signatures in African populations. Rather, the question remains as to what selective pressure has been shaped by balancing selection in non-African popula- responsible for the establishment and maintenance of tions. This is in line with the region harboring most balancing selection at the MEFV locus. mutations with high prevalence in multiple populations9 An earlier analysis10 of inter-species divergence of and with these mutations being very old.2 pyrin exon 10 in a number of primate species indicated

By analyzing the MEFVex5ÀUTR region in seven human that human FMF mutations often recapitulate ancestral populations, we observed high levels of nucleotide amino acid states; the authors also indicated that the variation, an excess of intermediate-frequency alleles, C-terminal pyrin domain has been subjected to episodic reduced population differentiation and a genealogy with positive selection over primate evolution, with environ- common haplotypes separated by deep branches. mental pathogens possibly providing the underlying In particular, application of neutrality tests based on selective pressure. Our intra-species analysis suggests allele-frequency spectra allowed firm rejection of the that nonneutral evolution of MEFV has characterized neutral model for non-African populations only. Con- human history, as well, and the two studies point to the sistent with these findings and with overdominance common vision whereby variation in the 30 gene region being the underlying cause of balancing selection, an has some adaptive significance. We are presently unable excess of heterozygotes was observed in non-African to determine whether the chimpanzee S747C variant populations but not in African ones. To take into account implies some altered pyrin function and, in this case, the confounding effects of population history, we what the forces responsible for its maintenance are. Still, compared our data with both empirical and theoretical it might be interesting to notice that the derived C747 expectations. As an empirical control for demographic allele, in analogy to human FMF variants,10 represents

effects, we compared the MEFVex5ÀUTR region with a reversal to an ancestral amino acid state. large set of loci resequenced in different human popula- Balancing selection is thought to be rare in humans;43,44 tions: analysis of their distribution indicated that in most up to now most descriptions involve immune response

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 683 loci,45 with MHC genes probably representing the best- processes is the same or not and we can only speculate known example.46 More recently, genes not directly about its nature. Still, the pathogen resistance hypothesis involved in antigen recognition have been described as remains an attractive explanation both for FMF in the targets of balancing selection. For example, these include Mediterranean area and for our findings, also being cytokine genes or their receptors, as exemplified by consistent with the role of pyrin in the IL1-b signaling CCR5, IL1F10, IL1F7, IL18RAP, IL7R and IL4R;47–49 pathway.6–8 It has been noticed that exposure to similarly, b-defensisns50,51 and genes encoding blood- infectious diseases might have increased as humans group antigens23 have been shown to be frequently progressively came in contact with wild animals through subjected to balancing selection. In all these cases, the hunting and scavenging60 and as a consequence of underlying selective pressure has not been ascribed to expanding population sizes that allowed infection one specific infective agent; rather, in analogy to MHC spread and maintenance.61 It is therefore conceivable genes,52 it has been hypothesized23,48 that pathogen that at some point in human history variation in the diversity in terms of different species or genera might terminal region of MEFV might have become adaptive underly the maintenance of balanced polymorphisms at by conferring some resistance to one or more pathogens. some blood group antigen and interleukin genes. These Subsequently, populations living in different geographic observations are consistent with the fact that one possible areas might have adapted to local environmental patho- reason for the establishment of balancing selection is gens with the appearance and maintenance of specific represented by variable environmental conditions (for genetic variants as those we now observe at high example different pathogens being active in different frequency in the Mediterranean area. Yet, it should be times and places) and therefore that different alleles noted that no obvious selective benefit for pyrin might afford protection from different pathogens. mutation carriers has been identified,62–64 although some Another possible explanation for the maintenance of authors have reported that patients and carriers might balanced polymorphisms is heterozygote advantage. be protected from atopy,65 asthma66 and .63,65 Convincing demonstrations of overdominance have been Recently, it has been proposed that hypomorphic pyrin provided for a small number of human genes including variants might have been selected due to their conferring glucose-6-phosphate dehydrogenase,53 hemoglobin-b,54 resistance to brucellosis67 or tuberculosis68 but no direct the cystic transmembrane conductance regula- evidence has been provided. An alternative possibility is tor,55 a member of the interleukin-1 family (IL1F5)48 and that the ancient adaptive significance of MEFV variants the Kidd blood group antigen gene;23 in all these cases, (that is the presence of a balanced polymorphism the responsible selective pressure is thought to be maintaining the two major clades) is due to reasons accounted for by an infectious agent.56 Our data indicate totally different from pathogen resistance and relating to that balancing selection maintaining two major lineages still unknown pyrin functions. In this respect, it might be at the MEFVex5ÀUTR region is likely due to overdomi- worth mentioning that polymorphisms in genes coding nance in Caucasians and AS; indeed, a number of for proteins in the IL1-b pathway have been associated heterozygotes higher than expected is observed in these with increased risk of spontaneous abortion69 and populations and in the case of EAs the deviation we preterm delivery.70 In particular, in the case of IL1B both observed for MEFV is the most extreme in the set of the maternal and fetal genotype have been shown to play NIEHS genes (Figure 3). Overdominance might stem a role.70,71 Given its functioning as a regulator of IL1-b from different effects; in the case of G6PD and HBB activation, pyrin might therefore be involved in preg- mutations, for example, heterozygotes are partially nancy outcome. This effect might be particularly protected from malaria;53,54 alternatively, heterozygotes important after intrauterine infection,72 which in turn might experience increased immune response flexibility represents an important risk factor for preterm delivery.73 by modulating allele-specific gene expression in different Albeit preliminary, these observations might explain the cell types57 and in response to diverse stimuli/cyto- observed excess of heterozygotes at the MEFV locus and kines.58 This effect possibly applies to the promoter fit the indication whereby genes with an effect on regions of DEFB1,51 CCR547 and FUT223 and might be reproduction, in analogy to immune response genes, more likely to apply when balanced polymorphisms are can be frequent targets of balancing selection.74 maintained in regulatory (rather than coding) regions, as Further studies, possibly integrating molecular evolu- is the case of MEFV. tionary and functional analyses, are required to clarify It is worth noting that no heterozygote excess was the reasons why MEFV has been a target of natural observed among ITs, a population with a reportedly59 selection during early human history and to identify the high frequency of FMF (not confirmed in this survey). underlying causes of FMF prevalence in modern Yet, it should be pointed out that none of the most populations. common FMF mutations was identified in our subjects (either IT or of different geographic origin), suggesting that the maintenance of disease-associated mutations in Mediterranean populations and the persistence of the Materials and methods two major lineages we observe here are accounted for by DNA samples and sequencing different evolutionary processes. Indeed, the high fre- Human genomic DNA was obtained from the European quency of FMF mutations in a relatively restricted Collection of Cell Cultures (Ethnic Diversity DNA Panel geographic region suggests that it represents a more plus additional samples for Australian Aborigine, ABO, recent phenomenon compared with the observed main- deriving from HLA-defined panels). Additional DNA tenance of the two balanced haplotype clades in global samples from SAI and Yoruba (YRI) individuals were populations. We are presently unable to determine derived from the Coriell Institute for Medical Research. whether the selective pressure(s) responsible for both Finally, the ‘Telethon Bank of DNA, Nerve and Muscle

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 684 Tissues’ (no. GTF02008), located at the Department of described earlier;23 briefly, Pan troglodytes (NCBI pan- Neurological Sciences, IRCCS Ospedale Maggiore Poli- Tro2) was used as an outgroup and the 16 reference clinico, Mangiagalli and Regina Elena Foundation, genes were randomly selected among NIEHS loci shorter Milan, Italy, was the source of additional DNA samples than 20 kb that have been resequenced in the four of IT subjects. populations (YRI, AA, EA and AS; panel 2); the only The 5.3-kb region covering the 30 portion of MEFV was criterion was that Tajima0s D did not suggest the action of PCR amplified; primer sequences for human and natural selection. primate templates are reported in Supplementary File 1. The median-joining network to infer haplotype gen- PCR products were treated with ExoSAP-IT (USB ealogy was constructed using NETWORK 4.5.28 A first Corporation Cleveland, OH, USA), directly sequenced estimate of the time to the most common ancestor on both strands with a Big Dye Terminator sequencing (TMRCA) was obtained using a phylogeny-based Kit (v3.1 Applied Biosystem, Foster City, CA, USA) and approach implemented in NETWORK 4.5 using a runonanAppliedBiosystemsABI3130XLGenetic mutation rate based on 50 fixed differences between

Analyzer (Applied Biosystem). Sequences were assembled chimpanzee and humans in the 5.3-kb MEFVex5ÀUTR using AutoAssembler version 1.4.0 (Applied Biosystems), region. inspected manually by two distinct operator and singletons The second TMRCA estimate derived from application were re-amplified and resequenced. of a maximum-likelihood coalescent method was im- plemented in GENETREE.29,30 The method assumes an Data retrieval and haplotype construction infinite-site model without recombination and, therefore, MEFV genotype data for American subjects with either haplotypes and sites that violate these assumptions need African (AA) or EA descent were retrieved from the to be removed: out of 304 chromosomes, 12 had to be IIPGA website (http://innateimmunity.net). excluded, as did 6 single segregating sites. Again, the

Genotype data for 232 resequenced human genes mutation rate m for the MEFVex5ÀUTR region was obtained were derived from the NIEHS SNPs Program web site on the basis of the divergence between humans and (http://egp.gs.washington.edu/). In particular, we selected chimpanzees and under the assumption that both the genes that had been resequenced in populations of species separation occurred 6 MYA82 and of a generation defined ethnicity including AA, EA, Yoruba (YRI) and time of 25 years. Using this m and y maximum likelihood

AS (NIEHS panel 2). As detailed in the text, for each gene (yML ¼ 5.6), we estimated the effective population size

a 5-kb window was randomly selected; the only parameter (Ne ¼ 17678). With these assumptions, the

requirement was that it did not contain any long coalescence time, scaled in 2Ne units, was converted (4500 bp) resequencing gap; if the gene did not fulfill into years. For the coalescence process, 106 simulations this requirement it was discarded, as were 5 kb regions were performed. displaying o5 SNPs. The number of analyzed regions All calculations were performed in the R environment for AA, YRI, EA and AS were 207, 203, 193 and 186, (www.r-project.org). respectively. Haplotypes were inferred using PHASE version 2.1,75,76 a program for reconstructing haplotypes from Conflict of interest unrelated genotype data through a Bayesian statistical method. Haplotypes for ABO, AS, YRI, SAI and IT The authors declare no conflict of interest. individuals are available as supplemental information (Supplementary File 2). Acknowledgements Statistical analysis Tajima’s D,14 Fu and Li’s D* and F*15 statistics, as well as We are grateful to Roberto Giorda for helpful discussion 12 13 diversity parameters yW and p were calculated using about the paper. libsequence,77 aCþþ class library providing an object- oriented framework for the analysis of molecular population genetic data. Departure from neutrality was tested from coalescent simulations computed with the References cosi package17 and its best-fit parameters for YRI, AA, EA and AS populations with 10 000 iterations. 1 Sohar E, Gafni J, Pras M, Heller H. Familial Mediterranean fever. A survey of 470 cases and review of the literature. Additional simulations were performed with the ms Am J Med 1967; 2: 227–253. 78 software and parameters for each demographic model 2 The International FMF Consortium. Ancient missense muta- were set as proposed earlier.18,19 tions in a new member of the RoRet gene family are likely to Estimates of the population recombination rate para- cause familial Mediterranean fever. Cell 1997; 4: 797–807.

meter rH01 were obtained from diploid data by a 3 French FMF Consortium. A candidate gene for familial composite likelihood method, with the use of the Web Mediterranean fever. Nat Genet 1997; 1: 25–31. application MAXDIP79 (http://genapps.uchicago.edu/ 4 Centola M, Wood G, Frucht DM, Galon J, Aringer M, Farrell C maxdip/). et al. The gene for familial Mediterranean fever, MEFV, is 25 expressed in early leukocyte development and is regulated The FST statistic estimates genetic differentiation in response to inflammatory mediators. Blood 2000; 10: among populations and was calculated as proposed 80 3223–3231. earlier. Significance was assessed by permuting 10 000 5 Schaner PE, Gumucio DL. Familial Mediterranean fever in 81 times the haplotype distribution among populations. the post-genomic era: how an ancient disease is providing The MLHKA test was performed using the MLHKA new insights into inflammatory pathways. Curr Drug Targets software22 using multilocus data of 16 NIEHS genes as Inflamm Allergy 2005; 1: 67–76.

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 685 6 Richards N, Schaner P, Diaz A, Stuckey J, Shelden E, Wadhwa 28 Bandelt HJ, Forster P, Rohl A. Median-joining networks A et al. Interaction between pyrin and the apoptotic speck for inferring intraspecific phylogenies. Mol Biol Evol 1999; 1: protein (ASC) modulates ASC-induced apoptosis. J Biol Chem 37–48. 2001; 42: 39320–39329. 29 Griffiths RC, Tavare S. Sampling theory for neutral alleles in a 7 Chae JJ, Komarow HD, Cheng J, Wood G, Raben N, Liu PP varying environment. Philos Trans R Soc Lond B Biol Sci 1994; et al. Targeted disruption of pyrin, the FMF protein, causes 1310: 403–410. heightened sensitivity to endotoxin and a defect in macro- 30 Griffiths RC, Tavare S. Unrooted genealogical tree pro- phage apoptosis. Mol Cell 2003; 3: 591–604. babilities in the infinitely-many-sites model. Math Biosci 8 Chae JJ, Wood G, Masters SL, Richard K, Park G, Smith BJ 1995; 1: 77–98. et al. The B30.2 domain of pyrin, the familial Mediterranean 31 Garrigan D, Hammer MF. Reconstructing human origins in fever protein, interacts directly with caspase-1 to modulate the genomic era. Nat Rev Genet 2006; 9: 669–680. IL-1beta production. Proc Natl Acad Sci USA 2006; 26: 32 Takahata N. A simple genealogical structure of strongly 9982–9987. balanced allelic lines and trans-species evolution of poly- 9 Touitou I. The spectrum of Familial Mediterranean Fever morphism. Proc Natl Acad Sci USA 1990; 7: 2419–2423. (FMF) mutations. Eur J Hum Genet 2001; 7: 473–483. 33 Wall JD. Detecting ancient admixture in humans using 10 Schaner P, Richards N, Wadhwa A, Aksentijevich I, Kastner D, sequence polymorphism data. Genetics 2000; 3: 1271–1279. Tucker P et al. Episodic evolution of pyrin in primates: human 34 Barreiro LB, Patin E, Neyrolles O, Cann HM, Gicquel B, mutations recapitulate ancestral amino acid states. Nat Genet Quintana-Murci L. The heritage of pathogen pressures and 2001; 3: 318–321. ancient demography in the human innate-immunity CD209/ 11 Fisher BA, Lachmann HJ, Rowczenio D, Goodman HJ, Bhalara CD209 L region. Am J Hum Genet 2005; 5: 869–886. S, Hawkins PN. Colchicine responsive periodic fever syn- 35 Garrigan D, Mobasher Z, Kingan SB, Wilder JA, Hammer MF. drome associated with pyrin I591T. Ann Rheum Dis 2005; 9: Deep haplotype divergence and long-range linkage disequili- 1384–1385. brium at xp21.1 provide evidence that humans descend from a 12 Watterson GA. On the number of segregating sites in genetical structured ancestral population. Genetics 2005; 4: 1849–1856. models without recombination. Theor Popul Biol 1975; 2: 36 Alonso S, Lopez S, Izagirre N, de la Rua C. Overdominance in 256–276. the human genome and olfactory receptor activity. Mol Biol 13 Nei M, Li WH. Mathematical model for studying genetic Evol 2008; 5: 997–1001. variation in terms of restriction endonucleases. Proc Natl Acad 37 Harding RM, McVean G. A structured ancestral population for Sci USA 1979; 10: 5269–5273. the evolution of modern humans. Curr Opin Genet Dev 2004; 6: 14 Tajima F. Statistical method for testing the neutral mutation 667–674. hypothesis by DNA polymorphism. Genetics 1989; 3: 585–595. 38 Harris EE, Hey J. X evidence for ancient human 15 Fu YX, Li WH. Statistical tests of neutrality of mutations. histories. Proc Natl Acad Sci USA 1999; 6: 3320–3324. Genetics 1993; 3: 693–709. 39 Labuda D, Zietkiewicz E, Yotova V. Archaic lineages in the 16 Wooding S, Rogers A. The matrix coalescent and an applica- history of modern humans. Genetics 2000; 2: 799–808. tion to human single-nucleotide polymorphisms. Genetics 40 Goldstein DB, Chikhi L. Human migrations and population 2002; 4: 1641–1650. structure: what we know and why it matters. Annu Rev 17 Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Genomics Hum Genet 2002; 3: 129–152. Calibrating a coalescent simulation of human genome 41 Satta Y, Takahata N. The distribution of the ancestral sequence variation. Genome Res 2005; 11: 1576–1583. haplotype in finite stepping-stone models with population 18 Marth GT, Czabarka E, Murvai J, Sherry ST. The allele expansion. Mol Ecol 2004; 4: 877–886. frequency spectrum in genome-wide human variation data 42 Tishkoff SA, Verrelli BC. Patterns of human genetic diversity: reveals signals of differential demographic history in three implications for human evolutionary history and disease. large world populations. Genetics 2004; 1: 351–372. Annu Rev Genomics Hum Genet 2003; 4: 293–340. 19 Voight BF, Adams AM, Frisse LA, Qian Y, Hudson RR, Di 43 Asthana S, Schmidt S, Sunyaev S. A limited role for balancing Rienzo A. Interrogating multiple aspects of variation in a full selection. Trends Genet 2005; 1: 30–32. resequencing data set to infer human population size changes. 44 Bubb KL, Bovee D, Buckley D, Haugen E, Kibukawa M, Proc Natl Acad Sci USA 2005; 51: 18508–18513. Paddock M et al. Scan of human genome reveals no new 20 Kimura M. The Neutral Theory of Molecular Evolution. Loci under ancient balancing selection. Genetics 2006; 4: Cambridge University Press: Cambridge, 1983. 2165–2177. 21 Hudson RR, Kreitman M, Aguade M. A test of neutral 45 Hurst LD. Fundamental concepts in genetics: genetics and the molecular evolution based on nucleotide data. Genetics 1987; 1: understanding of selection. Nat Rev Genet 2009; 2: 83–93. 153–159. 46 Takahata N, Satta Y. Footprints of intragenic recombination at 22 Wright SI, Charlesworth B. The HKA test revisited: a HLA loci. Immunogenetics 1998; 6: 430–441. maximum-likelihood-ratio test of the standard neutral model. 47 Bamshad MJ, Mummidi S, Gonzalez E, Ahuja SS, Dunn DM, Genetics 2004; 2: 1071–1076. Watkins WS et al. A strong signature of balancing selection in 23 Fumagalli M, Cagliani R, Pozzoli U, Riva S, Comi GP, Menozzi the 50 cis-regulatory region of CCR5. Proc Natl Acad Sci USA G et al. Widespread balancing selection and pathogen-driven 2002; 16: 10539–10544. selection at blood group antigen genes. Genome Res 2009; 2: 48 Fumagalli M, Pozzoli U, Cagliani R, Comi GP, Riva S, Clerici 199–212. M et al. Parasites represent a major selective force for 24 Zhai W, Nielsen R, Slatkin M. An investigation of the interleukin genes and shape the genetic predisposition to statistical power of neutrality tests based on comparative autoimmune conditions. J Exp Med 2009; 6: 1395–1408. and population genetic data. Mol Biol Evol 2009; 2: 273–283. 49 Wu X, Di Rienzo A, Ober C. A population genetics study of 25 Wright S. Genetical structure of populations. Nature 1950; single nucleotide polymorphisms in the interleukin 4 receptor 4215: 247–249. alpha (IL4RA) gene. Genes Immun 2001; 3: 128–134. 26 Bowcock AM, Kidd JR, Mountain JL, Hebert JM, Carotenuto L, 50 Hollox EJ, Armour JA, Barber JC. Extensive normal copy Kidd KK et al. Drift, admixture, and selection in human number variation of a beta-defensin antimicrobial-gene evolution: a study with DNA polymorphisms. Proc Natl Acad cluster. Am J Hum Genet 2003; 3: 591–600. Sci USA 1991; 3: 839–843. 51 Cagliani R, Fumagalli M, Riva S, Pozzoli U, Comi GP, 27 Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating Menozzi G et al. The signature of long-standing balancing a high-density SNP map for signatures of natural selection. selection at the human defensin beta-1 promoter. Genome Biol Genome Res 2002; 12: 1805–1814. 2008; 9: R143.

Genes and Immunity Balancing selection at the MEFV locus M Fumagalli et al 686 52 Prugnolle F, Manica A, Charpentier M, Guegan JF, Guernier V, 67 Ross JJ. Goats, germs, and fever: are the pyrin mutations Balloux F. Pathogen-driven selection and worldwide HLA responsible for familial Mediterranean fever protective against class I diversity. Curr Biol 2005; 11: 1022–1027. Brucellosis? Med Hypotheses 2007; 3: 499–501. 53 Verrelli BC, McDonald JH, Argyropoulos G, Destro-Bisol G, 68 Cattan D. Familial Mediterranean fever: is low mortality from Froment A, Drousiotou A et al. Evidence for balancing tuberculosis a specific advantage for MEFV mutations selection from nucleotide sequence analyses of human carriers? Mortality from tuberculosis among Muslims, Jewish, G6PD. Am J Hum Genet 2002; 5: 1112–1128. French, Italian and Maltese patients in Tunis (Tunisia) in the 54 Aidoo M, Terlouw DJ, Kolczak MS, McElroy PD, ter Kuile FO, first half of the 20th century. Clin Exp Rheumatol 2003; 4 (Suppl Kariuki S et al. Protective effects of the sickle cell gene 30): S53–S54; author reply S54. against malaria morbidity and mortality. Lancet 2002; 9314: 69 Perni SC, Vardhana S, Tuttle SL, Kalish RB, Chasen ST, Witkin 1311–1312. SS. Fetal interleukin-1 receptor antagonist gene polymorph- 55 Pier GB, Grout M, Zaidi T, Meluleni G, Mueschenborn SS, ism, intra-amniotic interleukin-1beta levels, and history of Banting G et al. Salmonella typhi uses CFTR to enter intestinal spontaneous abortion. Am J Obstet Gynecol 2004; 4: 1318–1323. epithelial cells. Nature 1998; 6680: 79–82. 70 Genc MR, Gerber S, Nesin M, Witkin SS. Polymorphism in 56 Stiehm ER. Disease versus disease: how one disease may the interleukin-1 gene complex and spontaneous preterm ameliorate another. Pediatrics 2006; 1: 184–191. delivery. Am J Obstet Gynecol 2002; 1: 157–163. 57 Beaty JS, Sukiennicki TL, Nepom GT. Allelic variation in 71 Wang ZC, Yunis EJ, De los Santos MJ, Xiao L, Anderson DJ, transcription modulates MHC class II expression and func- Hill JA. T helper 1–type immunity to trophoblast antigens in tion. Microbes Infect 1999; 11: 919–927. women with a history of recurrent pregnancy loss is 58 Beaty JS, West KA, Nepom GT. Functional effects of a natural associated with polymorphism of the IL1B promoter region. polymorphism in the transcriptional regulatory sequence of Genes Immun 2002; 1: 38–42. HLA-DQB1. Mol Cell Biol 1995; 9: 4771–4782. 72 Hernandez-Guerrero C, Monzon-Bordonaba F, Jimenez- 59 La Regina M, Nucera G, Diaco M, Procopio A, Gasbarrini G, Zamudio L, Ahued-Ahued R, Arechavaleta-Velasco F, Strauss Notarnicola C et al. Familial Mediterranean fever is no longer 3rd JF et al. In-vitro secretion of proinflammatory a rare disease in Italy. Eur J Hum Genet 2003; 1: 50–56. by human amniochorion carrying hyper-responsive gene 60 McMichael AJ. Environmental and social influences on polymorphisms of tumour necrosis factor-alpha and emerging infectious diseases: past, present and future. Philos interleukin-1beta. Mol Hum Reprod 2003; 10: 625–629. Trans R Soc Lond B Biol Sci 2004; 1447: 1049–1058. 73 Yost NP, Cox SM. Infection and preterm labor. Clin Obstet 61 Dobson A. People and disease. In: Jones S, Martin R and Gynecol 2000; 4: 759–767. Pilbeam D (eds). The Cambridge Encyclopedia of Human 74 Charlesworth D. Balancing selection and its effects on Evolution. Cambridge University Press: Cambridge, 1992. pp sequences in nearby genome regions. PLoS Genet 2006; 4: e64. 411–420. 75 Stephens M, Smith NJ, Donnelly P. A new statistical method 62 Kogan A, Shinar Y, Lidar M, Revivo A, Langevitz P, Padeh S for haplotype reconstruction from population data. Am J Hum et al. Common MEFV mutations among Jewish ethnic groups Genet 2001; 4: 978–989. in Israel: high frequency of carrier and phenotype III states 76 Stephens M, Scheet P. Accounting for decay of linkage and absence of a perceptible biological advantage for the disequilibrium in haplotype inference and missing-data carrier state. Am J Med Genet 2001; 3: 272–276. imputation. Am J Hum Genet 2005; 3: 449–462. 63 Kalyoncu M, Acar BC, Cakar N, Bakkaloglu A, Ozturk S, 77 Thornton K. Libsequence: a C++ class library for evolutionary Dereli E et al. Are carriers for MEFV mutations ‘healthy’? Clin genetic analysis. Bioinformatics 2003; 17: 2325–2327. Exp Rheumatol 2006; 5(Suppl 42): S120–S122. 78 Hudson RR. Generating samples under a Wright-Fisher 64 Yepiskoposyan L, Harutyunyan A. Population genetics of neutral model of genetic variation. Bioinformatics 2002; 2: familial Mediterranean fever: a review. Eur J Hum Genet 2007; 337–338. 9: 911–916. 79 Hudson RR. Two-locus sampling distributions and their 65 Sackesen C, Bakkaloglu A, Sekerel BE, Ozaltin F, Besbas N, application. Genetics 2001; 4: 1805–1817. Yilmaz E et al. Decreased prevalence of atopy in paediatric 80 Hudson RR, Boos DD, Kaplan NL. A statistical test patients with familial Mediterranean fever. Ann Rheum Dis for detecting geographic subdivision. Mol Biol Evol 1992; 1: 2004; 2: 187–190. 138–151. 66 Brenner-Ullman A, Melzer-Ofir H, Daniels M, Shohat M. 81 Hudson RR, Slatkin M, Maddison WP. Estimation of levels of Possible protection against in heterozygotes gene flow from DNA sequence data. Genetics 1992; 2: 583–589. for familial Mediterranean fever. Am J Med Genet 1994; 2: 82 Glazko GV, Nei M. Estimation of divergence times for major 172–175. lineages of primate species. Mol Biol Evol 2003; 3: 424–434.

Supplementary Information accompanies the paper on Genes and Immunity website (http://www.nature.com/gene)

Genes and Immunity