
Interspecific introgressive origin of genomic diversity in the house mouse

Kevin J. Liu (a,1,2), Ethan Steinberg (a), Alexander Yozzo (a), Ying Song (b,3), Michael H. Kohn (b,1), and Luay Nakhleh (a,b,1)

(a) Department of Computer Science and (b) BioSciences, Rice University, Houston, TX 77005

Edited by John C. Avise, University of California, Irvine, CA, and approved November 12, 2014 (received for review April 4, 2014)

We report on a genome-wide scan for introgression between the house mouse (Mus musculus domesticus) and the Algerian mouse (Mus spretus), using samples from the ranges of sympatry and allopatry in Africa and Europe. Our analysis reveals wide variability in introgression signatures along the genomes, as well as across the samples. We find that fewer than half of the autosomes in each genome harbor all detectable introgression, whereas the X chromosome has none. Further, European mice carry more M. spretus alleles than the sympatric African ones. Using the length distribution and sharing patterns of introgressed genomic tracts across the samples, we infer, first, that at least three distinct hybridization events involving M. spretus have occurred, one of which is ancient, and the other two are recent (one presumably due to warfarin rodenticide selection). Second, several of the inferred introgressed tracts contain genes that are likely to confer adaptive advantage. Third, introgressed tracts might contain driver genes that determine the evolutionary fate of those tracts. Further, functional analysis revealed introgressed genes that are essential to fitness, including the Vkorc1 gene, which is implicated in rodenticide resistance, and olfactory receptor genes. Our findings highlight the extent and role of introgression in nature and call for careful analysis and interpretation of house mouse data in evolutionary and genetic studies.

Mus musculus | Mus spretus | hybridization | adaptive introgression | PhyloNet-HMM

EVOLUTION

Significance

The mouse has been one of the main mammalian model organisms used for genetic and biomedical research. Understanding the evolution of house mouse genomes would shed light not only on genetic interactions and their interplay with traits in the mouse but would also have significant implications for human genetics and health. Analysis using a recently developed statistical method shows that the house mouse genome is a mosaic that contains previously unrecognized contributions from a different mouse species. We traced these contributions to ancient and recent interbreeding events. Our findings reveal the extent of introgression in an important mammalian genome and provide an approach for genome-wide scans of introgression in other eukaryotic genomes.

Author contributions: K.J.L., M.H.K., and L.N. designed research; K.J.L. performed research; K.J.L., E.S., A.Y., and Y.S. contributed new reagents/analytic tools; K.J.L., M.H.K., and L.N. analyzed data; and K.J.L., M.H.K., and L.N. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession no. GSE62906).
1 To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].
2 Present address: Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824.
3 Present address: The State Key Laboratory for Biology of Plant Diseases and Insect Pests and Key Laboratory of Weed and Rodent Biology and Management, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1406298111/-/DCSupplemental.

Classical laboratory mouse strains, as well as newly established wild-derived ones, are widely used by geneticists for answering a diverse array of questions (1). Understanding the genome contents and architecture of these strains is important for studies of natural variation and complex traits, as well as evolutionary studies in general (2). Mus spretus, a sister species of Mus musculus, impacts the findings in M. musculus investigations for at least two reasons. First, it was deliberately interbred with laboratory M. musculus strains to introduce genetic variation (3). Second, M. m. domesticus is partially sympatric (naturally cooccurring) with M. spretus (Fig. 1).

Recent studies have examined admixture between subspecies of house mice (5–8), but have not studied introgression with M. spretus. In at least one case (5), the introgressive descent of the mouse genome was hidden due to data postprocessing that masked introgressed genomic regions as missing data. In another study reporting whole-genome sequencing of 17 classical laboratory strains (6), M. spretus was used as an outgroup for phylogenetic analysis. The authors were surprised to find that 12.1% of loci failed to place M. spretus as an outgroup to the M. musculus clade. The authors concluded that M. spretus was not a reliable outgroup but did not pursue their observation further. On the other hand, in a 2002 study (9), Orth et al. compiled data on allozyme, microsatellite, and mitochondrial variation in house mice from Spain (sympatry) and nearby countries in western and central Europe. Interestingly, allele sharing between the species was observed in the range of sympatry but not outside in the range of allopatry. The studies demonstrated the possibility of natural hybridization between these two sister species. Further, the study of Song et al. (10) demonstrated a recent adaptive introgression from M. spretus into some M. m. domesticus populations in the wild, involving the vitamin K epoxide reductase subcomponent 1 (Vkorc1) gene, which was later shown to be more widespread in Europe, albeit geographically restricted to parts of southwestern and central Europe (11).

Major, unanswered questions arise from these studies. First, is the vicinity around the Vkorc1 gene an isolated case of adaptive introgression in the house mouse genome, or do many other such regions exist? Second, is introgression between M. spretus and M. m. domesticus common outside the range of sympatry? Third, have there been other hybridization events, and, in particular, more ancient ones? Fourth, what role do introgressed genes, and, more generally, genomic regions, play?

To investigate these open questions, we used genome-wide variation data from 20 M. m. domesticus samples (wild and wild-derived) from the ranges of sympatry and allopatry, as well as two M. spretus samples. For detecting introgression, we used PhyloNet-HMM (12), a newly developed method for statistical inference of introgression in genomes while accounting for other evolutionary processes, most notably incomplete lineage sorting (ILS).

Our analysis provides answers to the questions posed above. First, we find signatures of introgression between M. spretus and each of the M. m. domesticus samples. The amount of introgression varies across the autosomes of each genome, with a few harboring all detectable introgression, and most of

the chromosomes have none. We detected no introgression on the X chromosome. Further, the amount of introgression varied widely across the samples. Our analyses demonstrate introgression outside the range of sympatry. In fact, our results show more signatures of introgression in the genomes of allopatric samples from Europe than in sympatric samples from Africa. For the third question, we used the length distribution and sharing patterns of introgressed regions across the samples to show support for at least three hybridization events: an ancient hybridization event that predates the colonization of Europe by M. m. domesticus and two more recent events, one of which presumably occurred about 50 y ago and is related to warfarin resistance selection (10). For the fourth question, our functional analysis of the introgressed genes shows enrichment for certain categories, most notably olfaction, an essential trait for the fitness of rodents. Understanding the genomic architecture and evolutionary history of the house mouse has broad implications on various aspects of evolutionary, genetic, and biomedical research endeavors that use this model organism. The PhyloNet-HMM method (12) can be used to detect introgression in other eukaryotic species, further broadening the impact of this work.

www.pnas.org/cgi/doi/10.1073/pnas.1406298111 | PNAS Early Edition

Fig. 1. Species ranges and samples used in our study. The species range of M. spretus is shown in green (4), and the species range of M. m. domesticus includes the blue regions, the range of M. spretus, and beyond (1). M. m. domesticus and M. spretus samples were obtained from locations marked with red circles and purple diamonds, respectively. The samples originated from within and outside the area of sympatry between the two species. (SI Appendix, Table S1, provides additional details about the samples used in our study.)

Results

We now describe our findings of introgression within the individual genomes, as well as across the genomes of the 20 M. m. domesticus samples (40 haploid genomes). The four African samples, as well as the two samples from Spain, are sympatric with M. spretus, whereas the other samples are allopatric (Fig. 1).

Genome-Wide Signals of Introgression. Our analysis detected introgression between M. spretus and M. m. domesticus in the genomes of all 20 M. m. domesticus samples (Fig. 2; SI Appendix provides complete scans of introgression for all 20 samples). However, the patterns of introgression varied across the chromosomes within each individual genome, as well as across the genomes. In terms of within-genome variability, a few chromosomes in each genome carried almost all of the introgressed regions. For example, all detected introgressed regions resided on five chromosomes in the sample from La Roca del Vallès, Spain. For all samples, fewer than half the chromosomes of a sample's genome carried any detected introgression (SI Appendix, Figs. S2–S20). The analysis did not detect any introgression on chromosome X (SI Appendix, Fig. S21). Further, in the two samples from Spain and the six Germany–Hamm samples, one or two chromosomes carried over 50% of all detected introgression. Generally, the percentage of introgressed sites in a genome ranged from about 0.02% in one sample to about 0.8% in samples from Germany (Fig. 2). The large extent of detected introgression between M. spretus and M. m. domesticus seen on chromosome 17 in the samples from Spain (see SI Appendix) merits further investigation. The introgressed regions spatially coincide with the known polymorphic recombination-suppressing inversions and t-haplotypes in house mice (13).

The amount of introgression in the genomes of the 20 samples points qualitatively to three groups of samples: Group I, which includes the two samples from Spain and the six Germany–Hamm samples; Group II, which includes the two other Germany samples and the Italy and Greece samples; and Group III, which includes the samples from Africa. Variability in the amount of introgression across samples within each group is much smaller than that across groups, as is the amount of sharing of introgressed regions. Further, Group I has the most introgression, and Group III has the least. Notice that all samples within Group I, except for the one from Spain–Arenal, contain the introgression with M. spretus that carries Vkorc1 (10). Group II contains all of the allopatric European mice that do not carry the introgressed Vkorc1, and Group III contains all of the sympatric African samples. This categorization guides the displays and analyses of our results below. These results answer the first two questions we posed above in the affirmative: there are introgressed genomic regions beyond the region that contains Vkorc1, and introgression is present in all 20 samples, pointing to the spread of introgressions beyond the range of sympatry. Quantifying how common such introgressions are outside the range of sympatry, however, requires denser sampling that is beyond the scope of this work.

[Fig. 2 plot: two panels with one entry per sample. (A) y axis: introgressed length (Mb), 0 to 45. (B) y axis: introgressed percentage (%), 0 to 0.8. x axis: Spain Arenal, Spain Roca del Vallès, Germany Hamm A to F, Germany, Germany Remderoda, Greece Korinthos, Greece Laganas, Italy Menconico, Italy SanGirogio, Italy Cassino, Italy Milazzo, Tunisia Monastir A, Tunisia Monastir B, Morocco, Algeria.]

Fig. 2. Amount of introgressed genetic material in the 20 M. m. domesticus samples. (A) The amount of introgressed genetic material in Mb per sample. (B) The amount of introgressed genetic material as a percentage of the genome length per sample.

Support for More Than a Single Hybridization Event. To answer the third question of whether multiple, distinct hybridization events involving M. spretus and M. m. domesticus have occurred, we focused on two analyses: inspecting the introgressed tract length distribution, where an introgressed tract is defined as a maximally

contiguous introgressed region, and inspecting the sharing pattern of introgressed regions across the samples. Repeated back-crossing, recombination, and drift result in fragmentation of introgressed regions, with very long regions pointing to recent hybridization events. On the other hand, selection on adaptively introgressed regions could also maintain them for long periods, confounding the tract length-based analysis of the age of hybridization. However, if a long region is shared across some, but not all, samples from the population, that increases the likelihood of a recent hybridization hypothesis.

For each of the three groups, we plotted the distribution of introgressed tract lengths, where an introgressed tract is defined as a maximally contiguous introgressed region (Fig. 3). The figure shows that Group I contains the only samples that have introgressed tracts of lengths exceeding 4 Mb. All these tracts correspond to the adaptively introgressed region that contains Vkorc1 between positions 122 and 132 Mb on chromosome 7 (Fig. 4A). The exclusivity of these very long introgressed tracts to Group I points to a very recent hybridization event involving M. spretus, in agreement with the assessment of ref. 10. Except for this group of introgressed tracts, the three distributions are very similar, with an excess of very short tracts and a smaller number of longer tracts (up to 4 Mb). The very short tracts could be a signal of ancient hybridization or just incorrect inference by the method (detecting very short introgressed regions is very hard due to low signal-to-noise ratio). However, the pattern of sharing of introgressed regions across the samples supports a hypothesis of ancient hybridization, as we now discuss.

Fig. 4 shows examples of three different patterns of introgression across the samples (full genome-wide scans of all samples can be found in the SI Appendix). Fig. 4A shows introgressed tracts that are shared exclusively among samples in Group I (we hypothesize that the Spain–Arenal sample underwent a secondary loss of the introgressed Vkorc1-containing region that it once had). As we discussed above, these point to at least one very recent hybridization event involving M. spretus. Fig. 4B shows introgressed tracts that are shared across samples from all three groups. This pattern points to an ancient hybridization event that involved M. spretus and preceded the ancestor of all M. m. domesticus samples in the study. It is important to note here that this pattern could also be a signature of balancing selection on standing variation before the split of M. musculus and M. spretus. We discuss this possibility in the Discussion section below. Fig. 4C shows introgressed regions, of considerable length, in a sample from Africa. This is, again, a signature of a recent hybridization event that is different from that involving the large tracts on chromosome 7 in Group I.

Putting together all of the evidence, the data support a hypothesis of at least three distinct hybridization events. One hybridization event is ancient, predating the colonization of Europe by M. m. domesticus upward of 2,000 y ago (15). The other two hybridization events are more recent, and one of them presumably occurred about 50 y ago and is related to warfarin resistance selection (10).

Adaptive Signals of Introgression. Introgressed genomic tracts and the genes they carry are generally assumed to be neutral or deleterious. Further, such tracts would naturally be expected to be present in the genomes of sympatric hybridizing taxa. Consequently, in the samples we considered here, one would expect to find more introgressed tracts, if any, in the sympatric mice (the African and Spanish samples) than in the allopatric ones.

However, our results give a very different picture from these expectations. We hypothesize that some of these introgressed tracts have conferred selective advantage on the mice that carry them. For example, the introgressed region on chromosome 7 in Group I contains the Vkorc1 gene, whose introgression and adaptive role were discussed in the context of warfarin resistance selection in ref. 10. To identify whether other introgressed regions with adaptive roles are associated with that Vkorc1-containing region, we applied the selective sweep measure of ref. 14 to a comparison of rodenticide-resistant to rodenticide-susceptible wild M. m. domesticus samples, which favors detection of the recent rodenticide-related selective sweep (within the last ∼50 y). Not surprisingly, the selective sweep statistics in the Vkorc1-containing region were among the largest of any detected in our study (Fig. 4A) (see SI Appendix for full results). We also detected selective sweeps outside the Vkorc1 region.

To assess the potential adaptive benefit of other introgressed regions, we used the frequencies of the introgressed regions, as reflected by the sharing patterns. For example, the shared introgressed regions across sympatric and allopatric samples on chromosomes 1 and 7, as shown in Fig. 4B, point to a hypothesis of adaptive roles of parts of these regions. To further zoom in on these shared regions, we analyzed the sharing patterns of genes across the introgressed regions in all samples. Fig. 5 shows the Venn diagram of the sets of introgressed genes in Groups I, II, and III.

Fig. 5 shows that the two European groups have 399 introgressed genes in common, almost twice the number of introgressed genes that are in common between either of them and the African group. We hypothesize that the set of 157 genes that are shared across all three groups contains a subset that we call “driver genes”: those that have driven the maintenance of those introgressed regions for a long time across the samples. In our proposed classification, driver genes would be beneficial upon introgression and would be subject to selection. Genes that are introgressed in one group, but not the others, are potential neutral, linked “passenger” genes. Although passenger genes would be expected to be neutral, they could also introduce new polymorphisms into M. m. domesticus genomes and could become subject to selection at some point during their sojourn time.
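The group-sharing comparison described above amounts to partitioning three gene sets into the regions of a Venn diagram. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `venn_regions`, the region labels, and the toy gene identifiers are all hypothetical and chosen only for illustration.

```python
def venn_regions(group1, group2, group3):
    """Partition three gene sets into the seven regions of a Venn diagram.

    The arguments are iterables of gene identifiers for Groups I, II, and III.
    Region labels are illustrative, not taken from the paper.
    """
    g1, g2, g3 = set(group1), set(group2), set(group3)
    return {
        "I only": g1 - g2 - g3,
        "II only": g2 - g1 - g3,
        "III only": g3 - g1 - g2,
        "I and II only": (g1 & g2) - g3,
        "I and III only": (g1 & g3) - g2,
        "II and III only": (g2 & g3) - g1,
        "all three": g1 & g2 & g3,
    }


# Toy example with made-up gene identifiers (not real data).
regions = venn_regions(
    {"geneA", "geneB", "geneC", "geneD"},
    {"geneB", "geneC", "geneE"},
    {"geneC", "geneD", "geneF"},
)
for label, genes in regions.items():
    print(label, sorted(genes))
```

Under the classification proposed in the text, the "all three" region would hold the candidates among which driver genes are hypothesized to lie, and the single-group regions would hold potential passenger genes.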

[Fig. 3 plot: three histogram panels (A, B, C). x axis: tract length (100 kb), 0 to 120 in A, 0 to 40 in B, 0 to 45 in C. y axis: count, with ticks at 1, 10, and 100.]

Fig. 3. Distributions of introgressed tract lengths detected in the 40 haploid genomes. (A) Group I: The six Germany–Hamm samples and two Spain samples. (B) Group II: The four Italy, two Greece, and two other Germany samples. (C) Group III: The Algeria, Morocco, and two Tunisia samples. Note the x axis scale difference between panel A and the other two panels. (See main text for the rationale behind the grouping.)
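To make the tract definition concrete, the sketch below calls maximally contiguous introgressed tracts from per-site posterior probabilities and bins their lengths. It is an illustration under stated assumptions rather than the PhyloNet-HMM implementation: the function names, the input layout (parallel lists of site positions and posteriors), and the choice of `>=` at the 95% cutoff are assumptions, and the data are made up.

```python
def call_tracts(positions, posteriors, cutoff=0.95):
    """Return (start, end) positions of maximal runs of consecutive sites
    whose posterior probability of introgression is at least `cutoff`.

    Assumes `positions` is sorted and parallel to `posteriors`.
    """
    tracts = []
    start = prev = None
    for pos, p in zip(positions, posteriors):
        if p >= cutoff:
            if start is None:
                start = pos      # open a new tract
            prev = pos           # extend the current tract
        elif start is not None:
            tracts.append((start, prev))  # close the tract at the last passing site
            start = None
    if start is not None:
        tracts.append((start, prev))      # tract runs to the final site
    return tracts


def length_histogram(tracts, bin_size=100_000):
    """Count tracts per length bin, where bin index = tract length // bin_size."""
    counts = {}
    for start, end in tracts:
        length = end - start + 1
        bin_index = length // bin_size
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return counts


# Toy data (hypothetical positions and posteriors).
positions = [100, 200, 300, 400, 500, 600, 700]
posteriors = [0.10, 0.97, 0.99, 0.96, 0.50, 0.98, 0.20]
tracts = call_tracts(positions, posteriors)
print(tracts)
print(length_histogram(tracts, bin_size=100))
```

The default `bin_size` of 100 kb mirrors the x axis unit of Fig. 3; a real analysis would also need to decide how to treat gaps between genotyped sites, which this sketch ignores.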

Individual driver genes on introgressed tracts are not expected to result in functional enrichment scores. For example, the introgressed region that contains Vkorc1 on chromosome 7 has many other genes, yet is not enriched for any functional categories. On the other hand, introgressed tracts with an abundance of genes from a given family tend to result in significant enrichment scores of the tracts. We illustrate this with two examples related to olfaction, a multigenic trait known to be essential for the fitness of rodents (full lists of the genes in introgressed regions and their functional enrichment are given in the SI Appendix). The introgressed tract on chromosome 1 that is shared by mice from Africa and Europe, including allopatric mice, is significantly enriched [P = 5.9 × 10^-8 after Benjamini and Hochberg (16) correction] for genes involved in olfactory transduction and encodes olfactory receptor genes (13 out of 36 genes) located on the contiguous tracts (Fig. 4B). It is conceivable that this group of genes, or a subset of these, may have acted as a driver for this introgressed fragment carrying at least another set of 23 passenger genes. Similarly, the region on chromosome 7 shared by the African samples is highly enriched (P = 3.6 × 10^-6) for genes involved in olfactory transduction and encodes at least 15 olfactory receptors among at least 62 genes situated on this tract. Evidently, large tracts have become polymorphic for introgressed and native repertoires of olfactory receptor alleles.

[Fig. 4 plot: three panels (A, B, C). Top part of each panel: one row per sample for the same 20 samples as in Fig. 2. Bottom part of each panel: normalized XP-CLR score, 0 to 12. x axis (Mb): (A) chr1 186–189, chr5 59–63, chr7 122–132 (Vkorc1 marked), chr15 58–61, chr15 74–79, chr15 102–105; (B) chr1 172–175, chr6 67–71, chr7 101–108, chr10 22–26, chr12 19–22, chr18 66–69; (C) chr3 57–64, chr7 82–89.]

Fig. 4. Three different introgression patterns across the 20 M. m. domesticus samples. (A) Introgressed regions that are exclusive to the Germany–Hamm and Spain samples. (B) Introgressed regions that are shared across the samples. (C) Introgressed regions that are exclusive to African samples. For each sample, scans from both haploid chromosomes are shown. A posterior decoding cutoff of 95% was used to declare a site introgressed (see main text for more details). The red squares on the x axis of the top part of each panel denote the locations of genes in introgressed regions (given the scale, the squares appear overlapping, but the genes are not overlapping). The location of Vkorc1 on chromosome 7 is indicated with a dashed vertical line in A. The bottom part of each panel shows selective sweep statistics, which are normalized XP-CLR scores (14) based on a comparison of rodenticide-resistant to rodenticide-susceptible wild M. m. domesticus samples. Scale of x axis is in megabases.

Discussion

The biological significance of hybridization and introgression in the evolution of new traits in natural eukaryotic populations has ignited much research into these two processes (17). Introgressed genetic material can be neutral and go unnoticed in terms of phenotypes but can also be adaptive and affect phenotypes (10, 18). Notably, these processes have played a crucial role in the domestication of plants and animals and appear to be common in natural populations of plants (17). Additionally, the importance of introgression has become a central discussion point when reconstructing the evolution of primates, including humans (19). Further, it has now become clear that the genomes of model organisms before their adoption as laboratory models by humans have been shaped by hybridization and introgression in their natural ancestral populations, such as in mice and macaques (9, 20, 21). Such influx of genetic variation of intersubspecific or interspecific origins is expected to continue, as wild-derived strains of mice will contribute to the Collaborative Cross in laboratory mice (22), and primate research centers continue to rely on imports of macaques from Asia.

Large-scale efforts have been made to decode the genetic background of most commonly used laboratory mouse strains, including inbred and wild-derived strains of M. m. domesticus, and of other subspecies of the laboratory mouse, including M. m. musculus and M. m. castaneus (2). Among the numerous insights of the evolutionary genomic analyses of the laboratory mouse and its wild relatives was that intersubspecific introgression between strains has been common (2). In addition to understanding the ancestry and mosaic structure of laboratory mouse genomes, detecting introgression is also of biomedical significance. In a recent study (10), the authors discovered in a mouse model resistance to the commonly used anticoagulant warfarin (23) through the acquisition of a mutated version of a key enzyme of the vitamin K cycle, Vkorc1, that is targeted by warfarin. Whereas previous genome-wide studies in mice focused on polymorphism and introgression within the M. musculus group (2, 5), we focused here on introgression involving genomic material of M. spretus and the genomes of several M. m. domesticus samples from the regions of sympatry and allopatry in Africa and Europe. These analyses are now enabled by our recently developed method for statistical inference of introgression in the presence of other evolutionary events, most notably incomplete lineage sorting (12).

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1406298111 Liu et al. Downloaded by guest on September 29, 2021 finite-site models in a statistical inference framework, which helps account for convergence at the nucleotide level. Currently, methods that automatically distinguish introgression from bal- ancing selection do not, however, exist. For example, recent studies of adaptive introgression(26,27)focusedsolelyonin- trogression. Two studies have recently reported on signatures of balancing selection in house mouse genomes (28, 29). However, each of the studies reported on a single, very short region that was shared across all of the samples, including from the various subspecies considered. Our current analysis of mul- Fig. 5. Venn diagram of the three sets of genes in introgressed regions in tiple M. m. domesticus from the ranges of sympatry and allopatry Groups I, II, and III of samples. Introgression was called based on a posterior decoding probability cutoff of 95%. For each circle in the Venn diagram, two in Africa and Europe mostly point to a very polymorphic signal quantities are shown: (Top) the number of genes and (Bottom) the per- of introgression across these samples, which, consequently, centage of all introgressed genes found in our study. decreases the likelihood that ancestral polymorphism with balancing selection acting on it could explain the patterns we see in the data. We also scanned genomes of samples from In terms of the debate surrounding the importance of in- M. m. musculus, which is a sister subspecies of M. m. domesticus trogression in evolution, an important result of our (SI Appendix,Fig.S23). As the figure shows, no introgression genome-wide study is that it is not only a recent strong and was detected in the M. m. musculus sample (a similar result to potentially human-driven selection (warfarin rodenticide and that shown in ref. 
Vkorc1) that has promoted introgression in natural populations of mice. In fact, hybridization and introgression between M. spretus and M. m. domesticus appear to be natural processes spanning at least several thousand years. Nevertheless, even from a rather dense genome-wide survey such as ours it is difficult to discern how frequently introgression occurs. This is because most hybridization does not lead to introgression, as drift and selection tend to remove introgressed regions. However, here we infer at least three hybridization events, one in the distant past and two of more recent timing (including the introgression of Vkorc1). We suspect that this is an underestimate of the frequency of hybridization and introgression between M. spretus and M. m. domesticus in the wild because the species established secondary contact a few thousand years ago when house mice reached the Mediterranean Basin on their westward spread into northern Africa and Europe.

In terms of a role of selection on introgressed tracts, our genome-wide scan revealed informative patterns. First, introgression is limited to a few autosomes and absent from the X chromosome. This is consistent with a strong role of purifying selection and drift in the removal of introgressed material. We find it noteworthy that tracts are very frequently found in the homozygous state, which indicates that introgressed variants can be recessive as well as dominant. The sharing of tracts across samples is consistent with positive selection on introgressed material (adaptive introgression). Such tract sharing is observed over long time scales, such as for chromosome 1, as well as over shorter time scales and locally, as is suggested by tract sharing by subsets of samples from presumably local populations, such as Hamm in Germany or African samples. Finally, it is common to infer that introgressed regions are adaptive if these are found outside the area of sympatry. As we observed numerous introgressed tracts, both presumably old and young, as judged by their tract lengths, it is reasonable to assume that selection might have favored the spread of some of these variants into the allopatric range of house mice. It is important to note here that more sampling from the two species and, potentially, subspecies of M. musculus, would be necessary to determine, with more certainty, the directionality of the introgression between the two species' genomes.

Selection could have confounding effects on our analyses and inferences. Specifically, it is important to note that various evolutionary events could give rise to genomic patterns and signals that resemble those created by introgression, including ILS, convergence, and ancestral polymorphism coupled with balancing selection (24). Indeed, all of these processes could confound the detection of introgression (25). The introgression detection method (12) that we use here accounts for ILS (hence, ancestral polymorphism that is not under balancing selection) and employs

12), which further weakens the plausibility of balancing selection acting from the most recent common ancestor of M. musculus and M. spretus until the present day. Furthermore, many of the introgressed regions we detect are much longer than regions with balancing selection signatures reported in previous studies. For example, in humans, DeGiorgio et al. (30) reported that the maximum contiguous genomic region with balancing selection signal was of length ∼40 kb (near the FANK1 gene), which "surprised" them because it was "abnormally large for balancing selection" (ref. 30, p. 11). Introgressed tract lengths on the order of megabases would require much stronger balancing selection than has been previously reported in the literature. Still, for some of the shared introgressed regions, we cannot rule out the possibility of balancing selection as an alternative to introgression. Whereas our method for detecting introgression can be confounded by balancing selection, a similar issue might arise for methods that detect balancing selection. For example, DeGiorgio et al. noted recently that gene flow is very likely to confound their test for balancing selection (30). New methods need to be developed to account for the two processes simultaneously, as detecting balancing selection could shed light on the evolution of species (31).

Adaptive introgressive hybridization may be an important source for novel functional genetic variants, and combinations thereof, that encode novel traits upon which selection could act. Here, one objective of the study is to discern whether multiple introgressed genes encode the previously reported warfarin resistance trait. Our functional annotation of the introgressed tracts did not reveal any discernible enrichment for genes that clearly modulate warfarin resistance. Moreover, we did not observe a pattern where all mice that carry the Vkorc1 introgression share introgressed tracts of comparable length. We therefore conclude that, until further samples are analyzed and tracts are annotated in terms of function more comprehensively, the adaptive introgressive hybridization leading to warfarin resistance in house mice requires the introgression of only Vkorc1. Whether the introgressed Vkorc1 interacts with the native genes in pathways resulting in warfarin resistance cannot be deduced from our analysis at this time.

We observed that the raw material for a complex trait known to be important to rodent life history could be introgressed. We noticed the large numbers of olfactory receptor genes for which now polymorphic and divergent copies segregate in the populations of house mice (32). It is known that both minor nucleotide differences, as well as larger scale differences in the olfactory receptor repertoire, have measurable phenotypic consequences in mice (32). Thus, we hypothesize that wild mice from Europe that have experienced gene flow, such as seen for chromosome 1, which is enriched for olfactory receptor clusters, may indeed be the subject of natural selection.
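The Discussion judges introgressed tracts to be "old" or "young" by their lengths. A minimal sketch of the reasoning behind that heuristic follows, assuming a simple neutral single-pulse admixture model in which recombination breaks tracts down so that their mean length after t generations is roughly 1/(r·t). The function names and the recombination rate of 5e-9 per bp per generation (about 0.5 cM/Mb) are illustrative assumptions, not values or procedures taken from the study, and the model ignores the selection and drift effects discussed above.

```python
# Illustrative only: relates mean introgressed tract length to time since
# admixture under a neutral single-pulse model (mean length ~ 1 / (r * t)).
# ASSUMED_RECOMB_RATE is a hypothetical default, not a value from the paper.

ASSUMED_RECOMB_RATE = 5e-9  # crossovers per bp per generation (assumption)


def expected_tract_length_bp(generations, recomb_rate_per_bp=ASSUMED_RECOMB_RATE):
    """Expected mean tract length (bp) `generations` after a single pulse."""
    if generations <= 0 or recomb_rate_per_bp <= 0:
        raise ValueError("generations and recombination rate must be positive")
    return 1.0 / (recomb_rate_per_bp * generations)


def estimate_generations(mean_tract_length_bp, recomb_rate_per_bp=ASSUMED_RECOMB_RATE):
    """Rough number of generations implied by an observed mean tract length."""
    if mean_tract_length_bp <= 0 or recomb_rate_per_bp <= 0:
        raise ValueError("tract length and recombination rate must be positive")
    return 1.0 / (recomb_rate_per_bp * mean_tract_length_bp)


if __name__ == "__main__":
    # Longer tracts imply fewer generations since the hybridization event.
    for length_bp in (40_000, 500_000, 2_000_000):
        print(length_bp, estimate_generations(length_bp))
```

Under these assumptions, megabase-scale tracts point to recent events and tracts of tens of kilobases to much older ones. Positive selection can keep tracts long, so such estimates are at best a rough ordering of events.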

Materials and Methods

Our study used two M. spretus samples and twenty M. m. domesticus samples from the ranges of sympatry and allopatry that were either newly sampled or from previous publications. The work relied on tissue sharing and was exempted by Rice University's institutional review board. For details on the samples, compiling the sequence data, genotyping and phasing it, as well as single nucleotide polymorphism (SNP) calling, see the SI Appendix. We used PhyloNet-HMM (12) to scan M. musculus genomes for segments with introgressed origin from M. spretus. For every haploid M. m. domesticus genome, we analyzed it along with the M. spretus genomes and detected introgression. See the SI Appendix for full details on how the analysis was done. We used XP-CLR (14) version 1.0 to scan for selective sweep patterns. The method was run using its default settings. See the SI Appendix for the set of samples used in these scans.

ACKNOWLEDGMENTS. We thank Yun Yu for help with the PhyloNet-HMM software and Stefan Endepols for sharing mouse tissues. The work was partially supported by R01-HL091007-01A1 (to M.H.K.) from the National Institutes of Health, National Heart, Lung and Blood Institute, and by startup funds from Rice University (to M.H.K.). L.N. was supported in part by National Science Foundation Grants DBI-1062463 and CCF-1302179 and Grant R01LM009494 from the National Library of Medicine (NLM). K.J.L. was partially supported by a training fellowship from the Keck Center of the Gulf Coast Consortia, on the NLM Training Program in Biomedical Informatics, NLM T15LM007093.
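The scan described in Materials and Methods yields genomic segments of inferred M. spretus origin, and the Discussion compares how such tracts are shared across samples. The sketch below shows one way per-site calls could be turned into tracts and compared between two samples. It assumes a hypothetical input of sorted site positions with a per-site posterior probability of introgressed origin and an arbitrary 0.95 cutoff. It does not reproduce PhyloNet-HMM's actual output format or the authors' post-processing.

```python
# Illustrative only: the input layout, threshold, and function names are
# assumptions for this sketch, not part of PhyloNet-HMM or the study.


def call_tracts(positions, posteriors, threshold=0.95):
    """Merge consecutive sites with posterior >= threshold into (start, end) tracts."""
    tracts = []
    start = prev = None
    for pos, prob in zip(positions, posteriors):
        if prob >= threshold:
            if start is None:
                start = pos          # open a new tract
            prev = pos               # extend the current tract
        elif start is not None:
            tracts.append((start, prev))  # close the tract at the last passing site
            start = None
    if start is not None:
        tracts.append((start, prev))
    return tracts


def shared_bp(tracts_a, tracts_b):
    """Total base pairs covered by both tract lists (inclusive coordinates)."""
    total = 0
    for a_start, a_end in tracts_a:
        for b_start, b_end in tracts_b:
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap > 0:
                total += overlap
    return total


if __name__ == "__main__":
    positions = [100, 200, 300, 400, 500, 600]
    sample_a = call_tracts(positions, [0.10, 0.99, 0.97, 0.20, 0.96, 0.99])
    sample_b = [(250, 550)]
    print(sample_a, shared_bp(sample_a, sample_b))
```

The nested loop in `shared_bp` is quadratic in the number of tracts, which is adequate for a handful of tracts per chromosome. It also assumes the tracts within each list do not overlap one another.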

1. Guénet J-L, Bonhomme F (2003) Wild mice: An ever-increasing contribution to a popular mammalian model. Trends Genet 19(1):24–31.
2. Yang H, Bell TA, Churchill GA, Pardo-Manuel de Villena F (2007) On the subspecific origin of the laboratory mouse. Nat Genet 39(9):1100–1107.
3. Bonhomme F, Martin S, Thaler L (1978) Hybridation en laboratoire de Mus musculus L. et Mus spretus Lataste. Experientia 34(9):1140–1141.
4. Palomo L, Justo E, Vargas J (2009) Mus spretus (Rodentia: Muridae). Mamm Species 840:1–10.
5. Yang H, et al. (2011) Subspecific origin and haplotype diversity in the laboratory mouse. Nat Genet 43(7):648–655.
6. Keane TM, et al. (2011) Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477(7364):289–294.
7. Staubach F, et al. (2012) Genome patterns of selection and introgression of haplotypes in natural populations of the house mouse (Mus musculus). PLoS Genet 8(8):e1002891.
8. Teeter KC, et al. (2008) Genome-wide patterns of gene flow across a house mouse hybrid zone. Genome Res 18(1):67–76.
9. Orth A, et al. (2002) [Natural hybridization between 2 sympatric species of mice, Mus musculus domesticus L. and Mus spretus Lataste]. C R Biol 325(2):89–97.
10. Song Y, et al. (2011) Adaptive introgression of anticoagulant rodent poison resistance by hybridization between old world mice. Curr Biol 21(15):1296–1301.
11. Pelz H-J, et al. (2012) Distribution and frequency of Vkorc1 sequence variants conferring resistance to anticoagulants in Mus musculus. Pest Manag Sci 68(2):254–259.
12. Liu KJ, et al. (2014) An HMM-based comparative genomic framework for detecting introgression in eukaryotes. PLOS Comput Biol 10(6):e1003649.
13. Hammer MF, Schimenti J, Silver LM (1989) Evolution of mouse chromosome 17 and the origin of inversions associated with t haplotypes. Proc Natl Acad Sci USA 86(9):3261–3265.
14. Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res 20(3):393–402.
15. Auffray J-C, Britton-Davidian J (1992) When did the house mouse colonize Europe? Biol J Linn Soc Lond 45(2):187–190.
16. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc, B 57(1):289–300.
17. Arnold ML (2004) Transfer and origin of adaptations through natural hybridization: Were Anderson and Stebbins right? Plant Cell 16(3):562–570.
18. Schmidt LH, Fradkin R, Harrison J, Rossan RN (1977) Differences in the virulence of Plasmodium knowlesi for Macaca irus (fascicularis) of Philippine and Malayan origins. Am J Trop Med Hyg 26(4):612–622.
19. Arnold ML, Meyer A (2006) Natural hybridization in primates: One evolutionary mechanism. Zoology (Jena) 109(4):261–276.
20. Stevison LS, Kohn MH (2008) Determining genetic background in captive stocks of cynomolgus macaques (Macaca fascicularis). J Med Primatol 37(6):311–317.
21. Osada N, et al. (2010) Ancient genome-wide admixture extends beyond the current hybrid zone between Macaca fascicularis and M. mulatta. Mol Ecol 19(14):2884–2895.
22. Churchill GA, et al. (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet 36(11):1133–1137.
23. Scully M (2002) Warfarin therapy. The Biochemist 24:15–17.
24. Hedrick PW (2013) Adaptive introgression in animals: Examples and comparison to new mutation and standing variation as sources of adaptive variation. Mol Ecol 22(18):4606–4618.
25. Nakhleh L (2013) Computational approaches to species phylogeny inference and gene tree reconciliation. Trends Ecol Evol 28(12):719–728.
26. Green RE, et al. (2010) A draft sequence of the Neandertal genome. Science 328(5979):710–722.
27. Heliconius Genome Consortium (2012) Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487(7405):94–98.
28. Ferguson W, Dvora S, Gallo J, Orth A, Boissinot S (2008) Long-term balancing selection at the West Nile virus resistance gene, Oas1b, maintains transspecific polymorphisms in the house mouse. Mol Biol Evol 25(8):1609–1618.
29. Linnenbrink M, et al. (2011) Long-term balancing selection at the blood group-related gene B4galnt2 in the genus Mus (Rodentia; Muridae). Mol Biol Evol 28(11):2999–3003.
30. DeGiorgio M, Lohmueller KE, Nielsen R (2014) A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLoS Genet 10(8):e1004561.
31. Leffler EM, et al. (2013) Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science 339(6127):1578–1582.
32. Young JM, Trask BJ (2002) The sense of smell: Genomics of vertebrate odorant receptors. Hum Mol Genet 11(10):1153–1160.

www.pnas.org/cgi/doi/10.1073/pnas.1406298111