Allopatric genetic origins for sympatric host-plant shifts and race formation in Rhagoletis

Jeffrey L. Feder†‡, Stewart H. Berlocher§, Joseph B. Roethele†, Hattie Dambroski†, James J. Smith¶, William L. Perryʈ, Vesna Gavrilovic¶, Kenneth E. Filchak†, Juan Rull††, and Martin Aluja††

†Department of Biological Sciences, P.O. Box 369, Galvin Life Science Center, University of Notre Dame, Notre Dame, IN 46556-0369; §Department of Entomology, University of Illinois, 320 Morrill Hall, 505 South Goodwin Avenue, Urbana, IL 61801; ¶Department of Zoology and Lyman Briggs School, W31 East Holmes Hall, Michigan State University, East Lansing, MI 48824; ʈDepartment of Biological Sciences, Illinois State University, Campus Box 4120, Normal, IL 61790-4120; and ††Instituto de Ecologı´a,Asociacion Civil, Km 2.5 Antigua Carretera a Coatepec No. 351, 91070 Xalapa, Veracruz, Mexico

Edited by David B. Wake, University of California, Berkeley, CA, and approved July 2, 2003 (received for review February 8, 2003) Tephritid fruit flies belonging to the Rhagoletis pomonella sibling rearrangements may contain beneficial, coadapted suites of species complex are controversial because they have been pro- genes (8). posed to diverge in (in the absence of geographic The inversions have been associated with the depth of the isolation) by shifting and adapting to new host plants. Here, we facultative overwintering pupal diapause (9–12). Multiple re- report evidence suggesting a surprising source of genetic variation gression of genotypes for the six allozymes demarcating the contributing to sympatric host shifts for these flies. From DNA inversions explained from 15% to Ͼ40% of the variation in adult sequence data for three nuclear loci and mtDNA, we infer that an eclosion time (13). Diapause variation is important because it ancestral, hawthorn-infesting R. pomonella population became differentially adapts apple and haw flies to a seasonal difference geographically subdivided into Mexican and North American iso- in the fruiting times of their hosts (9–12). In particular, the 3- to lates Ϸ1.57 million years ago. Episodes of gene flow from Mexico 4-week earlier phenology of apple selects for a deeper pupal subsequently infused the North American population with inver- diapause (greater recalcitrance to nondiapause development) in sion polymorphism affecting key diapause traits, forming adaptive apple flies, allowing them to better withstand longer periods of clines. Sometime later (perhaps ؎1 million years), diapause varia- warm weather before winter. A series of rearing experiments in tion in the latitudinal clines appears to have aided North American which conditions were manipulated to simulate the seasonal flies in adapting to a variety of plants with differing fruiting times, difference between apple and haw fruiting times supported the helping to spawn several new taxa. Thus, important raw genetic ‘‘diapause hypothesis’’ by inducing genetic responses in allozyme material facilitating the adaptive radiation of R. pomonella origi- frequencies in predicted directions (9–12). Life-history adapta- nated in a different time and place than the proximate ecological tion to apple was therefore due, in part, to inversion frequency host shifts triggering sympatric divergence. changes in apple flies that generated ecologically based pre- and postmating reproductive isolation from haw flies. he apple maggot, Rhagoletis pomonella, reflects a rift in Host plant-related differentiation in R. pomonella is superim- Tspeciation theory. Mayr (1) has championed the idea that posed on a broader pattern of geographic variation (14–17). animal divergence requires the complete geographic separation (Note: the distribution of the apple race is limited to the of populations. Hence, the genetic origins of species arise in northeastern and midwestern United States and Canada, allopatric demes. In contrast, Walsh (2) proposed that certain whereas the range of haw flies extends into Mexico; Fig. 1.) The host-specific phytophagous insects could speciate in sympatry recent shift of R. pomonella from hawthorn to the earlier-fruiting when they shift and adapt to new plants. Walsh (3) cited the apple exposed apple flies to selection pressures similar to those ancestral hawthorn (Crataegus spp.) and historically derived experienced by haw-infesting flies in the South (i.e., warmer apple (Malus pumila)-infesting forms of R. pomonella as an temperature conditions for longed periods). Mean growing example of sympatric in action. Bush (4) later hy- degree days until eclosion, a surrogate for diapause depth (18), pothesized that the entire complex of six or more sibling species increase with decreasing latitude for haw-fly populations (Fig. comprising the R. pomonella group, to which the apple maggot 2). Concomitant with this phenotypic trend, allozyme alleles belongs, arose by means of sympatric host shifts. Under the within the inversions that are common in the apple race at sympatric model, the genetic basis for adaptation to a novel host sympatric northern sites are essentially fixed in southern U.S. would be expected to trace to mutations occurring within the haw populations (14–17). Thus, the sympatrically derived apple ancestral geographic range. Here, we provide evidence for a race was extracted, in part, from preexisting adaptive inversion partial reconciliation of the sympatric and allopatric views by clines in the haw race. showing that inversion polymorphism forming latitudinal clines To understand the history of host-related diapause adaptation within R. pomonella and contributing to sympatric host race in R. pomonella, we sequenced three anonymous cDNA loci and formation may have had much earlier geographic roots. mtDNA from flies sampled across the United States and a Genetic studies (5, 6) have shown that apple and haw flies disjunct haw-infesting population in the Altiplano of Mexico represent partially reproductively isolated ‘‘host races,’’ the (Fig. 1). The three cDNA loci P220, P2956, and P7 were chosen hypothesized first stage of . The host races because they map to separate inversions on chromosomes 1, 2, display significant allele frequency, but not fixed, genetic dif- and 3, respectively (7). Our rationale was that low levels of ferences for six allozymes mapping to three different regions of

the genome (7, 8). (Frequency differences average from 0.10 to This paper was submitted directly (Track II) to the PNAS office. 0.20 at sympatric sites, but are often as high as 0.3.) Inversions Abbreviations: Mya, million years ago; MP, maximum parsimony; ML, maximum likelihood; subsume each of these three genomic regions on chromosomes LR, likelihood ratio. ϭ 1–3 (8) (haploid n 6 for Rhagoletis). Markers within the Data deposition: The sequences reported in this paper have been deposited in the GenBank inversions show patterns of strong gametic disequilibrium (5, 8), database [accession nos. AY152494–AY152510 (P220), AY152511–AY152526 (P2956), implying little or no recombination between inverted regions. AY152527–AY152542 (P7), and AY152477–AY152493 (mtDNA)]. The pattern of disequilibrium also suggests that alternative ‡To whom correspondence should be addressed. E-mail: [email protected].

10314–10319 ͉ PNAS ͉ September 2, 2003 ͉ vol. 100 ͉ no. 18 www.pnas.org͞cgi͞doi͞10.1073͞pnas.1730757100 Downloaded by guest on September 23, 2021 in the 5Ј and 3Ј directions on an automated laser fluorescence sequencer (Amersham Pharmacia).

Gene Tree Construction. Maximum parsimony (MP) and maximum likelihood (ML) gene trees were constructed by using PAUP*b8 (19). For the MP analysis, deletions were treated as a fifth base pair, with indels of identical length and position recoded to count as single mutational steps. Rhagoletis electromorpha, belonging to the sister species group (Rhagoletis tabellaria)toR. pomonella (4) and the more distantly related Rhagoletis cingulata and Rhagoletis suavis (4), were used to root trees. The molecular clock was tested for each locus for R. pomonella and R. electromorpha sequences by comparing log likelihood scores enforcing-versus- Fig. 1. Range of R. pomonella based on Bush (4) and Berlocher (17). The fly relaxing the clock hypothesis for the best supported DNA is also patchily distributed in the western United States, but these populations substitution model identified by using MODELTEST (20). Intra- may be recent introductions and are not shown. Numbers indicate sampling genic recombination was tested by using the method of Hudson sites in the study. (See Fig. 3 legend for site descriptions.) and Kaplan (21).

Estimating Divergence Dates. ML estimates for the time of diver- recombination within the inversions would let us resolve intact gence (tdiv) of the U.S. and disjunct Mexican Highland popula- nuclear haplotypes. Comparisons of gene trees (genealogies) and ͞ tions and the effective size (Neanc) of the ancestral U.S. Mexican coalescent events for these haplotypes among the cDNA loci and population were simultaneously determined from our sequence mtDNA would permit historical inferences to be made concern- data in a manner similar to that described in Takahata et al. (22) ing the sympatric speciation process. and Yang (23). Rate variation among loci was accounted for by calibrating the three nuclear genes relative to mtDNA under the Materials and Methods assumption of an insect mtDNA clock (1.15 ϫ 10Ϫ8 substitutions Fly Populations. Taxa, host plants, collecting locations, and sam- per bp͞year) (24). Below we present two major competing pling dates for flies are given in Figs. 1 and 3. Flies were collected hypotheses, lineage sorting and introgression, potentially respon- as larvae in infested fruit and either (i) dissected from the fruit sible for the topologies of gene trees. These models have and frozen for later genetic analysis, or (ii) reared to adulthood different ramifications for the estimation of tdiv and Neanc. Under in the laboratory. lineage sorting, the ML estimate for U.S.͞Mexican subdivision is based on the depths of the nodal partitions between SN and DNA Sequencing. Sequence data were generated for three cDNA M haplotypes among cDNA loci (see Fig. 3 and below for ͞ loci (P220, P2956, and P7) isolated from a R. pomonella EST haplotype designations). In contrast, U.S. Mexican divergence Ј time under introgression is related to the nodal partitions library (7) and for mtDNA (3 portion of COI, leucine tRNA, ͞ and COII). The three cDNA loci segregated as single-copy genes between N and SN M nuclear haplotypes. ML estimates for in crosses (7, 8). Genomic DNAs were PCR-amplified 35 cycles gene-flow events from Mexico into the United States were derived separately for each locus (inversion) based on the nodal (94°C, 30 sec; 52°C, 1 min; 72°C, 1.5 min) by using locus-specific depth of M and SN haplotypes. primers (7). Products were TA-cloned into pCR II vectors (Invitrogen). Four to six clones per locus per fly were sequenced Evaluating Lineage Sorting vs. Introgression Hypotheses. A likeli- hood ratio (LR) approach (25) was used to assess the lineage sorting vs. introgression hypotheses as causal explanations for the observed topologies of gene trees. LR tests measure the relative tenability of two competing hypotheses by comparing the ratio of their likelihoods (25). When ML1 specifies the maximum likelihood of the observed data under one hypothesis and ML2 under an alternative model, LR ϭ ML1͞ML2 (25). For non- nested hypotheses, such as lineage sorting and introgression, it has been suggested that the safest manner to assess the proba- bility distribution for a LR test statistic is by Monte Carlo computer simulation (25). Replicate data sets are simulated under the assumptions of one hypothesis, which in our case is the null lineage sorting model. For each simulated data set, a LR is calculated anew for the competing hypotheses and the propor- tion of times that the observed LR value for the actual data exceeds the simulated values is taken as the significance level. The rejection level typically is set at 5% (25). The different causes ascribed by lineage sorting and intro- EVOLUTION gression for nodal partitions on gene trees (Fig. 4) make them amenable to a LR test. The ML estimate for the null lineage- ϫ sorting hypotheses (LR numerator) was found by permitting Fig. 2. Mean growing degree days (cumulative days rearing temperature N͞SN nodes to have different origin times among loci (reflecting base Celsius) until adult eclosion vs. latitude for haw-fly populations from the the independent origins of inversions on chromosomes 1–3inthe United States and Mexico. Flies were reared for one generation in apples ͞ under standard laboratory conditions at 21°C, except for the Tancitaro, Mex- common ancestor), whereas constraining the date for SN M ico, and Urbana, IL, samples, which were reared from the field in haws at 26°C nuclear and US͞M mtDNA nodes to be the same (reflecting the and 24°C, respectively. Numbers in parentheses are sample sizes of eclosing common and more recent divergence of these haplotypes result- adults. ing from the geographic isolation of Mexican Highland flies). In

Feder et al. PNAS ͉ September 2, 2003 ͉ vol. 100 ͉ no. 18 ͉ 10315 Downloaded by guest on September 23, 2021 ϫ 5 contrast, the ML estimate for the alternate introgression model each for computer simulations varying Neanc from 5 10 to (LR denominator) was derived holding the time of origin for N 1 ϫ 104). and SN͞M nuclear and US͞M mtDNA nodes the same across loci (ϭ common biogeographic split of U.S.͞Mexican ancestor), Hypotheses for the Gene Trees. Lineage sorting of ancestral poly- while allowing the times for the SN͞M nodes to vary (ϭ morphism and introgression are the two most likely explanations independent gene-flow events from Mexico for each inversion). for the gene tree topologies. Under lineage sorting, the deep ML estimates for gene trees and parametric bootstrapping nodal concordance observed for P220, P2956, and P7 was caused were performed in a fashion similar to that described in Dopman by the chance origin of inversions on chromosomes 1–3ata et al. (26), except for the following instances. (i) ML estimates for similar time in the past in the ancestral U.S.͞Mexican haw the lineage sorting and introgression hypotheses were based on population (step 1a in Fig. 4A). The inversions then formed the product of likelihood scores across nuclear and mtDNA primary clines (step 2a). Some time later, as the Mexican deme trees. (ii) To streamline the analysis, we used the mean number became isolated, it captured progenitor M haplotypes (inver- of substitutions separating whole classes of R. pomonella hap- sions) from the southern edge of the ancestral range (step 3a). lotypes (e.g., N vs. SN alleles), rather than including all of the The U.S. deme retained the ancestral polymorphism, resulting in sequence data for a locus. (iii) Probabilities for observing a given SN alleles from the United States being more closely related to number of substitutions along a tree branch of a given length (in Mexican genes. Lineage sorting therefore attributes the nodal absolute time) were based on a binomial distribution by using partitions between SN and M nuclear (yellow circles in Fig. 3 rate constants for loci calibrated against an insect mtDNA clock A–C) and U.S. and M mtDNA haplotypes to the separation of (1.15 ϫ 10Ϫ8 substitutions per bp͞year). (iv) The null distribution U.S. and Mexican flies, with the difference in their depths caused for the LR statistic for lineage sorting was determined by by coalescent time variation in the common ancestor. simulating 10,000 trees based on ML estimates of 1.11 million In contrast, the introgression model posits that past from Mexico into the United States is responsible for the years ago (Mya) for tdiv and 37,200 for Neanc under lineage sorting, and randomly chosen origin times for the inversions. similarity of SN and M haplotypes. Under introgression, U.S. (v) Mutations were arrayed on simulated trees according to a and Mexican demes became isolated first (step 1b in Fig. 4B), time-homogeneous Poisson process with mutations rates for the promoting genetic divergence and the fixation of alternate cDNA loci calibrated as above. inversions in allopatry (step 2b). Note that phylogeography does not require contemporaneous origins for the inversions, Results and Discussion as haplotypes would accumulate differences in concert from Nuclear and mtDNA Gene Trees. MP gene trees for P220, P2956, P7, the time of isolation. Secondary gene flow from Mexico then and mtDNA are given in Fig. 3. MP and ML gene trees were delivered already-pronounced genetic variation for diapause traits in the guise of inversions (step 3b). The inversions were virtually identical, so only the MP results are presented. None of favored by selection, spreading to form adaptive clines in the the sequenced genes deviated significantly from a molecular U.S. haw population (step 4b). Sometime later (Ϯ1 million clock. [P levels for P220, P2956, P7, and mtDNA and the best-fit years), the variation in the latitudinal clines helped facilitate substitution models were 0.87 (Hasegawa–Kishino–Yano or the historical (Ͻ150 years ago) shift from haw to apple (step HKY model, 18 df), 0.47 (HKYϩG, 18 df), 0.69 (HKY, 20 df), 6b), the proximate basis for sympatric race formation. The and 0.35 (HKY, 27 df).] We found no evidence for any recom- introgression hypothesis therefore attributes variation in the bination in the three nuclear loci or mtDNA. Hence, the gene depth of SN and M nuclear nodes to a series of gene-flow trees can be interpreted as genealogies depicting the evolution- events from Mexico occurring between Ϸ0.8 and 1.4 Mya. The ary relationships of haplotypes. more basal concordance of cDNA and mtDNA loci (large filled circles in Fig. 3 A–C) reflects the earlier geographic Gene Tree Topologies. A shared feature of all three nuclear gene separation of U.S. and Mexican demes Ϸ1.57 Mya (ϭ ML trees was the presence of two major haplotype classes segregat- estimate for tdiv under the introgression model, with the ing in U.S. apple- and haw-fly populations (see red and blue tree ϭ corresponding value for Neanc 2). branches in Figs. 3 A–C). One of the two U.S. haplotypes (designated ‘‘SN’’ for ‘‘South and North’’ in red) was more Testing the Competing Hypotheses. Because lineage sorting and closely related to sequences from Mexican Highland haw flies introgression attribute different causes for nodal partitions on (green ‘‘M’’ for Mexico) than the alternate haplotype class in the gene trees (Fig. 4), the two hypotheses could be evaluated by same U.S. host population (blue ‘‘N’’ for ‘‘North’’). The SN using a LR approach (Materials and Methods). Computer sim- haplotypes demarcated the subset of inversions generally found ulations supported the introgression model. The probability was in higher frequency in apple than haw flies at sympatric U.S. P Յ 0.006 of generating gene trees (10,000 replicates) under the sites. This finding was evidenced by the strong gametic disequi- assumptions of the null lineage sorting model as consistent with librium of SN genes with corresponding allozyme alleles dis- the alternative introgression model as our actual data. A variant playing host-related differentiation (r between P220 and NADH introgression hypothesis of gene flow in the reverse direction ϭ Ͻ Ϫ3 ϭ diaphorase-2 gene 0.45, P 10 , n 96 chromosomes; from the United States into Mexico is unlikely. This model has P2956͞aconitase-2 gene ϭ 0.61, P Ͻ 5 ϫ 10Ϫ4, n ϭ 46; Ϫ problems similar to lineage sorting, requiring the contempora- P7͞hydroxyacid dehydrogenase gene ϭ 1.0, P Ͻ 10 6, n ϭ 96). neous origin of inversions in the U.S. deme soon after population Geographic variation for the three nuclear cDNA loci also subdivision. Moreover, if gene flow had occurred from the mirrored the clines observed for linked allozymes. N haplotypes United States, then it erased all traces of prior nuclear differ- were present only in R. pomonella populations from Michigan entiation in Mexico. In contrast, the signature of gene flow from and New York, whereas SN haplotypes varied in frequency Mexico is evident in the extant U.S. inversion clines; SN alleles, across the United States (Fig. 3). Most interestingly, all three the descendents of introgressed genes from Mexico, are found at nuclear loci showed a deep and concordant partition distinguish- highest frequency in the southern United States (13–16). ing N from SN and M haplotypes (compare large filled circles in Gene duplication and loss is another conceivable explanation Fig. 3 A–C). mtDNA had a similar partition, but strictly distin- for the data. Under this hypothesis, SN and N haplotypes guished all U.S. from Mexican flies (Fig. 3D). We estimated that represent paralogous loci, with one or the other of the genes the probability for such congruence occurring by chance among having become deleted from different chromosomes. When the nuclear and mtDNA gene trees was Յ0.025 (10,000 replicates Mexican isolate formed, it captured only ‘‘N’’ deletion chromo-

10316 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.1730757100 Feder et al. Downloaded by guest on September 23, 2021 Fig. 3. Maximum parsimony gene trees for P220 (A), P2956 (B), P7 (C), and mtDNA (D). The position of the basal node (p͞e ϭ small filled circles) between the R. pomonella species group and its sister , R. electromorpha (R.e.), was set by using R. cingulata and R. suavis as outgroups (two to nine sequences each per locus). Trees are scaled so that the longest distance from an allele to the p͞e node is relatively the same across loci. Bootstrap support for nodes (10,000 replicates), sequence length (number of base pairs), branch lengths (number of steps), and consistency indexes (c.i.) are given. N, North haplotype; SN, South͞North haplotype; M, Mexican haplotype; large filled circles, N and SN͞M nuclear and US͞M mtDNA nodes; yellow circles, SN͞M nodes. Taxa, hosts, location, and collecting dates (mo͞d͞yr) are as follows: 1, MI ϭ R. pomonella [apple (Malus pumila), Grant, MI, 8͞15͞95]; 2, NY ϭ R. pomonella [haw (Crataegus mollis), Geneva, NY, 9͞16͞00]; 3, TX ϭ R. pomonella [haw (C. mollis), Waxahachie, TX, 10͞6͞89]; 4, GA ϭ R. pomonella [haw (Crataegus flava), Lizella, GA, 9͞13͞89]; 5, MXa ϭ R. pomonella n.s. [haw (Crataegus mexicana), U.S. Department of Agriculture intercept, Mexico, 1989]; 6, MXb ϭ R. pomonella n.s. [haw (C. mexicana), U.S. Department of Agriculture intercept from Mexico City, Mexico, 10͞26͞00]; 7, MXc ϭ R. pomonella n.s. [haw (C. mexicana), Mexico City, Mexico, 10͞9͞00]; 8, R.m. ϭ Rhagoletis mendax [blueberry (Vaccinium corymbosum), Sawyer, MI, 8͞1͞99]; 9, R.z. ϭ Rhagoletis EVOLUTION zephyria [snowberry (Symphoricarpos albus), Dixie, WA, 8͞31͞97]; 10, R.f. ϭ undescribed Rhagoletis species infesting, flowering dogwood fly[(Cornus florida), South Bend, IN, 10͞2͞99]; 11, R.e. ϭ R. electromorpha [gray dogwood (Cornus racemosa), Dowagiac, MI, 9͞12͞99]; 12, R. cingulata [black cherry (Prunus serotina), Hart, MI, 7͞15͞89]; 13, R. suavis [black walnut (Juglans nigra), East Lansing, MI, 7͞25͞89]. Numbers in parentheses after taxon names in the figure are the number of similar haplotypes sequenced from site.

somes, accounting for the greater similarity of SN and M nuclear duplication events occurred just before the U.S.͞Mexican sub- haplotypes. The duplication hypothesis has a difficult time division. The duplication model also leaves unsolved how paralo- accounting for the deep nodal concordance of nuclear and gous loci became associated with different inversions. Finally, mtDNA gene trees, however, unless a series of genomewide genetic crosses have given no evidence for any null or duplicated

Feder et al. PNAS ͉ September 2, 2003 ͉ vol. 100 ͉ no. 18 ͉ 10317 Downloaded by guest on September 23, 2021 Fig. 4. Lineage sorting (A) and introgression hypotheses (B) accounting for R. pomonella haplotypes. Thin black lines define population boundaries and sequence of divergence events. Thick colored lines represent gene trees for alleles segregating at P220, P2956, or P7 within populations. Red line, SN haplotype; blue line, N haplotype; green line, M haplotype. The generally higher frequency of SN haplotypes in the apple race at sympatric northern sites is denoted by the thicker width of the red line for the apple population.

loci that would likely be observed if the duplication hypothesis older, neutral haplotypes with a series of more recent, selectively were true. driven mutations. Such selection should reduce haplotype vari- ation within the inversions, but no obvious trend exists for Introgression, Diapause, and the Apple Race. Our results imply that shallow coalescence times for nuclear haplotypes (Fig. 3). The inversions have introgressed in the past from Mexico into the balance of evidence therefore suggests that the inversions con- United States. We have separately observed that these inversions tained variation for diapause depth at their time of introduction are associated with diapause traits involved in the sympatric host from Mexico. shift of R. pomonella to apple (8). This observation suggests that gene flow introduced ready-made diapause variation in the form Divergence Time for Populations. Under the introgression hypoth- of inversion polymorphism that was acted on much later by esis, we estimate that Mexican and U.S. flies initially separated selection during sympatric host shifts. However, it does not rule Ϸ1.57 Mya. This date is consistent with the biogeography of out the possibility that the observed diapause variation accu- many temperate plants having disjunct eastern U.S.͞Mexican mulated because of mutations after introgression. distributions, including the Rhagoletis hosts Crataegus and Cor- Two considerations support the hypothesis that the inversions nus (29, 30). The timeframe and concordance of multiple taxa carried diapause variation at the time of their introgression. suggest a possible role for climatic and biotic changes associated First, the phenotypic, geographic, and genealogical pattern for with Pleistocene glaciation in the isolation of U.S. and Mexican haplotypes is consistent with the introgression hypothesis. The flies. In this regard, we have defined the extant U.S. and Mexican evolutionarily most closely related SN and M haplotypes are haw populations in terms to their current geography, but their associated with prolonged diapause in U.S. and Mexican pop- ranges have undoubtedly changed through time. To achieve full ulations for all three rearranged chromosomes (Fig. 1). One understanding of sympatric speciation in Rhagoletis will there- might expect a more equitable distribution of phenotypic effects fore paradoxically require elucidating the dynamic biogeo- across inversion haplotypes if diapause variation was all of recent graphic history of these flies and equating it with possible (post gene flow) origin. Second, theory predicts that the half-life secondary contact and gene-flow events between populations, as for the decay of disequilibrium between a neutral locus not has been done for other organisms (31). tightly linked to a gene under strong epistatic selection and an inversion is on the order of the reciprocal of the double-crossing Implications of Introgression for the R. pomonella Group. If true, over and gene conversion rate in heterokaryotypes (27, 28). introgression from Mexico could also have implications for the Thus, given (i) ML estimates for inversion introgression times radiation of the R. pomonella complex. Each sibling species in the from Ϸ0.8 to 1.4 Mya, (ii) that SN haplotypes are not the direct group infests a different, nonoverlapping set of host plants (4, targets of selection, and (iii) that gene flux rates between 17). Like the host races, these species differ in diapause traits inversions are on the order of 10Ϫ4 to 10Ϫ5 per meiosis, which differentially adapting them to variation in host fruiting time appear typical in Drosophila (27), current levels of disequilibrium (17). They also differ from each other in allozyme frequencies, for the introduced haplotypes should be well below 0.1. Conse- but to a much greater extent than the host races. Many of these quently, if SN haplotypes were not embedded within epistatic differences involve loci mapping to the same inverted regions of gene complexes at the time of introgression, then the elevated chromosomes 1–3 differentiating the host races (17). Indeed, R. levels of disequilibrium seen across the three rearranged chro- mendax, R. zephyria, and the undescribed flowering dogwood fly mosomal regions (8) are likely due to genetic hitchhiking of all possessed different combinations of N and SN haplotypes in

10318 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.1730757100 Feder et al. Downloaded by guest on September 23, 2021 common with apple and haw flies for P220, P2956, and P7 (Fig. shifts to novel plants with differing fruiting phenologies. We do 3 A–C), implying that they share inversion polymorphism. Thus, not mean to imply that all of the genes responsible for sympatric not only the host races, but more distinct host-related forms of host shifts arose in allopatry or are contained within inversions. R. pomonella appear to have evolved from prestanding genetic For example, premating isolation stemming from differences in variation (inversion clines) created by past adaptive introgres- host-plant preference also plays a critical role in R. pomonella sion from Mexico. However, in the case of the sibling species, we divergence (34), and could have a simpler allelic history than the cannot discount a possible role for recent gene flow contributing diapause-related inversions we have investigated here. Regard- to the shared polymorphism. less, before gene flow, no host shifting and no speciation appear to have occurred between Mexican and U.S. isolates (haws are Conclusions and were the common hosts for both populations). It was only Our findings suggest that allopatry and secondary introgression after gene flow and when diapause traits later became associated may sometimes act in conjunction with, rather than distinction with variation in host preference that R. pomonella apparently from, host shifts to facilitate sympatric speciation in R. radiated sympatrically to fill open plant niches. Questions re- pomonella. Walsh (3) and Bush (4) may therefore have been main concerning why mtDNA haplotypes did not introgress with correct in postulating that sympatric host shifts represent the nuclear alleles (perhaps gene flow is male-mediated) and the proximate ecological basis for R. pomonella divergence. But current taxonomic status of Mexican flies. Nevertheless, our Mayr (1) could also be right in that initial allopatry provided a results evoke a surprising ‘‘PostModern’’ synthesis of processes genetic architecture in the form of inversion polymorphism and personalities, adding to a growing literature (35–38) imply- conducive for sympatric speciation. Moreover, the view of both ing that the origins of animal species can be as dynamic and rich Anderson (32) and Stebbins (33) that hybridization is often a (reticulate) as those for plants. catalyst for the genesis of new ecotypes and species could also be germane to Rhagoletis. However, introgression did not immedi- We thank E. Dyreson, H. Hollocher, L. Knowles, D. Lodge, S. Lyons, W. ately precipitate the divergence of sympatric Rhagoletis popula- Maddison, B. McPheron, S. Muse, R. Nielsen, D. Posada, D. Schwarz, tions. Rather, a significant time lag occurred. The initial impact J. Thorne, U. Stolz, J. Wakeley, and X. Xie. This work was supported by was likely the favorable spread of the inversions, with the grants from the National Science Foundation (Integrated Research creation of secondary clines adapting the U.S. haw population to Challenges and Graduate Research Traineeship) and the U.S. Depart- latitudinal differences in the environment. This spread provided ment of Agriculture (to J.L.F. and S.H.B.) and by the State of Indiana a large genetic base for diapause that was later capitalized on for 21 Century Fund (to J.L.F.).

1. Mayr, E. (1963) Animal Species and Evolution (Belknap, Cambridge, MA). 19. Swofford, D. L. (2000) PAUP*: Phylogenetic Analysis Using Parsimony (*and 2. Walsh, B. J. (1864) Proc. Entomol. Soc. Philadelphia 3, 403–430. Other Methods) (Sinauer, Sunderland, MA), Version 4.0b8. 3. Walsh, B. J. (1867) J. Horticulture 2, 338–343. 20. Posada, D. & Crandall, K. A. (1998) Bioinformatics 14, 817–818. 4. Bush, G. L. (1966) The Taxonomy, Cytology and Evolution of the Genus 21. Hudson, R. R. & Kaplan, N. L. (1985) Genetics 111, 147–164. Rhagoletis in North America (Mus. Comp. Zool., Cambridge, MA). 22. Takahata, N., Satta, Y. & Klein, J. (1995) Theor. Popul. Biol. 48, 198–221. 5. Feder, J. F., Chilcote, C. A. & Bush, G. L. (1988) Nature 336, 61–64. 23. Yang, Z. (1997) Genet. Res. 69, 111–116. 6. McPheron, B. A., Smith, D. C. & Berlocher, S. H. (1988) Nature 336, 64–66. 24. Brower, A. V. Z. (1994) Proc. Natl. Acad. Sci. USA 91, 6491–6495. 7. Roethele, J. B., Romero-Severson, J. & Feder, J. L. (2001) Ann. Entomol. Soc. 25. Huelsenbeck, J. P. & Crandall, K. A. (1997) Annu. Rev. Ecol. Syst. 28, 437–466. Am. 94, 936–947. 26. Dopman, E. B., Sword, G. A. & Hillis, D. M. (2002) Evolution (Lawrence, 8. Feder, J. L., Roethele, J. B., Filchak, K. E., Niedbalski, J. & Romero-Severson, Kans.) 56, 731–740. J., Genetics 163, 939–953. 27. Ishii, K. & Charlesworth, B. (1977) Genet. Res. Camb. 30, 93–106. 9. Feder, J. L., Roethele, J. B., Wlazlo, B. & Berlocher, S. H. (1997) Proc. Natl. 28. Nei, M. & Li, W.-H. (1980) Genet. Res. 35, 65–83. Acad. Sci. USA 94, 11417–11421. 29. Graham, A. (1999) Am. J. Bot. 86, 32–38. 10. Feder, J. L., Stolz, U., Lewis, K. M., Perry, W., Roethele, J. B. & Rogers, A. 30. Xiang, Q., Brunsfeld, S. J., Soltis, D. E. & Soltis, P. S. (1996) Syst. Bot. 21, (1997) Evolution (Lawrence, Kans.) 51, 1862–1876. 515–534. 11. Feder, J. L. & Filchak, K. E. (1999) Entomol. Exp. Appl. 91, 211–225. 31. Hewitt, G. (2000) Nature 405, 907–913. 12. Filchak, K. E., Roethele, J. B. & Feder, J. L. (2000) Nature 407, 739–742. 32. Anderson, E. (1949) Introgressive Hybridization (Wiley, New York). 13. Feder, J. L., Hunt, T. A. & Bush, G. L. (1993) Entomol. Exp. Appl. 69, 117–135. 33. Stebbins, G. L. (1959) Proc. Am. Philos. Soc. 103, 231–251. 14. Feder, J. L., Chilcote, C. A. & Bush, G. L. (1990) Evolution (Lawrence, Kans.) 34. Feder, J. L., Opp, S., Wlazlo, B., Reynolds, K., Go, W. & Spisak, S. (1994) Proc. 44, 570–594. Natl. Acad. Sci. USA 91, 7990–7994. 15. Feder, J. L. & Bush, G. L. (1989) Heredity 63, 245–266. 35. Arnold, M. L. (1992) Annu. Rev. Ecol. Syst. 23, 237–261. 16. Berlocher, S. H. & McPheron, B. A. (1996) Heredity 77, 83–99. 36. Dowling, T. E. & Secor, C. L. (1997) Annu. Rev. Ecol. Syst. 28, 593–618. 17. Berlocher, S. H. (2000) Evolution (Lawrence, Kans.) 54, 543–557. 37. Grant, P. R. & Grant, B. R. (2002) Science 296, 707–711. 18. Tauber, M. J. & Tauber, C. A. (1976) Annu. Rev. Entomol. 21, 81–107. 38. Vollmer, S. & Palumbi, S. R. (2002) Science 296, 2023–2025. EVOLUTION

Feder et al. PNAS ͉ September 2, 2003 ͉ vol. 100 ͉ no. 18 ͉ 10319 Downloaded by guest on September 23, 2021